Intelligent Enterprise
April 1999

Saving More

Richard Winter


Just how scalable do scalable systems, the theme of this issue, need to be? There is endless rhetoric on this topic, usually linked to the rate at which disk prices are falling. And the rhetoric is not groundless: If disk prices fall, as they have, by roughly a factor of 10 over three years, implementers do have to take into account that databases will grow.
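As a rough illustration, assume the tenfold decline compounds evenly over the three years (the article gives only the three-year figure). Let p be the factor by which the disk capacity a fixed budget buys grows each year. Then:

```latex
\[
p = 10^{1/3} \approx 2.15, \qquad 1 - \frac{1}{p} \approx 0.54
\]
```

In other words, under that assumption the price per gigabyte falls by about 54 percent a year, and a fixed budget buys roughly 2.15 times as much disk each year.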

But there’s more to how much data goes into databases than the price of disk space. That’s why it’s worthwhile to look at what some users are doing — and planning — in terms of database growth. The Winter VLDB (very large database) Survey provides a view.

Reported Actual Growth

It’s always interesting to compare past growth to users’ projections of future growth. So the first question we ask regarding database growth is, “By what percentage did your database grow in the last year?”

In our 1998 survey, 29 percent of respondents reported that the size of their database had at least doubled in the prior year, up from 17 percent in 1997.

You can infer two things from this information. First, it is not just a small fraction of VLDB users who are coping with databases that doubled within the last year. Second, the percentage of users facing this challenge is growing rapidly. This is remarkable: almost a third of the really large databases we can find doubled in the past year. And if your database didn’t double last year, stay tuned, because it may well do so this year; the percentage is climbing fast.

Remarkable as these figures are, they contrast sharply with the statements of some well-known pundits, who can be heard to remark at every conference that database size has been doubling annually in virtually every company for the last five years. That claim is evidently false, at least for databases that are already big.

At the same time, there certainly are databases growing faster than 100 percent per year. In both of our last two surveys, we found that about eight percent of respondents had VLDBs that more than tripled in the previous year. In fact, the percentage of respondents whose databases had quadrupled during the preceding 12 months had increased from two percent in 1997 to about five percent in 1998. Several of these respondents were companies that had databases of a few hundred gigabytes 12 months earlier and now had over a terabyte of data. One company had more than a terabyte of data in 1997 and quadrupled its size to more than 4TB in 1998!

If these numbers don’t knock you over because you are used to hearing salespeople talk about 10- and 20TB databases as though they’re all over the place, bear in mind that most vendor literature uses an inflated figure. Vendors usually talk in terms of the total complement of disk connected to the processor, often as much as 10 times the volume of user data. The Winter VLDB Survey expresses database size in a measure that includes only user data, summaries, and aggregates. Thus, our figures for database size are closer to the volume of data that actually must be stored and are smaller than those quoted by the vendors. You need the kind of figure the vendors quote when estimating your disk requirements for an installation, but for most other purposes it’s better to focus on a metric closely related to user data.
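To see how large the gap can be, converting a vendor's total-disk figure back to user data is a simple division. The sketch below is hypothetical: the helper `user_data_tb` is illustrative, and the ratio of attached disk to user data is an assumption you must supply; 10:1 is only the upper figure mentioned above, not a fixed rule.

```python
def user_data_tb(vendor_quoted_tb: float, disk_to_data_ratio: float) -> float:
    """Estimate user data volume from a vendor's total-attached-disk figure.

    disk_to_data_ratio is an ASSUMED ratio of attached disk to user data
    (the article says vendors' figures are often as much as 10 times user data).
    """
    if disk_to_data_ratio < 1:
        raise ValueError("attached disk cannot be smaller than the user data it holds")
    return vendor_quoted_tb / disk_to_data_ratio


# A "20TB database" quoted as total attached disk, at an assumed 10:1 ratio:
print(user_data_tb(20, 10))  # 2.0
```

At that ratio, a configuration a salesperson calls a 20TB database may hold about 2TB of user data, which is the kind of figure the survey reports.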

Projected One-Year Growth

Our respondents’ projections also support the notion that the rate of growth is accelerating. In the 1997 survey, 19 percent of respondents expected the size of their database at least to double in the coming 12-month period. By 1998, that figure had increased to 24 percent. Furthermore, most of those who had doubled stored data in the prior year were expecting at least to double it again.

Perhaps more telling is the percentage of respondents planning on at least tripling database size in the year ahead. This figure grew from 5 percent in 1997 to 13 percent in 1998! An example of the sort of planning you see within this group is a telecommunications company that had a 1.1TB database in production at the time of our 1998 survey. This company was planning a 500 percent increase in its data volume within 12 months, resulting in an estimated 6.6TB database; I’m waiting with interest to see whether it achieves that. So the share of respondents planning to double or triple their database size within the coming 12 months is also rising.
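The telecommunications example is easy to misread: a 500 percent increase means ending at six times the starting size, not five. A minimal check of the arithmetic, using a hypothetical helper `projected_size`:

```python
def projected_size(current_tb: float, percent_increase: float) -> float:
    """Size after growing by percent_increase percent (500 percent means 6x, not 5x)."""
    return current_tb * (1 + percent_increase / 100)


# 1.1TB plus a 500 percent increase:
print(round(projected_size(1.1, 500), 2))  # 6.6
```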

Projected Three-Year Growth

Three-year growth plans of our respondents also reflect some acceleration. In 1998, 49 percent of respondents expected their databases to at least double within three years, up from 42 percent the year before. Similarly, 26 percent of respondents expected in 1998 at least to triple their database size within three years, up from 19 percent the year before. The group expecting database size at least to quadruple within three years increased slightly, from 13 to 15 percent.

I’m particularly interested in discovering what percentage of VLDB users expect their database to increase by a factor of 10 within three years. In 1997, there were none. In 1998, this group comprised six percent of respondents. To paraphrase Crocodile Dundee, that is what I call a scalability requirement!
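It helps to translate these three-year thresholds into the steady annual growth each one implies. The sketch below assumes smooth compounding, which respondents' actual growth need not follow; `annual_factor` is a hypothetical helper.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_factor over years."""
    return total_factor ** (1 / years)


# Three-year survey thresholds: double, triple, quadruple, tenfold.
for total in (2, 3, 4, 10):
    print(f"{total:>2}x in 3 years = {annual_factor(total, 3):.2f}x per year")

# Prints:
#  2x in 3 years = 1.26x per year
#  3x in 3 years = 1.44x per year
#  4x in 3 years = 1.59x per year
# 10x in 3 years = 2.15x per year
```

On this reading, doubling over three years needs only about 26 percent growth a year, while the tenfold group is in effect planning to more than double every year for three years running.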

Observations

I see two interesting points in this data. First, a group of “high fliers” within the VLDB community has databases exhibiting the swift growth rates we all read about. These are the users with the extraordinary scaling requirements that are challenging the industry on virtually every front. Probably the broadest appropriate definition of that group in the VLDB space would include the companies where large databases are at least doubling annually. Our data puts this group at roughly 24 to 29 percent of respondents, and growing fairly rapidly.

The high fliers are the people whose database growth is at or above the rate at which disk prices are declining. A subset of the high fliers is a group we might call “eagles,” whose database sizes are racing up by a factor of between 5 and 10 over relatively short periods ranging from 12 to 36 months. An important signal to everyone with a stake in large databases, users and vendors alike, is that both the high fliers and the eagles appear to represent a rapidly increasing proportion of the population.
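One way to see why the disk-price rate is the natural dividing line is to compare database growth with the price decline directly. The sketch below assumes disk spending is simply capacity times unit price (ignoring every other cost) and reuses the article's tenfold-over-three-years price figure, compounded evenly; `PRICE_DECLINE_PER_YEAR` and `disk_spend_ratio` are hypothetical names.

```python
# ASSUMPTION: disk becomes 10x cheaper over 3 years, compounded evenly.
PRICE_DECLINE_PER_YEAR = 10 ** (1 / 3)  # about 2.15


def disk_spend_ratio(db_growth_per_year: float) -> float:
    """Next year's disk spending relative to this year's.

    Capacity needed grows by db_growth_per_year while the price per unit
    falls by PRICE_DECLINE_PER_YEAR, so spending scales by their ratio.
    """
    return db_growth_per_year / PRICE_DECLINE_PER_YEAR


for growth in (1.3, 2.0, 3.0):
    print(f"database x{growth}/yr -> disk spending x{disk_spend_ratio(growth):.2f}/yr")
```

Under these assumptions, a database growing 30 percent a year lets disk spending fall to about 0.60 of the prior year, a database that doubles keeps spending roughly flat (about 0.93), and one that triples pushes spending up (about 1.39). Break-even sits near 2.15x growth a year.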

But the high fliers represent somewhere between a quarter and a third of VLDB users. The other two-thirds to three-quarters of users have been experiencing lower database growth rates. If you listen to most of the pundits on the conference circuit, however, you would never know this. They blithely tell us that “everyone’s” data volume is doubling annually. But according to our data, most VLDB users are not seeing such a growth rate.

So what is going on? In reality, disk prices do not drive database growth. They enable database growth in an economic sense. Only business value can drive database growth. And there is a cycle involved in delivering value.

Business value requires two things: a business case (for example, storing complete histories of each customer’s purchases will allow us to anticipate customer needs better and hence buy the right merchandise) and an implemented business reality that actually delivers on the actions — and the value — promised in the case. Without both, the cycle won’t be sustained and database growth will sputter to a halt.

Many companies experience such a halt for a period of time. In fact, I think it has happened in a lot of companies in which big data warehouses were built without attention to value. After the money is spent and no value materializes, funding for new projects slows or stops for a year or two.

The hard part is not, of course, buying the disk or making the business case. The hard part is delivering real business value through running systems. Some elements in this equation are difficult with databases of any size.

But one element that is particularly difficult in VLDBs is data acquisition. To deliver business value, you have to complete and sustain the entire cycle of data acquisition. And, in the VLDB setting, you must do so with a huge volume of data. So, you need to identify the data sources and assess the quality and semantics of the data, as well as transform, cleanse, eliminate duplicates from, “household” and/or “individualize,” and load it. That’s a tall order at the database growth rates I’m talking about.

I believe that high fliers are managing to sustain this cycle of defining, implementing, and delivering business value from ever more detailed data. They are actually completing the cycle so their clients and management continue to fund ongoing investments, increasing data warehouse size or operational system detail. They are able to keep this cycle going in part because they have the infrastructure of skills, methods, tools, partners, and standards to execute large-scale data acquisition and implementation projects effectively. So the high fliers sustain or increase their annual spending on disk and other database-related products — and actually fully consume the marginal disk capacity made available to them by each decrease in disk price. And the high fliers are able to realize competitive advantage through their ability to leverage detailed data.

Meanwhile, the rest of the crowd is not able to create business value by adding detailed data, or at least is not able to do so at the same pace. In some industries, the incremental value of more detail may not be as high. In others, the infrastructure may not exist for data acquisition. In still others, the business approach to exploit the data (and hence the business case to benefit from it) may not yet have been designed. Whatever the precise explanation, these users lack either the business drivers to move ahead or the IT capabilities to implement larger databases. Hence their databases do not grow rapidly, and they are not limited by disk prices.

This explanation is consistent with what I observe in the field and with the available data. Furthermore, it’s consistent with the trend the data exposes: the rest of the VLDB crowd is learning, as the growing percentage of high fliers shows. If current trends continue for a few years, almost all VLDB users will be coping with extraordinary, ongoing demands to handle huge, rapidly growing databases.