Winter Corporation VLDB News

Intelligent Enterprise
December 1999

E-Scalability Challenge

Richard Winter


The Internet hasn't quite changed everything, but it has certainly transformed the database scalability challenge in ways that have only recently come into clear focus. And, from where I sit, it looks as though the database engine is right at the fulcrum of the most powerful forces unleashed by the remarkable rise of e-business.

Growing Requirements of E-commerce

Here are a few of the fundamental, breathtaking changes putting pressure on the capabilities of even the most advanced database architectures:

Huge numbers of concurrent users. As e-commerce gathers momentum in the distribution of retail products and services, we are marching steadily toward the moment when there will be online, constantly updated databases that will be used directly by truly enormous populations - much larger than even the largest ones existing today.

Established online service providers already serve millions of people. As such notions as "pervasive computing" and "the Internet appliance" catch on, we will have millions of Internet users who do not even need a PC. Services for such populations as "most U.S. consumers," "most European homeowners," or "most soccer fans worldwide" appear to be just around the next corner. Governments will be in this game, too, with online services for "all taxpayers" and other immense populations.

As these services take hold, online users of a single e-commerce service may soon reach levels ranging from 50 to 100 million. I think we will pass the hundred million mark within a few years.

With huge online user populations, we will see enormous swings in the number of online, concurrent users. IBM already cites data showing that on e-commerce sites the peak demand is much higher - compared to the average demand - than on ordinary server sites.

Continuous availability. It's now beyond discussion: Large e-business operations simply must be up all the time. The difficulty of meeting this requirement increases with two factors: database size and transaction volume. In e-commerce, both these factors grow rapidly as user volume grows.

Extremely large stored-data volume. Today, the trend is toward storing the clickstream in a data warehouse where it is then analyzed and mined to better understand customer behavior, customer and product profitability, and other e-business issues. Clickstreams for large user populations are the biggest things around. It doesn't take long to accumulate a terabyte of clickstream when you have a large-scale, actively used Web site. So, I think we'll see data warehouses with a hundred terabytes of clickstream data within a few years - and probably petabyte databases of e-commerce activity in something like five years.
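As a rough illustration of how quickly clickstream adds up, the arithmetic can be sketched as follows. The click rate and record size here are hypothetical figures chosen only to show the calculation; they are not measurements from any real site.

```python
def days_to_accumulate(target_bytes, clicks_per_day, bytes_per_click):
    """Days of steady clickstream logging needed to reach target_bytes."""
    return target_bytes / (clicks_per_day * bytes_per_click)


TERABYTE = 10**12  # decimal terabyte, in bytes

# Hypothetical inputs (assumptions, not data from the article):
clicks_per_day = 100_000_000  # e.g. 10 million visitors at 10 clicks each
bytes_per_click = 500         # one logged record per click

days = days_to_accumulate(TERABYTE, clicks_per_day, bytes_per_click)
print(f"{days:.0f} days to reach 1 TB")
```

Under these assumed figures the site logs about 50 GB a day, so a terabyte arrives in roughly three weeks. Different inputs change the number of days, but not the basic shape of the growth.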

Near realtime, very large-scale decision support. The data warehouse became a mainstream phenomenon in business on a foundation of periodic batch update. That is, data warehouses are most widely used to support analytical applications that produce results based on events from yesterday or last week.

But e-commerce is increasingly focused on a faster moving world, in which the events of a minute ago are important to decision making. An e-commerce operator may want to extend a special offer to a customer based on something the customer communicated a moment ago. In fact, it may be critical to reconfigure the "store" the customer is visiting based on information that can be pieced together from something the customer did a moment ago, combined with something he or she did at the site last month or last year.

So, information must flow into the data warehouse continuously and in large volumes, be exploited more or less immediately, and then be relayed to operational systems - all in a time frame that might plausibly range from 10 seconds to a minute.

The e-commerce requirements for the database of the next few years then include:

  • A large step upward in user populations and transaction rates.
  • Continuous availability.
  • Extremely large data volumes in the data warehouse.
  • Continuous update of the data warehouse.
  • Immediate exploitation of new data in the data warehouse.

Considering the size and growth rate of e-commerce operations, this combination of requirements would severely strain today's database engines and today's decision-support architectures.

So once again, only a few years after the last set of revolutionary advances in database architecture (such as parallelism, cost-based optimizers, and advanced indexing techniques), changes in the business environment have brought us a new set of scale-related database challenges.

Pervasive Computing

In my view, one of the fundamental forces powering this change is pervasive computing: the notion that specialized, often connected, information appliances will be just about everywhere in our lives, performing tasks on our behalf as we move around the house, office, or neighborhood.

The Internet has brought astonishing change already, but we forget that virtually all Internet use today occurs when someone is seated at a relatively expensive, complex device we know as a personal computer. As inexpensive Internet appliances (mobile and otherwise) really catch on, the Internet will become accessible to hundreds of millions more people around the world - people who might never buy or extensively use a personal computer.

Many people who are attracted to the Internet still make limited use of it because they simply cannot be seated at a computer more than a few hours a day. As pervasive computing catches on, these people will generate many more Internet transactions a day because the use of the Internet will be integrated into many more of their daily activities. Of course, there is already endless talk in the industry about the scalability requirements of e-commerce. But most people forget that the scalability requirement is most intense at the database.

Why? At the heart of most e-commerce sites is a database that has to be updated. At a minimum, the database tracks the identity and registration of the service's users. More typically, it is recording data on customers, accounts, shopping baskets, transactions, inventory, and other subjects needed to operate the service.

The updatable database is different from all other elements of the e-commerce infrastructure in that it cannot be replicated in the ordinary sense.

For example, if a front-end Web server becomes overloaded, there is a more-or-less straightforward solution: replicate it. The Web servers operate essentially autonomously of one another. Infrastructure is needed to route users, distribute and manage workload, and coordinate failover.

Operating multiple, replicated Web servers has its complications, but it is a well worked out problem. Not much information must flow between them. The same is true for the application server.

However, the database server is different. You can't simply replicate a large-scale, continuously updated, high transaction-volume database. At least, you can't do it and double your capacity. And, the extent to which you can do it at all - with the scale, reliability, and performance required in high-end e-commerce - is in doubt.
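One way to see why a second copy of an updatable database does not double capacity is a deliberately simplified model. It assumes full replication in which reads are spread evenly across the copies but every update must be applied at every copy, and it assumes reads and updates cost the same. The function name and parameters are hypothetical, and the model ignores coordination overhead, which would only make the picture worse.

```python
def replicated_capacity(replicas, write_fraction, single_server_tps=1.0):
    """Total throughput of `replicas` full copies under a simplified model.

    Assumptions (illustrative only): reads are divided evenly among the
    copies, every write is applied at every copy, and a read and a write
    cost the same amount of work on a server.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    # Work each copy does per unit of total workload:
    # its share of the reads, plus all of the writes.
    per_copy_work = (1.0 - write_fraction) / replicas + write_fraction
    return single_server_tps / per_copy_work


for w in (0.0, 0.2, 0.5, 1.0):
    gain = replicated_capacity(2, w)
    print(f"write fraction {w:.1f}: 2 copies give {gain:.2f}x one server")
```

In this model, two copies deliver the full 2x only for a read-only workload. With half the operations being updates the gain falls to about 1.33x, and for a pure update workload the second copy adds no capacity at all.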

So at present, the volatile elements of the database must remain a single, integral, unreplicated element of the e-commerce infrastructure. And a single database server - or perhaps a cluster of such servers - must handle the immense, rapidly growing workload.

E-commerce sites may have a shadow database for backup or disaster recovery, and they may replicate static databases that are used to serve up information that doesn't change moment-to-moment. But the heart of the operation is a volatile database implemented on a centralized database architecture. And that volatile database is at the fulcrum of rapidly escalating requirements for scalability and performance, with no room to compromise on availability.

Just around the Next Corner

The bottom line is this: Our most successful, booming e-commerce operations are going to need another very large factor in scalability over the next three to five years - a factor that far exceeds any projected growth in the capacity of hardware components. Furthermore, the data warehouses that go with them, expected to provide the business intelligence so integral to the e-commerce strategies, need to handle much larger data volumes and continuous updates. Some large-scale data warehouses in operation do continuously update. There certainly has also been noteworthy progress in database scalability and availability over the last several years. But we have a long way to go before the next level of e-commerce requirements can be satisfied. Those requirements are just around the next corner.

Richard Winter is a specialist in large database technology and implementation, and is president of Winter Corp., Waltham, MA. You can reach him via email at Richard.Winter@wintercorp.com or by fax at 617-338-4499.