Winter Corporation




Winter Corporation VLDB News

Intelligent Enterprise
July 1999

Testing the Terabytes

Richard Winter


News of breathtakingly large business intelligence (BI) projects has been circulating lately. Citibank aims to build a base of a billion customers. MatchLogic Inc. soon will be recording a billion Web page impressions a day in its database so it can tell its customers how best to spend their Web advertising dollars. Every month, if not every week, brings another story of immense investments in BI projects that implement an extraordinary new business vision.

But, guess what? As well as thrilling us, these huge BI systems had better work.

You absolutely must test early and on a large scale. The idea of huge BI systems that simply have to work, of course, leads to the Richard Winter theme song: When it comes to database scalability, the bigger it is, the more you have to test it — and before you get too far downstream. That is, you can’t wait until you’ve bought all the equipment, built the applications, and created the production database before testing.

But early large-scale testing is the very issue that causes most people to throw up their hands. After all, how can you possibly create a 1TB database (or worse, a 10TB one) early in the design process? It’s difficult to assemble the equipment at this point in the project. It’s difficult to free up the resources to design, implement, and evaluate large-scale tests. It’s also difficult to prevent other project activities from disrupting a large-scale test process. And for many users, it’s difficult to sort out what to test.

Traditional Testing Options

The top vendors of very large databases (VLDBs) regularly assist their clients with large-scale proof-of-concept projects as part of their professional service offerings. We spoke with Compaq Computer Corp., Microsoft, Hewlett-Packard, NCR, Oracle, and Sun Microsystems to get an idea of what they offer.

NCR described a proof-of-concept approach used for a long time at its benchmark centers and at customer sites. Oracle generally conducts large proof-of-concept projects at the customer site. Most vendors described lab facilities or benchmark centers that are occasionally pressed into service to do a more extensive customer proof-of-concept.

A New Option

When we have been involved, both NCR and Oracle have done a fine job, and NCR’s experience with extreme requirements has shown through. However, one vendor has recently gone a step further than any had before.

IBM has created a group of test and integration centers intended to help customers who need to complete a large-scale BI proof-of-concept. Called the Teraplex Integration Centers, these are permanent, large-scale testing laboratories with dedicated equipment and personnel, open for customers and business partners to use.

There is a Teraplex Center for each of IBM’s major BI platforms: RS/6000, OS/390, AS/400, and Netfinity (for Windows NT). They are staffed with dedicated, interdisciplinary teams of IBM personnel who can cover issues in hardware, system software, database software, and so on. Each center has terabytes of disk and large complements of hardware and software in place. In addition, each center has the ability to obtain additional hardware, software, and people to support specific projects.

The idea behind these centers is to provide a laboratory where real-world integration and testing can take place. Customers therefore ordinarily bring real data, and a typical test reproduces all the critical elements of the customer’s actual or planned environment. When customers use independent software vendors’ tools, utilities, or applications, they install them at the center under a temporary license. As a result, there have been Teraplex projects involving Oracle, Informix, and other database engines. The point is that BI solutions are ordinarily created with the products of multiple suppliers, and a critical element of the challenge is determining the performance and scalability of the resultant integrated system.

Not Benchmark Centers

A noteworthy point about the Teraplex Centers is that they’re entirely separate from IBM’s benchmarking centers. IBM uses the benchmarking centers principally for running industry-standard benchmarks (such as TPC) and for competitive measurements that are part of the sales cycle.

The Teraplex Centers, however, are more for client-defined feasibility studies and proof-of-concept exercises. These studies and exercises are focused on integration, performance, and scalability in relation to a specific business problem, where the client is intent on implementing a solution. The experience is more realistic, in the sense that there seldom are artificial rules or artificial deadlines; there is no contest.

Case Study: Aetna

Aetna U.S. Healthcare, which manages 13 million medical policies, came to the RS/6000 Teraplex Center to test the stability, performance, and scalability of its data warehouse with a multiuser workload under a new version of DB2. Aetna expected its warehouse to grow from 200GB to more than 1TB in the first year, and subsequently, up to 2 to 3TB. The company needed to ensure that the technology could deliver on a database that large.

The objectives of Aetna’s Teraplex project were:

• Test the performance and scalability of DB2 Universal Database in a large, multiuser, data warehouse environment using real data and up to 33 concurrent queries.

• Test the stability of DB2 Universal Database. Aetna would be one of the first customers to put it into production for a data warehouse expected to exceed 1TB.

Aetna used its own data, delivering 100GB to the Teraplex, from which IBM created the two larger databases of 500GB and 1TB. The largest table in the 1TB database was over 400GB, with 935 million rows. All sensitive information was encrypted for security and patient confidentiality. Aetna also provided several complex queries — many of which required joins involving between 11 and 17 tables, and represented a star or snowflake pattern. The tests ran during a three-month period.
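The figures above imply some rough sizing numbers that are worth checking by hand. The short Python sketch below works them out: the scale-up factors from the 100GB sample to the two test databases, and an approximate average row width for the largest table. It assumes decimal units (1GB = 10**9 bytes), and because the table is described only as "over 400GB," the row width is a lower bound. The variable and function names are illustrative only and do not come from the Aetna project.

```python
# Back-of-the-envelope sizing from the figures quoted in the case study.
# Assumption: "GB" means decimal gigabytes (1 GB = 10**9 bytes).
GB = 10**9

sample_bytes = 100 * GB                # data Aetna delivered
target_bytes = [500 * GB, 1000 * GB]   # the two larger test databases

largest_table_bytes = 400 * GB         # "over 400GB", so a lower bound
largest_table_rows = 935_000_000


def scale_factor(target: int, sample: int) -> float:
    """How many times larger the target database is than the sample."""
    return target / sample


def avg_row_bytes(table_bytes: int, rows: int) -> float:
    """Average bytes per row, including any overhead counted in the size."""
    return table_bytes / rows


factors = [scale_factor(t, sample_bytes) for t in target_bytes]
row_width = avg_row_bytes(largest_table_bytes, largest_table_rows)

print(factors)           # [5.0, 10.0]
print(round(row_width))  # 428
```

In other words, the test databases were roughly 5 and 10 times the size of the delivered sample, and the largest table averaged at least about 428 bytes per row.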

Aetna was able to meet its performance requirements on the 1TB database when running up to 33 concurrent, extremely complex queries, verifying the viability of its hardware and software architecture for future business needs. The testing also enabled Aetna to put its data warehouse into operation a few months sooner than it would have otherwise.
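The article does not describe the tooling used to drive the multiuser workload, so the following is only a minimal sketch of how a concurrent-query test might be structured. It runs a list of queries with a fixed number of simultaneous workers and reports wall-clock and per-query timings. The `fake_query` function is a hypothetical stand-in that merely sleeps; in a real test it would be replaced by a call to the database under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List


def timed(run_query: Callable[[str], object], sql: str) -> float:
    """Run one query and return its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def run_concurrent(run_query: Callable[[str], object],
                   queries: List[str],
                   concurrency: int) -> Dict[str, float]:
    """Run all queries with at most `concurrency` executing at once."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        elapsed = list(pool.map(lambda q: timed(run_query, q), queries))
    wall = time.perf_counter() - wall_start
    return {
        "queries": len(elapsed),
        "wall_s": wall,
        "mean_s": mean(elapsed),
        "max_s": max(elapsed),
    }


def fake_query(sql: str) -> None:
    """Hypothetical stand-in for a real database call; it only sleeps."""
    time.sleep(0.01)


if __name__ == "__main__":
    # 33 placeholder queries, echoing the concurrency level in the case study.
    workload = [f"-- placeholder query {i}" for i in range(33)]
    for level in (1, 11, 33):
        print(level, run_concurrent(fake_query, workload, level))
```

Stepping the concurrency level up in stages, as the loop at the bottom does, shows how response times change as more users are added rather than reporting only the result at the maximum load.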

More Is Better

We would like to see more full-scale test and integration centers in the industry. Not only would they help customers, which indirectly helps the vendor, but they would directly aid the vendor as well. The VLDB producer providing such a service gains the opportunity to work hands-on with a new product running against a realistic, large-scale, customer workload in an environment in which it can be fully metered, analyzed, and subjected to experimentation.

The experience of participating in one of these projects feels different from that of running a typical benchmark, and it leaves you with a remarkable sense of its distinctive value. The combination of dedicated centers, dedicated staff, professional management, institutionalized executive support from the highest levels of the organization, and a systematic process results in a palpable difference.

Also, hardware can be reconfigured much more quickly at the center than is possible at a customer site. At the same time, this Teraplex idea allows you to have the process last longer and take on a more exploratory character than what typically happens in a benchmark center. Because there is no pressure to win a “contest,” there is an opportunity to investigate options more thoroughly, furthering the process of meeting your demands.