

WinterCorp

245 First Street

Suite 1800

Cambridge, MA 02142

Phone: 617-695-1800

Fax: 617-209-4938

Winter Corporation VLDB News

The Big Time
Richard Winter and Kathy Auerbach

The 1998 Winter VLDB Survey Program winners are bigger, better, and overwhelmingly relational

With the 1998 VLDB Survey Program results in, we can now report a historic milestone. For the first time, the world's largest in-production, commercial database--a 16.8TB, DB2-based system at United Parcel Service (UPS)--is running on an off-the-shelf relational DBMS.

For those of you who have watched the relational saga unfold, this moment is a truly dramatic one. When E. F. Codd published his revolutionary paper in 1970, the relational concept was a radical idea that few people thought would actually work. Indeed, many professionals of the day would have scoffed at the thought of the world's largest databases running on relational engines.

Twenty-eight years after Codd's seminal paper, relational database engines dominate the entire database scene, including the very largest systems. Of course, relational database vendors can't rest on their laurels: object-oriented technology is knocking on the door. But for this year at least, we can say that relational database management more or less rules the VLDB industry.

There is also a historic "nonmilestone" in the '98 program. For the second year in a row, mainframes show no sign of losing ground in the VLDB marketplace. Not only are mainframes far from dead, but they don't even look endangered. For high-end decision support, Unix is the leading choice. But according to VLDB Survey Program data, mainframes never owned this market in the first place. Mainframes continue to dominate for large-scale, mission-critical OLTP.

For the second year in a row, we nominate NT as the biggest VLDB phenomenon that didn't happen. Although no one forecasted terabyte-sized NT databases for early '98, over the past three years, the drumbeat of NT database scalability has been growing steadily louder. We're ready to create separate categories and awards in the program for NT, but thus far, there has been no reason to do so. In fact, the '98 program garnered only one qualifying NT response, although the entry was impressive: 542GB of data in Microsoft SQL Server running on a DG AViiON server. This may be a harbinger of things to come, but we're not there yet.

Meanwhile, just like last year, the databases are a whole lot bigger. (We hope that the vendors will quickly be able to make their engines smarter, because we're going to need all the optimization they can deliver.) The biggest ones more than doubled in size over the last 12 months. And for the first time, we're into double-digit terabytes for raw data; some organization may even reach the 25TB mark by year's end.

THE SURVEY PROGRAM

With companies expanding because of strong financial performances, favorable economic conditions, mergers, and acquisitions, VLDBs have become de rigueur for many corporations. Now more than ever, the need to better understand, plan for, implement, and manage the systems that transform raw data into savvy business intelligence is a top priority. Regardless of whether the company is new, mature, or well established, the corporate database is among the most prized components of the enterprise. The Winter VLDB Survey Program, which assists companies in improving business results and reducing risks, speaks directly to this compelling business need.

The VLDB Survey Program, now in its fourth year, assesses the methodologies, product choices, and practices of the world's large database installations. The survey, which is cosponsored by Database Programming & Design, is a central component of the Winter Research and Recognition Program, a research service devoted exclusively to tracking trends and issues in large database technology. Program reports and publications feature fact-based research about using large-scale data resources to implement successful and timely business strategies. The VLDB Survey Program has these objectives:

• Build a reservoir of actionable knowledge about very large databases

• Assist VLDB professionals in raising organizational productivity, managing risk, and reaching strategic objectives

• Identify the largest databases in the world and celebrate those responsible for them.

Interest in the survey program has grown steadily and spread to publications and industries outside of computing. Articles in Computerworld, Information Week, PC Week, Chicago Software News, and Bank Technology News cited the 1997 survey program, and it was even covered on television.

Preparation for the 1998 program began in the fall of 1997, four months before the launch. Winter Corp. drafted the 1998 questionnaire and circulated it for review among staff members, outside consultants, and representatives of the leading database, storage, and system vendors. The final survey contained a demographic section and 12 multipart questions on topics such as DBMS choice, hardware and storage environments, database size and usage, database and system architectures, growth patterns, and workload activity.

The data collection campaign was launched in December 1997 and completed in February 1998. One of its objectives was to distribute the questionnaire as widely as possible. As with each preceding campaign, Winter Corp. increased the number and variety of distribution techniques. Program information was mailed to former participants, conference attendees, and database professionals listed in commercial mailing directories, followed by telemarketing calls to select recipients. The survey program was also listed in The Data Administrator's Newsletter (TDAN), an Internet-based publication for corporate data managers. One of the high points in publicizing the campaign was the article on the program that appeared in Information Week in early February.

This year, the Internet played a leading part in distributing the questionnaire and collecting the data. Seventy percent of participants either downloaded the survey electronically or printed it out locally. Not surprisingly, the Internet was the primary conduit for the international sites, which represented 14 countries and contributed more than 30 percent of the surveys received this year. In a typical week, there were about 4,300 hits to the Winter Corp. Web site, 15 to 20 percent of which originated abroad.

THE 1998 GRAND PRIZE WINNERS

The 1998 program used three metrics to identify the leading installations:

• Most data, including user data, summaries, aggregates, and indexes, but excluding freespace and redundancy. You may see articles about databases that are larger than those in the survey program, but their figures will include disk space allotted to freespace and redundancy. The program assesses size based only on the dimensions of usable database components.

• Most rows, records, or objects.

• Peak online activity. For OLTP systems, this metric represents the highest number of transactions per second (TPS). For decision-support systems, the criterion applies to the most concurrent, online (not batch), in-flight user queries, reports, and updates.
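Because the "most data" metric counts only usable components, it reduces to a sum. The Python sketch below is illustrative only: the `DatabaseSize` record and `survey_size_gb` helper are hypothetical names, sizes are assumed to be in GB, and the component figures are the ones this article reports for Mitsukoshi, Metromail, and JCPenney. The last entry is a made-up database showing freespace and redundancy being recorded but left out of the total.

```python
from dataclasses import dataclass


@dataclass
class DatabaseSize:
    """Component sizes in GB for one participant (hypothetical record type)."""
    user_data: float
    summaries_aggregates: float
    indexes: float
    freespace: float = 0.0   # recorded, but excluded from the metric
    redundancy: float = 0.0  # e.g. mirroring or parity space; also excluded


def survey_size_gb(db: DatabaseSize) -> float:
    """'Most data' metric: user data + summaries/aggregates + indexes only."""
    return db.user_data + db.summaries_aggregates + db.indexes


# Component figures quoted elsewhere in this article (GB).
mitsukoshi = DatabaseSize(user_data=100, summaries_aggregates=400, indexes=100)
metromail = DatabaseSize(user_data=179, summaries_aggregates=0, indexes=226)
jcpenney = DatabaseSize(user_data=560, summaries_aggregates=22, indexes=20)

# Hypothetical database: freespace and redundancy do not change its ranking size.
padded = DatabaseSize(user_data=100, summaries_aggregates=0, indexes=50,
                      freespace=40, redundancy=150)

for name, db in [("Mitsukoshi", mitsukoshi), ("Metromail", metromail),
                 ("JCPenney", jcpenney), ("hypothetical", padded)]:
    print(f"{name}: {survey_size_gb(db):,.0f} GB")
```

Keeping freespace and redundancy as fields mirrors the survey rule: they can be reported, but they never enter the ranking figure.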

Separate awards were given for decision-support and transaction processing systems, and Unix environments were distinguished from traditional operating platforms. This year, Winter Corp. also differentiated systems by database architecture. Participants were asked to characterize their system as centralized, distributed, or federated. A federated database is implemented on multiple engines that operate autonomously but presents a single, integrated image of the database to users and applications. Because federated systems differ significantly from conventional databases, in instances where the Grand Prize (first place) winner is a federated system, Winter Corp. also announced a nonfederated Grand Prize winner. Based on these definitions, the '98 program had 12 database categories and 17 Grand Prize winners. The awards ceremony occurred in March at the VLDB Summit in Beverly Hills, Calif.

MOST DATA AND MOST ROWS IN A TRANSACTION PROCESSING SYSTEM, ALL ENVIRONMENTS, FEDERATED ARCHITECTURE

Sheer volume is the metric that captures the most imaginations. Accordingly, by far the largest database of any entrant in the '98 program is the one run by UPS. This unrivaled system weighs in at 16.8TB--11.3TB of data and 5.5TB of indexes. (Here's another way of describing how big the database is at UPS: Each DBA is responsible for supporting 216 billion records.) Some of you will recall the twin UPS Grand Prize winners in this category last year; UPS has combined these two databases and three others into a single federated system called the Package Level Detail Repository (PLDR). Richard Bader, UPS database analyst, submitted the survey.

The PLDR "virtual" database comprises five distinct nodes and functions: pickup information, internal scanning data, package delivery details, premium delivery service, and a repository for service exceptions. Two pieces of software, both developed by UPS, provide an integrated view of the database: Internal users access the Online Custom Automation System program while external customers use the Full Visibility Tracking system. Anyone who has tracked a package through the UPS package center network will confirm the merits of this remarkable system.

PLDR summaries and aggregates are housed in a separate data warehouse. The database is implemented in IBM's DB2 and hosted on a five-node cluster of SMPs, including multiple IBM S/390 machines and a Hitachi Data Systems 427. There are 45 processors in the total configuration. IBM RAMAC and EMC 5500 and 5430 devices provide storage.

PLDR also grew steadily over the course of 1997, and UPS reports that the system is still growing: Bader projects that it will swell by 20 percent next year.

Table 1 shows that many of the other winners in this category are repeat winners from 1997. Telstra, Experian, and IZB Software--all well-known names from past programs--again finish among the leaders.

Rank Organization DBMS Processor Architecture Storage Data (GB)
1 UPS (PLDR; federated) DB2 IBM S/390, Hitachi Skyline SMP EMC, IBM 11,090
1 Telstra DB2 Hitachi Skyline Cluster HDS, EMC, IBM 4,350
2 Experian (credit reporting) DB2 Hitachi Skyline 62 SMP EMC 2,739
3 SK Telecom DB2 IBM S/390 MPP HDS, IBM 1,955
4 IZB Software Adabas Amdahl 5995 SMP EMC, Comparex 1,310
5 Caixa Economica Federal CA-IDMS IBM S/390 SMP EMC, HDS, IBM 661
6 Victoria's Secret Catalogue DB2 Amdahl SMP IBM, EMC 660
7 DBS Systems Corp. Oracle RDB Digital VAX7830 MPP DEC 631
8 Mitsukoshi Information Service Co. Teradata NCR 5100 MPP EMC 600
9 CentreLink CCA Model 204 IBM S/390 Cluster IBM, EMC 582
10 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 405
TABLE 1. Database size, all environments, transaction processing systems.

UPS also gained a second Grand Prize for most rows in a transaction processing system, any environment. PLDR tops the field with a whopping 324 billion rows of data, a figure more than six times that of the next contender. Telstra walked off with a Grand Prize for most rows in a centralized database, while the second place citation was again awarded to Experian.

MOST DATA AND MOST ROWS IN A TRANSACTION PROCESSING SYSTEM, ALL ENVIRONMENTS

A champion in its own right, Telstra also earned a pair of Grand Prizes. The Melbourne, Australia, telecommunications company doubled its laurels from 1997. Michel Antoine, manager of capacity planning and tuning, submitted the survey.

The Telstra installation is a four-year-old customer billing system. Its 4.35TB of data is a hefty increase of more than 35 percent from the 3.2TB reported last year. Telstra's second Grand Prize was for most rows--an impressive 51 billion. In this database, a single online table 1.2TB in size contains a copy of every customer invoice issued over the past three years. Thus the Telstra database contains a single table that's a bona fide VLDB in its own right.

IBM DB2 is the mainstay DBMS at Telstra; it's distributed among a three-node cluster of Hitachi Data Systems Skyline and IBM S/390 machines. There are 20 processors in the cluster and a total of 13.5TB of memory involved. Hitachi, along with IBM and EMC, provides storage devices for the system. In peak times, the system can process 300 TPS.

Table 1 and Table 2 illustrate the extraordinary strength and staying power of IBM and mainframe-class products in transaction processing applications. At the core of many Global 1000 companies are tried-and-true mainframe solutions. DB2, Adabas, CA-IDMS, Model 204, and others are performing the daily operations of many of the world's most sophisticated and successful companies.

Rank Organization DBMS Processor Architecture Storage Rows/Records (m)
1 UPS (PLDR; federated) DB2 IBM S/390, Hitachi Skyline SMP EMC, IBM 324,000
1 Telstra DB2 Hitachi Skyline Cluster HDS, EMC, IBM 51,000
2 Experian (credit reporting) DB2 Hitachi Skyline 62 SMP EMC 15,018
3 SK Telecom DB2 IBM S/390 MPP HDS, IBM 5,870
4 Caixa Economica Federal CA-IDMS IBM S/390 SMP EMC, HDS, IBM 5,000
5 Deere & Co. DB2 IBM RS/6000 SMP IBM 2,503
6 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 2,500
7 IZB Software Adabas Amdahl 5995 SMP EMC, Comparex 1,900
8 DBS Systems Corp. Oracle RDB Digital VAX7830 MPP DEC 915
9 Victoria's Secret Catalogue DB2 Amdahl SMP IBM, EMC 800
10 CentreLink CCA Model 204 IBM S/390 Cluster IBM, EMC 734
TABLE 2. Most rows, all environments, transaction processing systems.

MOST DATA IN A TRANSACTION PROCESSING SYSTEM, UNIX SYSTEMS ONLY

Leading all Unix-based transaction processing systems in database size is Mitsukoshi Information Service Co. Ltd. of Tokyo. This 600GB database is implemented in NCR Teradata and hosted on an NCR WorldMark 5100. The system configuration contains four nodes and 32 processors; storage is provided by two EMC disk drives. Seiichiro Honda, manager of analyst relations for NCR Japan, submitted the survey on behalf of the company.

Mitsukoshi Information Service is the financial arm of The Mitsukoshi, one of the premier department stores in Japan. For fiscal year 1997, The Mitsukoshi reported annual revenues of $5.8 billion, making it the second-highest grossing department store in that country. The Mitsukoshi database comprises 100GB of user data, 400GB of summaries and aggregates, and 100GB of indexes. It contains 400 million rows/records of data. 1997 was a banner year for the database, which doubled in size in 12 months. Mitsukoshi predicts an even greater increase for 1998, forecasting that the database will triple in size by the end of the year.

Although the system performs a mixed workload of transaction processing and decision support, its primary use is transaction processing. On average, it processes 100 TPS, a figure that more than doubles to 220 TPS during heightened activity.

Table 3 shows the remaining winners in this category. Notice that a unique combination of DBMS and system hardware supports each winner. At this time, no specific vendor of these components has established dominance in Unix-based transaction processing activity. However, in terms of storage, EMC is clearly the leader.

Rank Organization DBMS Processor Architecture Storage Data (GB)
1 Mitsukoshi Information Service Co. Teradata NCR 5100 MPP EMC 600
2 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 405
3 Deere & Co. DB2 IBM RS/6000 SMP IBM 356
4 Chase Manhattan Bank Informix-Dynamic Server HP 9000 SMP EMC 175
TABLE 3. Database size, Unix only, transaction processing systems.

MOST ROWS IN A TRANSACTION PROCESSING SYSTEM, UNIX SYSTEMS ONLY, FEDERATED ARCHITECTURE

A new face in the crowd this year is Deere & Co., the winner for most rows in a Unix-based transaction processing system. Deere & Co. was awarded a Grand Prize for having exactly 2,502,739,507 rows in its database. The system is a hybrid comprising primarily DB2 with some additional Oracle and SQL Server components. Martin Spratt, project manager at the company, submitted the winning survey.

Deere & Co. uses IBM's DataJoiner software to provide an integrated view of the various modules in its federated database. DataJoiner does not contain the actual data. Instead, it serves as a large, intelligent metadata catalog for globally distributed physical tables and indexes.
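The catalog role described above can be sketched as a toy model in Python. This is not DataJoiner's interface: the `FederatedCatalog` class, the node names, and the use of in-memory dicts as stand-ins for separate DB2 or Oracle engines are all hypothetical. What it shows is the idea in the text: the federation layer holds only a table-to-node map and forwards each request to the node that owns the data.

```python
from typing import Any


class FederatedCatalog:
    """Toy federation layer (hypothetical, not DataJoiner's interface).

    It keeps only a table-to-node map; each node is an autonomous store,
    modeled here as a dict of tables standing in for a separate engine.
    """

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, dict[Any, Any]]] = {}
        self._table_location: dict[str, str] = {}

    def add_node(self, node_name: str, tables: dict[str, dict[Any, Any]]) -> None:
        """Attach a node and record which tables it owns."""
        clashes = set(tables) & set(self._table_location)
        if clashes:
            raise ValueError(f"tables already registered: {sorted(clashes)}")
        self._nodes[node_name] = tables
        for table in tables:
            self._table_location[table] = node_name

    def location(self, table: str) -> str:
        """Return the node that owns `table` (raises KeyError if unknown)."""
        return self._table_location[table]

    def lookup(self, table: str, key: Any) -> Any:
        """Resolve the owning node via the catalog, then read from that node."""
        node = self._nodes[self.location(table)]
        return node[table].get(key)


# Hypothetical nodes; the engine names are labels only.
catalog = FederatedCatalog()
catalog.add_node("db2_node", {"parts": {101: "gear housing"}})
catalog.add_node("oracle_node", {"orders": {7: {"part": 101, "qty": 4}}})

print(catalog.location("orders"))
order = catalog.lookup("orders", 7)
print(catalog.lookup("parts", order["part"]))
```

The caller never names a node: it asks the catalog, which is the "single, integrated image" a federated architecture is meant to provide.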

The Deere & Co. installation runs on a two-node cluster of IBM RS/6000 machines and plans to move to four-way SMP capability. Big Blue disk drives also provide most of the storage capacity. In the year and a half since it went into production, the system has ballooned from 7,000 transactions per week to its current range of 30,000 to 45,000 transactions per week.

Spratt reports that sometime next year he expects to add IMS and VSAM databases, now only part of the development environment, to the federation. He characterizes the future configuration as a "virtual database geoplex" that will be nearly twice the size of the current database.

Table 4 offers clear evidence of the steadily expanding dimensions of VLDBs. Last year in this category, The Handleman Co. took the Grand Prize for a database containing 1,300 million rows. This year, Deere & Co. and the second-place finisher, Metromail Corp., almost double that mark with 2,503 and 2,500 million rows, respectively.

Rank Organization DBMS Processor Architecture Storage Rows/Records (m)
1 Deere & Co. DB2 IBM RS/6000 SMP IBM 2,503
2 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 2,500
3 Mitsukoshi Information Service Co. Teradata NCR 5100 MPP EMC 400
4 Chase Manhattan Bank Informix-Dynamic Server HP 9000 SMP EMC 344
TABLE 4. Most rows, Unix only, transaction processing systems.

PEAK ONLINE ACTIVITY IN A TRANSACTION PROCESSING SYSTEM, ALL ENVIRONMENTS

In the category of peak online workload in all environments, the Grand Prize goes to Roadway Express. Roadway is a seasoned VLDB Survey Program participant, and this year none of the other challengers came close to its workload. This mixed-usage system primarily performs OLTP and can process 1,820 TPS at peak. In fact, even under average conditions, Roadway Express executes 650 TPS, a figure that would also have captured first place for the company. Kevin Carracher, manager of development support services, supplied the winning survey with assistance from Chris Orlowski, a consultant with Caliber Technology Inc.

The Roadway Express database is a shipment management system that has been in production for 11 years. The system uses CCA's Model 204 on an IBM S/390 platform with seven processors. Most of the data by far--133GB--is kept at the detail level, with 1.4GB of summaries and aggregates and just 25GB of indexes. The database contains 336 million rows of data. IBM and Hitachi Data Systems provide the storage.

If you want to know where the frenzied OLTP workload activity is taking place, Table 5 reveals the industries on the edge of the envelope. Three of the top five companies--Telstra, Pacific Telecom, and SK Telecom--are telecommunications organizations. Government and banking/financial services are also well represented among the leaders.

Rank Organization DBMS Processor Architecture Storage TPS
1 Roadway Express Inc. CCA Model 204 IBM S/390 SMP IBM, HDS 1,820
2 Telstra DB2 Hitachi Skyline Cluster HDS, EMC, IBM 300
2 Pacific Telecom Inc. CA-IDMS Amdahl 5995 M SMP IBM, Spectris, EMC 300
3 CentreLink CCA Model 204 IBM S/390 Cluster IBM, EMC 272
4 SK Telecom DB2 IBM S/390 MPP HDS, IBM 250
5 UPS (PLDR) DB2 IBM S/390, Hitachi Skyline SMP EMC, IBM 220
6 Caixa Economica Federal CA-IDMS IBM S/390 SMP EMC, HDS, IBM 205
7 Progressive Corp. CA-IDMS IBM S/390 SMP EMC 140
8 IZB Software Adabas Amdahl 5995 SMP EMC, Comparex 135
9 DBS Systems Corp. Oracle RDB Digital VAX7830 MPP DEC 110
10 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 92
TABLE 5. Peak online activity, all environments, transaction processing systems.

MOST ROWS AND PEAK ONLINE ACTIVITY IN A TRANSACTION PROCESSING SYSTEM, UNIX SYSTEMS ONLY

The next winner is a veteran program participant, but 1998 marks its first appearance in the Grand Prize winner's circle. We are proud to bestow not just one, but two Grand Prizes on Metromail Corp. The Metromail database, a centralized transaction processing system, achieves distinction in two categories: most rows or records and highest online workload for a Unix-based system. Brian Foreman, DBA for the company, submitted the winning survey.

The Metromail database contains names, addresses, phone numbers, and other public information about more than 100 million U.S. households. Commercial organizations use the data for direct marketing purposes and nonprofit organizations use it for fund-raising activities. Metromail uses Oracle aboard Sequent Symmetry 5000 servers, with EMC 5500 devices providing storage. The database contains 179GB of user data and 226GB of indexes. There are no summaries or aggregates because the applications running against the system require data at the detail level (actual names, addresses, and so on).

Metromail captures its first Grand Prize for the 2.5 billion rows of data in the database, an achievement driven by growth: the database nearly doubled in size over the past year. Metromail earns its second Grand Prize for online workload. The system averages 26 TPS but peaks at more than three times that rate, 92 TPS.

Table 6 shows the other winners in this category. The row counts in Table 4 illustrate the diversity in row design among large databases: the number of rows does not necessarily correlate with database size. The Metromail database, which is about two-thirds the size of the Mitsukoshi installation, contains more than six times as many rows.
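Dividing each total by its row count gives a rough sense of how loosely rows track size. This Python sketch is an illustration, not a survey metric: `avg_bytes_per_row` is a hypothetical helper, 1GB is assumed to be 10^9 bytes, and because the totals include indexes the result is an average storage footprint per row rather than a true row width. The inputs are the Metromail and Mitsukoshi figures from Tables 3 and 4.

```python
def avg_bytes_per_row(total_gb: float, rows_millions: float) -> float:
    """Average stored bytes per row, counting 1 GB as 10**9 bytes.

    The survey total includes indexes (and any summaries), so this is a
    rough per-row footprint, not the width of a single table row.
    """
    return (total_gb * 10**9) / (rows_millions * 10**6)


# (data in GB, rows in millions), from Tables 3 and 4.
metromail = (405, 2_500)
mitsukoshi = (600, 400)

size_ratio = metromail[0] / mitsukoshi[0]  # Metromail size relative to Mitsukoshi
row_ratio = metromail[1] / mitsukoshi[1]   # Metromail rows relative to Mitsukoshi

print(f"Metromail:  {avg_bytes_per_row(*metromail):,.0f} bytes/row")
print(f"Mitsukoshi: {avg_bytes_per_row(*mitsukoshi):,.0f} bytes/row")
print(f"size ratio {size_ratio:.3f}, row ratio {row_ratio:.2f}")
```

On these figures Metromail stores roughly 162 bytes per row against roughly 1,500 for Mitsukoshi, which is consistent with Mitsukoshi's large share of summaries and aggregates.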

Rank Organization DBMS Processor Architecture Storage TPS
1 Metromail Corp. Oracle Sequent Symmetry 5000 SMP EMC 92
2 Chase Manhattan Bank Informix-Dynamic Server HP 9000 SMP EMC 25
3 Mitsukoshi Information Service Co. Teradata NCR 5100 MPP EMC 15
4 Deere & Co. DB2 IBM RS/6000 SMP IBM 11
TABLE 6. Peak online activity, Unix only, transaction processing systems.

MOST DATA AND MOST ROWS IN A DECISION-SUPPORT SYSTEM, ALL ENVIRONMENTS, FEDERATED ARCHITECTURE

Now under a new name, the next blue ribbon company repeats as a double Grand Prize winner. The Dialog Corp., formerly Knight Ridder Information, achieves distinction in two categories. In the realm of decision support in any environment, this federated installation led all participants in database size and most rows. Shelley Giles, programmer analyst, entered the winning questionnaire.

Weighing in at an imposing 6.3TB, the Dialog system is a commercial information retrieval and document delivery service that draws information from many different types of data--bibliographic, company directory, patent, newspaper, trademark, chemical, and more. Over the past 12 months, 50 billion new rows of data were added to the system to reach the 150 billion-row milestone.

One of the Dialog system's unique characteristics is that it uses a proprietary database management system that has evolved significantly during its 26 years in operation. The DBMS runs on an SMP system comprising a seven-processor Hitachi Data Systems GX8724 box plus four uniprocessor Sun SPARC servers. Three primary operating systems support the system: VM/CMS for online retrieval, MVS for file updating, and Unix for user access to the file servers. Storage is provided by a medley of devices: EMC, Hitachi, IBM, and Sun disk devices for DASD and Kubic Multi CD-ROM for offline storage.

Table 7 underscores how Unix is the preferred platform for large decision-support installations. When we assess the sheer amount of data, the Dialog mainframe-based system is the largest DSS site. However, it is a federated system. In terms of centralized or distributed DSS installations, the four largest and seven of the top 10 run on Unix platforms. Unquestionably, Unix is the operating environment of choice for meeting high-end decision-support requirements.

Rank Organization DBMS Processor Architecture Storage Data (GB)
1 The Dialog Corp. (federated) Proprietary Hitachi GX8724, Sun SPARC SMP EMC, HDS, IBM, Sun, Kubic 6,300
1 Sears (SPRS) Teradata NCR 5100 MPP EMC 4,630
2 HCIA Informix-Dynamic Server Sun 6000 SMP Seagate,Quantum 4,500
3 Wal-Mart Stores Inc. Teradata NCR 5100 MPP Seagate 4,422
4 Tele Danmark A/S DB2 IBM RS/6000 MPP IBM 2,840
5 CitiCorp DB2 IBM SP MPP IBM 2,468
6 MCI (database marketing) Informix-Dynamic Server EP IBM SP MPP IBM 1,884
7 NDC Health Information Services Oracle Sequent NUMA-Q 2000 NUMA EMC, DG Clariion 1,850
8 Dayton Hudson Corp. NonStop SQL Tandem Himalaya MPP Tandem 1,315
9 Sprint Teradata NCR 5100 MPP NCR 1,300
10 Ford Motor Co. Oracle Sequent NUMA-Q 2000 NUMA EMC 1,200
TABLE 7. Database size, all environments, decision-support systems.

Table 8 provides even more evidence that VLDB dimensions are expanding rapidly. Figure 1 compares the figures in this category between the 1997 and 1998 programs. To eliminate any aberrations, we'll disregard the largest and the smallest site from each top 10 list. Last year, the second largest site had 20 billion rows, the number nine winner had 6 billion, and the average for the top 10 sites was 16.6 billion. This year, the second largest site has 50 billion rows, the number nine winner has more than 9 billion, and the average row count for the category is 19.7 billion.

Rank Organization DBMS Processor Architecture Storage Rows/Records (m)
1 The Dialog Corp. (federated) Proprietary Hitachi GX8724, Sun SPARC SMP EMC, HDS, IBM, Sun, Kubic 150,000
1 Wal-Mart Stores Inc. Teradata NCR 5100 MPP Seagate 50,000
2 Sears (SPRS) Teradata NCR 5100 MPP EMC 33,000
3 Dayton Hudson Corp. NonStop SQL Tandem Himalaya MPP Tandem 24,000
4 MCI (database marketing) Informix-Dynamic Server EP IBM SP MPP IBM 16,345
5 Catalina Marketing Corp. Red Brick Digital Alpha 8400 SMP EMC, MTI 15,277
6 Tele Danmark A/S DB2 IBM RS/6000 MPP IBM 10,100
7 HCIA Informix-Dynamic Server Sun 6000 - 1000 SMP Seagate, Quantum 10,000
8 CitiCorp DB2 IBM SP MPP IBM 9,744
9 VarTec Telecom Inc. MS SQL Server DG AViiON 3600 SMP DG Clariion 9,600
10 Sears (data warehouse) Informix-Dynamic Server EP IBM SP-SMP MPP IBM 8,229
TABLE 8. Most rows, all environments, decision-support systems.

MOST DATA IN A DECISION-SUPPORT SYSTEM, ALL ENVIRONMENTS AND UNIX ENVIRONMENTS ONLY

Winter Corp. was pleased to confer another two Grand Prize awards on Sears, Roebuck and Co. Among centralized and distributed systems, Sears leads in database size in two categories: all environments and Unix environments only. Jean Brizzolara, systems manager for Sears, submitted this survey.

The Sears system is known as the Strategic Performance Reporting System (SPRS). Designed for decision support, SPRS is the single authoritative source for the company for merchandising information such as sales, inventory, and margin analysis.

Sears received two Grand Prizes for the amount of data in SPRS, 4.63TB. This figure breaks down into 4.3TB of user data and 330GB of summaries and aggregates. Within SPRS, a single table contains 40 percent of the data; it contains weekly inventory information down to the SKU level for each Sears store, distribution center, and warehouse! Not only is the overall size of the database extraordinary, but one table, on its own, contains nearly 2TB of data. Counting the disk allotted for freespace and redundancy, the database approaches the 10TB mark, more than double the size of the data alone.

SPRS is implemented in Teradata and runs on a 48-node NCR WorldMark 5100M system with 384 processors. EMC provides storage for the system.

Table 9 shows that the top participants in this category provided the closest competition in the '98 program. In amount of data, Sears, HCIA, and Wal-Mart each finished within 3 percent of the next.

Rank Organization DBMS Processor Architecture Storage Data (GB)
1 Sears (SPRS) Teradata NCR 5100 MPP EMC 4,630
2 HCIA Informix-Dynamic Server Sun 6000 SMP Seagate, Quantum 4,500
3 Wal-Mart Stores Inc. Teradata NCR 5100 MPP Seagate 4,422
4 Tele Danmark A/S DB2 IBM RS/6000 MPP IBM 2,840
5 CitiCorp DB2 IBM SP MPP IBM 2,468
6 MCI (database marketing) Informix-Dynamic Server EP IBM SP MPP IBM 1,884
7 NDC Health Information Services Oracle Sequent NUMA-Q 2000 NUMA EMC, DG Clariion 1,850
8 Sprint Teradata NCR 5100 MPP NCR 1,300
9 Ford Motor Co. Oracle Sequent NUMA-Q NUMA EMC 1,200
10 Acxiom Corp. Oracle Digital 8400 Cluster DEC 1,125
TABLE 9. Database size, Unix only, decision-support systems.

MOST ROWS IN A DECISION-SUPPORT SYSTEM, ALL ENVIRONMENTS AND UNIX ENVIRONMENTS ONLY

Another repeat winner from last year is Wal-Mart Stores Inc. The company captured top honors in two categories: most rows in a decision-support system (centralized or distributed) in all environments as well as in Unix environments only. This mixed-purpose system is a merchandising data warehouse implemented in Teradata and supported by a 96-node NCR WorldMark system. Seagate Barracuda drives provide more than 16TB of DASD. Randy Salley, director of IS, entered the survey.

Wal-Mart outpaces all other entries by reporting a colossal 50 billion rows of data in the system, a 150 percent explosion from 20 billion in 1997. This astonishing growth corresponds with a large increase in database size. Over the last 12 months, the database grew by about 75 percent, leapfrogging from 2.4TB in early 1997 to a prodigious 4.2TB a year later. What's more, Wal-Mart reports that the database is undergoing voracious growth and projects another 50 percent gain by 1999.
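The Wal-Mart percentages follow from the usual percent-change formula, (new - old) / old * 100. In this Python sketch, `pct_change` and `project` are hypothetical helper names, and the projection assumes that the expected 50 percent gain applies to the 4.2TB size figure, which the survey response does not spell out.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage increase from old to new (150.0 means new is 2.5 times old)."""
    return (new - old) / old * 100


def project(value: float, growth_pct: float) -> float:
    """Apply one period of percentage growth to a value."""
    return value * (1 + growth_pct / 100)


rows_growth = pct_change(20, 50)     # billions of rows, 1997 -> 1998
size_growth = pct_change(2.4, 4.2)   # TB, early 1997 -> early 1998
size_1999 = project(4.2, 50)         # assumed: 50 percent gain applied to 4.2TB

print(f"rows: +{rows_growth:.0f}%")
print(f"size: +{size_growth:.0f}%")
print(f"projected size: {size_1999:.1f} TB")
```

Under that assumption the database would reach about 6.3TB.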

Table 10 shows the leaders in this category. The companies on this list typify the industries that are making decision-support activities an integral part of their operations. At these sites, users rely on their systems to compose customer profiles, track sales incentives, understand and anticipate buying patterns, hone customer service techniques, and so on. Three out of the top four companies and half of those in the top 10 are retail businesses; the telecommunications industry is also well represented.

Rank Organization DBMS Processor Architecture Storage Rows/Records (m)
1 Wal-Mart Stores Inc. Teradata NCR 5100 MPP Seagate 50,000
2 Sears (SPRS) Teradata NCR 5100 MPP EMC 33,000
3 MCI (database marketing) Informix-Dynamic Server EP IBM SP MPP IBM 16,345
4 Catalina Marketing Corp. Red Brick Digital Alpha 8400 SMP EMC, MTI 15,277
5 Tele Danmark A/S DB2 IBM RS/6000 MPP IBM 10,100
6 HCIA Informix-Dynamic Server Sun 6000 - 1000 SMP Seagate, Quantum 10,000
7 CitiCorp DB2 IBM SP MPP IBM 9,744
8 Sears (data warehouse) Informix-Dynamic Server EP IBM SP-SMP MPP IBM 8,229
9 Walgreen Co. HOPS Digital Alpha 4100 SMP DEC 6,600
10 Union Pacific Railroad Teradata NCR 5100M MPP Seagate 6,112
TABLE 10. Most rows, Unix only, decision-support systems.

PEAK ONLINE ACTIVITY IN A DECISION-SUPPORT SYSTEM, ALL ENVIRONMENTS AND UNIX ENVIRONMENTS ONLY

For decision-support systems, the VLDB Survey Program defines peak online activity as the highest number of concurrent, online, in-flight queries, reports, and updates. Outdistancing all other contenders in this category is JCPenney. The company earned a pair of Grand Prizes for its ability to execute 784 concurrent online processes, more than any entrant operating on any platform and more than any entrant in Unix-only environments. John Mayrack, systems development manager, supplied the winning survey.

The JCPenney database is a customer-centric data warehouse that performs a mixed workload of ad hoc queries and data maintenance processes. It is implemented in Teradata and hosted on an NCR WorldMark 5100M system with 12 nodes and 96 processors. EMC 5000 and 5430 devices provide storage for the system. Most of the data--560GB--is at the detail level. There are 22GB of summary and aggregate data plus an additional 20GB of indexes, for a total of 602GB.

Keep an eye on this database. Not only did it grow by 50 percent last year, but JCPenney projects an even greater expansion, 70 percent, for 1998. And over the next three years, the company predicts the database will triple in size, easily propelling it over the terabyte bar.

One significant observation you can glean from Table 11 is Unix's supremacy in handling high online workloads for DSS. Other than The Dialog Corp.'s use of MVS and a lingering presence of the older TOS/System 3600 combination, every winning site in this category uses Unix. Given Unix's proven strength in supporting large data volumes and high row counts, this finding confirms Unix as the platform of choice for DSS.

Equally clear is which DBMSs are favored for high online workloads. Tables 11 and 12 both show Teradata and Oracle as the premier choices: except for The Dialog Corp.'s proprietary solution, every top-10 winner uses one of these two DBMSs.
 

Rank | Organization | DBMS | Processor | Architecture | Storage | Concurrent Queries
1 | JCPenney | Teradata | NCR 5100M | MPP | EMC | 784
2 | SBC Corp. | Teradata | NCR 5100 | MPP | NCR | 750
3 | AT&T | Teradata | NCR 3600 | MPP | Seagate | 600
4 | Fidelity Systems Co. | Oracle | Sun E10000, E6000 | Cluster | Sun | 500
5 | The Dialog Corp. | Proprietary | Hitachi GX8724, Sun SPARC | SMP | EMC, HDS, IBM, Sun, Kubic | 500
6 | SNCF | Teradata | NCR 5100M | MPP | EMC | 300
7 | Hewlett-Packard | Oracle | HP 9000 | SMP | HP | 300
8 | Experian (financial database marketing) | Oracle | SGI Challenge XL | SMP | Amdahl | 256
9 | NCR Corp. | Teradata | NCR 3600, NCR 5100M | MPP | Symbios | 214
10 | UPS (data warehouse) | Oracle | HP 9000 | SMP | EMC | 180
TABLE 11. Peak online activity, all environments, decision-support systems.


 

Rank | Organization | DBMS | Processor | Architecture | Storage | Concurrent Queries
1 | JCPenney | Teradata | NCR 5100M | MPP | EMC | 784
2 | SBC Corp. | Teradata | NCR 5100 | MPP | NCR | 750
3 | Fidelity Systems Co. | Oracle | Sun E10000, E6000 | Cluster | Sun | 500
4 | Hewlett-Packard | Oracle | HP 9000 | SMP | HP | 300
5 | SNCF | Teradata | NCR 5100M | MPP | EMC | 300
6 | Experian (financial database marketing) | Oracle | SGI Challenge XL | SMP | Amdahl | 256
7 | NCR Corp. | Teradata | NCR 3600, NCR 5100M | MPP | Symbios | 214
8 | UPS (data warehouse) | Oracle | HP 9000 | SMP | EMC | 180
9 | Boeing | Teradata | NCR 5100M | MPP | Symbios | 150
10 | National Association of Securities Dealers | Oracle | Sequent NUMA-Q | NUMA | EMC | 150
TABLE 12. Peak online activity, Unix only, decision-support systems.

 

INDUSTRY TRENDS

Several shifts in the industry are readily apparent from the 1998 data. The first, as we mentioned earlier, is the growing presence of Unix at decision-support sites. In the 1997 program, 68 percent of the DSS participants ran on a Unix platform; this year, the figure leaped to 87 percent. Unix, however, has only a minor presence at transaction processing installations: in the 1997 program, 21 percent of OLTP installations were hosted on Unix platforms, and a year later this figure decreased slightly to 19 percent.

Another development is the increased visibility of the federated database architecture. In the 1997 program, only one participant, National Processing Co., reported a federated database. One year later, that number has jumped to seven, of which four run transaction processing applications exclusively or primarily and three are used for decision support.

It goes without saying that VLDBs just keep getting bigger. Consider how transaction processing systems have expanded in the past 12 months. Over the course of 1997, UPS combined two systems of more than 3TB each with several other systems to form one monstrous 16.8TB federated system, the survey program's first double-digit-terabyte site. Among fellow OLTP leaders, Telstra grew by 36 percent and Experian mushroomed by an incredible 56 percent, reaching the 4.3TB and 2.7TB marks, respectively.

The boundaries of the DSS world are stretching as well. The Sears database, no shrinking violet last year at 1.3TB, catapulted into the top tier in the '98 program: it grew to more than three and a half times its former size, reaching a formidable 4.63TB. As part of this growth, Sears added 550 percent more rows for a total of 33 billion. Wal-Mart, a perennial leader in the VLDB Survey Programs, nearly doubled its data volume and more than doubled its row count, reaching the 4.42TB and 50-billion-row marks.

WHERE DO WE GO FROM HERE?

The answer to the question is already clear: up and up and up. The report on the 1997 program a year ago ("Giants Walk the Earth," September 1997) predicted the presence of a 9TB database in the '98 program. Even that prediction underestimated actual database growth. At 16.8TB, the UPS package tracking and delivery system exceeded our prediction by more than 60 percent! Which other company, or companies, will cross the 10TB mark next year?

Beyond celebrating the ever-expanding VLDB frontier, we as database specialists must also put this trend to practical use. By building a common knowledge base from these leading implementations, we can learn how to identify the risks, master the critical elements, and understand the success factors of VLDBs. We begin collecting data for the next campaign in September 1998. Participate in the program and find out how to guide your VLDB into terabyte territory.

More information about the Winter VLDB Survey is available at www.wintercorp.com.


Richard Winter is president and Kathy Auerbach is research program manager of Winter Corp. in Boston, an international consulting practice that advises executives on large database strategies, parallel architectures, risk management, and critical implementation projects. You can reach them by email at richard.winter@wintercorp.com and Kathy.Auerbach@wintercorp.com respectively, or by telephone at (617) 695-1800.

What's in It For You?

Participating in the VLDB Survey Program helps you create industry recognition for yourself, your database team, and your company. The program gives you an opportunity to tell customers, prospects, and competitors about your success in implementing and operating a large database. At the same time, you're helping to build a body of knowledge about the best practices in the VLDB industry.

The program creates far-reaching professional visibility via an ongoing press relations campaign. Furthermore, Winter Corp. posts all winners' names, company information, and database accomplishments on our Web site for a year. Achievements in the program serve as an excellent source of material for company collateral, advertising, press releases, Web site content, and so on.

All program winners receive a variety of awards, including gourmet chocolate, airline miles, framed certificates, and crystal plaques. We'll also send you a free copy of the Members Report, a summary of program highlights and research findings. If you're a Grand Prize winner, you'll also receive a complimentary Winter Corp. technology briefing at a location of your choosing.