
The Exploration Warehouse and Mart

by Sand Technology Systems

Tables and Figures are not provided.

Over the course of the 1960s, 1970s, and 1980s, most medium-to-large businesses successfully moved key operational aspects of their enterprises onto large computing systems. The 1980s saw relational database technologies mature to the point where they could play the central role in these systems. Naturally, the requirements of operational systems, being substantial and unforgiving, forced database vendors to focus development efforts almost exclusively on issues like transaction speed, integrity, and reliability.

Unfortunately the methods employed to achieve transaction speed, integrity, and reliability were completely contrary to the requirements of reporting and freeform data inquiry. Indexing techniques, integrity checking, locking schemes, data models and transaction logging impaired the ability to obtain information from operational data stores. Moreover, operational data stores usually did not have enough data on line to answer questions against data more than 90 or 180 days old.

When business questions could be answered, it was not unusual to wait weeks for answers. Sometimes, executives would be given the non-choice between stopping the business and producing a particular report. They would also be confronted with contradictory information from multiple systems. It seemed inconceivable that so much time, money, and attention could be paid to technology only to have relatively modest inquiries turned back.

The Data Warehouse

A handful of technologists, most notably Ralph Kimball, founder of Red Brick Systems, and Bill Inmon, founder of Prism and author of Building the Data Warehouse, foresaw the large-scale reporting and decision-support demand that would follow the transactional systems binge. They pioneered and advocated the data warehouse as the counterpart to operational systems. The last ten years have seen widespread acceptance of the data warehouse concept and the consequent growth of an entire industry.

The Promise of the Data Warehouse

The promise of the data warehouse is straightforward. As operational systems are dedicated to recording information, the data warehouse is dedicated to returning information to the enterprise. Operational systems run the business today. Analysis of the data in the warehouse determines how the business is run tomorrow.

The State of the Data Warehouse

The premise of the data warehouse is that it is physically separate from operational systems, and has a mission completely different from that of operational systems. The virtue of a separation of systems is twofold. It ensures that the data warehouse will not interfere with business operations, and it facilitates the acquisition, reconciliation, and integration of data, not only from different operational systems within the enterprise, but also from sources external to the business.

A few technical challenges to successful data warehousing have been overcome during the last several years. First, the workload of satisfying broad inquiry against large volumes of data is fundamentally different from the workload of recording business transactions. Successful on-line transaction processing (OLTP) requires the ability to satisfy a large number of requests, numbering as high as millions per day, each of which must access and manipulate a very small amount of data in a large database. Successful inquiry or decision support system (DSS) processing requires the ability to satisfy a modest-to-large number of requests, each of which must access and process a large amount of data within a very large database.

First consider transaction processing, exemplified by consumers using their retail credit cards to make purchases. Millions of these transactions may occur during a given day, with hundreds proceeding simultaneously at any given time. Each one, however, involves locating a limited amount of account information for a particular consumer and modifying it. The information involved is usually measured in bytes.

Then consider the inquiry and reporting scenario. Someone in credit card merchandising and marketing has a fairly simple question: among new cardholders, how many did we market to in a particular region who match a particular demographic and made purchases from a particular product line? This single question involves accessing and comparing perhaps hundreds of millions of records, depending upon the size of the organization.

Developments in hardware and software parallelism have evolved to a point where they are acknowledged as indispensable in handling the latter type of workload. Data warehousing is particularly dependent upon parallel processing technology due to the large data volumes involved. Parallel processing is essentially a divide-and-conquer approach to the problem, bringing many processors, memories, and data buses to bear on any given request. Even the requirement of periodically moving large volumes of data from operational systems into the data warehouse depends largely on parallel data loading and index building in order to fit within acceptable system down-time windows.

Developments in large memory computer configurations also help to satisfy the data warehouse workload. Even more significantly, however, advances in indexing, compact data representation, and data processing algorithms mean that fewer actual bytes of data are accessed and manipulated to answer a given question.

The concept of the data warehouse has become accepted to the point that virtually all Global 2000 companies and most medium-sized companies have a data warehouse development project underway or are planning for one. The market for data warehouse-related hardware, software, and services is measured in the tens of billions of dollars worldwide.

Even so, variants of the data warehouse have emerged to meet some of the specialized real-world needs of companies and company departments everywhere. For example, data warehouse satellites, called data marts, are deployed and tailored to the needs of a specific audience. There is also the up-to-the-hour, or up-to-the-minute, transactional warehouse hybrid, called the operational data store, for companies that require extremely fresh information.

What Data Warehouses and Data Marts Leave on the Table

Although the concept of the data warehouse is universally accepted, warehouses are still hard to build, and they frequently leave some of the corporate information appetite unsatisfied, even when deployed successfully. The basic mandate of the data warehouse or the data mart is enormous: satisfy the information requirements of an entire company or an entire department, regularly and in a timely way.

The technology used to build data warehouses and marts is antagonistic to this all-purpose sensibility. Parallelism, indexing, clustering, and even novel storage architectures like proprietary multi-dimensional data storage all come at a cost. Effective parallelism and indexing depend extensively upon knowing in advance what questions will be asked, or if not the specific questions, at least the form of the question.

In practice, this means that data warehouses and even marts usually discourage extraordinary lines of inquiry. Given complete freedom of interrogation, power users will bring a data warehouse to its knees with their queries. That is, parallel data striping and indexing suitable for one query may be ill suited to another. Whenever the data warehouse is not indexed or tuned for a particular query the physical resources of the system can be overwhelmed to the detriment of other clients.

Most data warehouse and data mart end-users have modest information requirements and keep businesses running by querying inside the lines. For the most part it is this relatively large audience that data warehouses and marts end up satisfying. In order to provide most of the users with timely service most of the time, technical organizations take the defensive approach, prohibiting non-standard (ad hoc) queries, or scheduling them only at odd hours. In this way they inhibit the smaller number of business analysts in the organization, those most likely to find breakthrough opportunities.

What these elite knowledge workers and business analysts desire most from their mart or warehouse is the ability to go wherever their mind or intuition may take them, exploring for patterns, relationships, and anomalies. This is how they cultivate business knowledge.

Ultimately, the difference between information and knowledge is confidence: confidence to act decisively. When decisions are pending, knowledge beats information every time. And knowing depends on the ability to ask any question and get fast answers from the corporate data; the ability to fortify a hunch, to supplement a report, or to question assumptions in real time. Here is the money that most data warehouses and marts leave on the table. It recalls the theme that inspired the data warehouse: ask anything.

The Exploration Warehouse

There is one rule in the exploration warehouse or mart: there are no rules. The best business analysts, or data explorers, care far less about yesterday's answers than about tomorrow's questions.

What is an Exploration Warehouse?

Data explorers will query the corporate data in unpredictable and non-repetitive ways. They want to see detailed data. They want to see historical data. They will engage in long sequences of forensic-style inquiry, where the answer to one question begs the next. In some cases the explorer just wants a number, in others a pattern, association or relationship, in still others perhaps an inconsistency. Sometimes the data explorer will adjust a particular query repeatedly, submitting it over and over with slight variations, until the result set is appropriately sized.

Beyond just querying data, data explorers occasionally want to store intermediate results in the exploration warehouse. That is, take the answer to one query and store it in a second table. Consider a retail scenario where a user wishes to explore or examine transaction volume by new credit cardholders, aggregated by store and product across all stores and products. For a large retailer, this basic query will likely involve joining a handful of very large tables (from hundreds of thousands of records to hundreds of millions each). Even the result, if there are tens or hundreds of thousands of products and hundreds of stores, will be too large to examine visually. Given the ability to store the result of this first query, and then query further with a dramatic reduction in response time, the explorer will frequently adopt this approach.

Just as data warehouses exist separately from operational systems, exploration warehouses and marts must live apart from the primary data warehouse or data mart. They will be populated by data from primary data warehouses and marts, and occasionally with data from operational systems or external sources. But they must live on physically separate adjunctive computing systems as the demands placed on the host system by the exploration warehouse will frequently interfere with the predictable performance expected of the data warehouse. This leaves a number of concerns worthy of a detailed discussion.

Who Administers the Exploration Warehouse?

Basic data warehouse and data mart deployment and Year 2000 projects have created an insatiable demand for people with proven information systems management skills. Considering this demand, it is unlikely that exploration warehouses will be accorded a full complement of systems administrators, database administrators (DBA), data architects, and data analysts.

In order to survive in this climate, exploration warehouses must be relatively administration-free. In fact, they must be able to thrive and return value to the company with little or no administrative attention on an ongoing basis. Some exploration warehouses will do better in this regard than others. Some may see the services of a DBA. Many, however, will only get attention at setup time. Still others will be left largely to the knowledge workers themselves.

Exploration Warehouses Must be Efficient

Exploration warehouses and marts must not be wasteful, as they occupy a lower position in the hierarchy than data warehouses and marts. Put another way, they must be resource-light. The inflationary tendencies of most relational database management systems (RDBMS) and multi-dimensional storage products will destroy an exploration warehouse initiative.

As soon as exploration warehouses begin to consume machine and human resources on the order of primary data warehouses and marts, they become the likeliest budget-cut targets. Most exploration warehouses and marts will have to survive on a computing resource diet a fraction the size of the primary systems.

The freeform, ask-anything workload entertained by the exploration warehouse also demands resource-efficiency. A product that succeeds as an exploration warehouse must be notable for its processing economy. Processors, memory, storage devices and bandwidth must all be used effectively. Ultimately, the processors must be made as busy as possible, but not by wasting cycles. It must be done, for example, with economical data representation and intelligent caching.

Whatever these systems do with the physical resources at hand, they must permit their users to formulate questions and get answers in real-time. For knowledge workers, the ability to pursue a line of reasoning without administrative assistance and without leaving the chair is paramount. If this cannot be achieved, the exploration warehouse will lose the interest of its users and in turn its value.

The Exploration Warehouse Must be Elastic

Clearly, in order to meet the ask-anything objective, an exploration warehouse must be flexible. It must provide a spectacular level of performance regardless of the query. Naturally, some kinds of queries will execute faster than others. Nonetheless, across the spectrum of queries from very simple to very complex, answers must come back in seconds or minutes most of the time.

B-tree indexing approaches, performance tied to the usage of a particular data model, static data partitioning and striping, index clustering, and pre-joining tables are all brittle, rigid techniques for achieving performance. They will not survive in the exploration warehouse, as they will always be resistant to change. They will force administrative intervention whenever users choose to pursue an iterative query-store-query strategy. And, of course, choosing a performance-enhancing tool to suit one family of queries will always be the worst choice for some other family of queries.

Cost of Ownership

Users of exploration warehouses and marts, due to their unpredictable usage patterns, will see value from these systems coming in bursts. There may be periods where not much of interest comes from the exploration warehouse. Then two or three significant discoveries over the course of a month or two may save the company hundreds of thousands or millions of dollars.

All of the exploration warehouse qualities mentioned above come together in cost of ownership. Total cost of ownership is the key to the survival and the success of the exploration warehouse because it is an unorthodox tool, for unorthodox and extraordinary lines of inquiry, whose value will be difficult to quantify in the short run.

It is widely agreed that the cost of administration and consulting can be several times that of the software and hardware infrastructure. So if the total cost of ownership approaches that of primary systems, even to within, say, a factor of two, the exploration warehouse becomes a large budget-cutting target. If it requires too much administration, too many resources, or an excessive amount of maintenance, it is doomed. At the same time, a seriously reduced total cost of ownership puts the exploration warehouse in a position to be the strategic information systems tool that creates a transcendent business knowledge advantage.

The Nucleus Exploration Warehouse and Mart

Nucleus is a powerful analytical processing tool for end users and power users of existing data warehouses and marts. Nucleus produces the business intelligence that ends in quick and decisive action by providing ad hoc, complex, anything-goes query capability. Nucleus allows data analysts to explore corporate data in a boundless, interactive, and iterative fashion.

Traditional systems are designed, set up, and indexed for the questions that business users are known to want to ask. Nucleus answers these questions, but more importantly, it handles the questions that are unknown when the warehouse is built. Nucleus' patented technology allows the toughest and most arbitrary questions, the ones that can kill marts and warehouses, to be answered in seconds and minutes instead of in hours, days, and weeks.

Performance, economy, flexibility, and simplicity are the essence of Nucleus technology and the Nucleus Exploration Warehouse and Mart. As described in earlier sections, these qualities constitute a blueprint for the exploration warehouse. The following pages demonstrate precisely how Nucleus produces these benefits.

Patented, High Performance, Data Storage Architecture

Nucleus uses a domain-based data storage architecture, implemented with tokens representing atomic data values and encoded bit vectors representing data relationships. These three features are central to Nucleus data storage, retrieval, and manipulation. Just as importantly, they are entirely transparent to Nucleus users with the exception of the domain construct, which is surfaced in standard SQL.

Token database technology differs radically from conventional relational database technology. In a conventional relational database, when a record is added to the system, a physical representation of the data is recorded on disk. Consider a few simple records that might be found in a conventional database management system (table 1).

Each time a transaction is completed, a new record is added to the conventional database. The scaling of data is said to be linear, because the volume of data is a function of the number of records, and query performance is usually a function of data volume.

A closer examination of the records in table 1 shows that there is redundancy of atomic data values throughout the database. The name James appears three times. The number 1000.00 appears five times. The postal abbreviation for Texas, i.e., TX, appears 10 times and so forth. There is significant repetition of values in the database. This tends to be the case with databases in general.

Consider assigning an integer token to represent each distinct entry in the database. (Nucleus uses 1-byte, 2-byte, and 4-byte integers.) James would have a token of 01 among first names. TX would have a token of 01 among state names. Dallas might have a token of 02 among city names, and so forth. The same can be done for the telephone number area codes and local exchanges (note that these are frequently represented as character strings and not numeric values in databases). The above database could be reduced to a series of tokens that can be represented as in table 2.

Nucleus employs a domain construct to record the correspondence between actual scalar data values and their respective tokens. Table 2 illustrates, though somewhat crudely, this domain construct. The records from table 1 can then be represented in a more compact way as shown in table 3. (For more efficient storage, integer values actually represent themselves, i.e., self-encode, in Nucleus databases.)
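The domain-and-token mechanism described above can be sketched in a few lines of Python. This is an illustration only; the class and method names are invented for this sketch and are not the actual Nucleus API.

```python
class Domain:
    """Maps distinct atomic values to small integer tokens,
    assigned on a first-come, first-served basis."""
    def __init__(self):
        self.value_to_token = {}
        self.token_to_value = []

    def encode(self, value):
        tok = self.value_to_token.get(value)
        if tok is None:
            tok = len(self.token_to_value)
            self.value_to_token[value] = tok
            self.token_to_value.append(value)
        return tok

    def decode(self, token):
        return self.token_to_value[token]

# Two columns sharing one "state" domain, as the text describes.
state_domain = Domain()
billing_state = [state_domain.encode(s) for s in ["TX", "TX", "NY", "TX"]]
shipping_state = [state_domain.encode(s) for s in ["NY", "TX", "TX", "CA"]]

print(billing_state)    # [0, 0, 1, 0]
print(shipping_state)   # [1, 0, 0, 2]
```

Because both columns draw from the same domain, the value "TX" is stored once and represented by the same token everywhere it appears, which is the source of both the storage reduction and the token-only query evaluation noted below.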

Tokens are assigned to the elements of a domain on a first-come, first-served basis. At table creation time, or when tables are altered, columns are assigned to domains. What this means is that multiple columns can share a domain. This usually improves both storage reduction and query performance. In fact, query evaluation frequently involves only the tokens, omitting the values from processing.

It is important to understand that the storage of values in domains and the assignment of tokens are handled transparently by the Nucleus database engine. Users or front-end applications only introduce data into the database through the standard SQL INSERT statement or a high-speed bulk load. Data is manipulated with the standard SQL DELETE, UPDATE, and SELECT statements.

Finally, Nucleus uses very compact encoded bit vectors to represent the positional relationships of tokens (and by association their respective atomic data values) in columns. In a Nucleus database every column is represented in two ways: as an array of tokens (like table 3), and as a collection of bit vectors associated with the domain values used in a particular column.
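As a rough illustration of this dual representation, the sketch below builds one bit vector per token from a column's token array, using plain Python integers as bit sets. The real Nucleus vectors are stored in a compressed, encoded form, and the column data here is invented.

```python
def build_bit_vectors(token_array):
    """One bit vector per distinct token: bit i is set when row i
    of the column holds that token's value."""
    vectors = {}
    for row, tok in enumerate(token_array):
        vectors[tok] = vectors.get(tok, 0) | (1 << row)
    return vectors

# Token array for a six-row "state" column: TX=0, NY=1, CA=2
state_column = [0, 1, 0, 2, 0, 1]
vecs = build_bit_vectors(state_column)

# TX (token 0) occurs in rows 0, 2, and 4.
print(bin(vecs[0]))  # 0b10101
```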

A more detailed description of the Nucleus bit-vector architecture is available online.

Nucleus derives most of its superior query performance by exploiting the speed and space economics provided by these techniques. First, only the columns of interest for any given query are required in memory. This is to say that the domain structure, the associated collection of encoded bit-vectors, and possibly some, or all, of the array of tokens would be required. This translates into significantly reduced I/O to resolve most queries.

Second, most database queries are resolved by way of patented algorithms that perform Boolean operations on the encoded bit vectors. As an example, consider the following query against our sample table:

  • select count(*) from table_1 where amount > 500.00;

Nucleus would resolve this query as follows. Search the amount domain for values satisfying the inequality. Obtain the tokens corresponding to those values, i.e., tokens 01, 02, and 05 corresponding to 1000.00, 550.00, and 575.00 respectively. Obtain the bit-vectors for this column associated with these particular tokens and logically OR them together to produce a result vector. The number of one-bits in the result vector is the answer to the query.

Suppose the query asked for entire records to be returned instead of just a count. A fourth step, record reconstruction, would follow. It involves using the result bit-vector as a mask against the token arrays for each of the columns in the table. The data values associated with the tokens returned for each column are retrieved and displayed.
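The steps just described can be sketched as follows, again with Python integers standing in for encoded bit vectors. The five-row column is invented, but the token assignments mirror the example in the text (tokens 1, 2, and 5 for 1000.00, 550.00, and 575.00).

```python
# Invented "amount" column: a value-to-token domain and the token
# array recording the column row by row.
amount_domain = {1000.00: 1, 550.00: 2, 500.00: 3, 575.00: 5}
token_array = [1, 2, 1, 3, 5]

# One bit vector per token: bit i set means "row i holds this value".
amount_vectors = {1: 0b00101, 2: 0b00010, 3: 0b01000, 5: 0b10000}

# Steps 1-2: scan the domain for values > 500 and collect their tokens.
hits = [tok for value, tok in amount_domain.items() if value > 500.00]

# Step 3: logically OR the matching bit vectors into a result vector.
result = 0
for tok in hits:
    result |= amount_vectors[tok]

# count(*) is simply the number of one-bits in the result vector.
print(bin(result).count("1"))  # 4

# Step 4 (record reconstruction): use the result vector as a row mask
# against the token array, then decode tokens back to data values.
decode = {tok: value for value, tok in amount_domain.items()}
rows = [decode[token_array[i]] for i in range(len(token_array))
        if result & (1 << i)]
print(rows)  # [1000.0, 550.0, 1000.0, 575.0]
```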

It was mentioned near the beginning of this section that as records are added in conventional RDBMS products, the database grows -- and performance degrades -- accordingly. The only factor that conceivably works against this rule is parallel hardware, and that only works when the questions are anticipated.

The Nucleus data architecture, as described here, actually defies this rule. Encoded bit vectors do not necessarily get physically large as records are added, at least not proportionately so. Domains grow only with the number of distinct values, so only highly unique (high-cardinality) columns enlarge them. These two qualities make degradation in Nucleus query evaluation performance sub-linear with respect to the growth in the number of records in tables. In other words, Nucleus query performance will not deteriorate as fast as the database grows.

Resource Efficiency

Earlier it was stated that an exploration warehouse must be resource-efficient. Nucleus introduces resource economy at every level of the computing infrastructure: network, disk, I/O bus, memory, memory bus, and CPU.

The compact representation of data in Nucleus databases consumes less disk storage than conventional RDBMS products require. Compact data representation, combined with the tendency to involve only the columns absolutely required for query resolution, yields small working sets. This makes Nucleus a parsimonious user of the network (where applicable), the I/O bus, and memory. And because Nucleus keeps useful data close to the CPU -- data consisting largely of encoded bit-vectors -- the CPU resource is usually employed to do productive work.

This distinguishes Nucleus as a CPU-bound data processing engine, and has the effect of tying Nucleus performance to CPU performance itself. This is actually a very favorable quality since it ties Nucleus bottom-line performance to CPU performance improvement.

Two other aspects of the Nucleus architecture are worth examining in the context of resource efficiency: memory management and concurrency control.

The data storage and manipulation architecture that Nucleus uses actually calls into question traditional approaches to memory management, even the basic organization of the database space. Traditional locking approaches to concurrency, i.e., providing safe read-write access to the records in the database for multiple simultaneous users, are also dubious given the novel data architecture of Nucleus.

Nucleus confronts some of the same issues faced by object-oriented database systems. It is almost impossible to anticipate the size of the basic elements of a Nucleus database. The physical size of a bit-vector depends upon the number of occurrences of data values as well as the pattern of those occurrences. Both are unpredictable. Domain size depends upon the number of distinct values throughout a collection of columns. Only the token arrays can be considered somewhat predictable, since their size is proportional to the number of non-null rows in a table, with element sizes being one, two, or four bytes.

Consequently, Nucleus uses a container-based approach to storage that addresses the need to store irregularly sized objects. Starting from the outside and working inward, a Nucleus database consists of some number of files in one or more file systems or directories known to the computer operating system, say NT or UNIX. Files can even be distributed over a network file system. Together these files constitute a single database. Nucleus stores data -- domains, bit-vectors, token arrays and more -- throughout these files on fixed-size pages that can range from 16 kilobytes for small systems to one megabyte for large systems.

Each Nucleus page holds containers ranging in size from 16 bytes to the full size of the page. All Nucleus data is stored in these containers. When domains, bit-vectors, or token arrays become too large to store on a single page they are segmented across multiple pages. The large page size minimizes the software overhead allowing the database to get extremely large while optimizing the use of the memory hierarchy and preserving the performance scalability of data access.
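A highly simplified sketch of the page-and-container idea follows, assuming the 16-kilobyte small-system page size mentioned above. The class and its layout are invented for illustration; the real Nucleus page management is considerably richer.

```python
PAGE_SIZE = 16 * 1024  # 16 KB, the small-system page size from the text

class PageStore:
    """Fixed-size pages holding variable-size containers; objects too
    large for one page are segmented across several."""
    def __init__(self):
        self.pages = [bytearray()]

    def store(self, obj: bytes):
        placements = []
        while obj:
            page = self.pages[-1]
            room = PAGE_SIZE - len(page)
            if room == 0:
                self.pages.append(bytearray())  # current page full
                continue
            chunk, obj = obj[:room], obj[room:]
            # Record (page number, offset, length) for this segment.
            placements.append((len(self.pages) - 1, len(page), len(chunk)))
            page.extend(chunk)
        return placements

store = PageStore()
spans = store.store(b"x" * 20000)   # larger than one 16 KB page
print(spans)  # [(0, 0, 16384), (1, 0, 3616)]
```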

Nucleus has a rigorously layered system architecture out of necessity. Since tabular data are stored in such a decomposed and compact state, various layers of the system are dedicated to providing users SQL access and to the abstraction of tables made up of records. Similarly, the core algorithms that manipulate domains, bit-vectors, and token arrays call on the services of a pair of Nucleus system layers. The Nucleus object manager locates objects in their respective containers. The Nucleus memory manager handles the demand paging in response to requests from the object manager.

The Nucleus memory manager uses a unique, patented, method for paging the database space. Aside from the raw efficiency of the paging algorithms, the most important observation to be made here is that since the memory manager is responsible for object faulting, the core data manipulation algorithms do not concern themselves with I/O, buffering, or caching. Therefore, when working sets are resident in memory, Nucleus exhibits performance only seen in strictly in-memory database technologies.

Nucleus achieves CPU-bound in-memory performance when working sets are memory resident, without the down side presented by in-memory database systems. That is, Nucleus is designed to page efficiently when required.

This layered architecture is further exploited to provide two features perfectly suited to the exploration warehouse. The first of these features is called Nucleus Scalable Systems Server (S3). Nucleus S3 uses an enhanced memory manager layer to enable concurrent access to a single database by multiple servers. The servers can be deployed on a single SMP system, on a clustered system, or even over a network file system. Each S3 server provides a view of the database entirely independent of the other servers.

Each Nucleus S3 server reads from the persistent database space, and writes to a private temporary space to provide its users with a virtual database (VDB™). The following unique capabilities and benefits derive from the Nucleus S3 functionality:

  • Efficient multi-user workload parallelism for SMP and clustered SMP architectures
  • Priority service for selected user populations
  • Different database "views" for separate user populations
  • Transitory "what if" analysis, simultaneously on multiple servers
  • Unrestricted database access with concurrent load
  • Unrestricted database access with concurrent backup

Within a single server instance, Nucleus provides a lock-less, optimistic concurrency control scheme referred to as generation-based concurrency control. RDBMS products generally provide concurrency through page-level or record-level locking. Because Nucleus does not use the record or block of records as the basic unit of storage, a lock-based concurrency control scheme is of questionable value.

Generation-based concurrency control (GBCC) exploits the vertical (i.e., columnar) partitioning and underlying bit-vector architecture of Nucleus to give each transaction the appearance of having its own copy of the database without the actual cost of copying the whole database. In fact, Nucleus uses bit-vectors extensively to implement this multi-versioning of the database economically.

The premise that makes this approach successful is that readers are many, writers are few, and the database is large. In such cases, the probability of conflict is low. The Nucleus GBCC certifies transactions against a consistent, public version of the database at COMMIT time, automatically rolling back transactions that fail certification. Transactions that pass certification have their changes merged into the consistent public state of the database for subsequent transactions to "see."
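In outline, GBCC behaves like the optimistic scheme sketched below: each transaction reads against a snapshot generation and is certified at COMMIT time against writes committed since that snapshot. This is an illustration of commit-time certification in general, not the Nucleus implementation, which operates over columns and bit vectors rather than a Python dictionary.

```python
class GenerationStore:
    """Lock-less optimistic concurrency: certify at commit, roll back
    on conflict. Invented names; a sketch of the general scheme only."""
    def __init__(self):
        self.generation = 0
        self.data = {}        # the consistent, public database state
        self.write_log = []   # (generation, set of keys written)

    def begin(self):
        return {"snapshot": self.generation, "writes": {}}

    def read(self, txn, key):
        return txn["writes"].get(key, self.data.get(key))

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        # Certification: fail if any key we wrote was also written by
        # a transaction that committed after our snapshot.
        for gen, keys in self.write_log:
            if gen > txn["snapshot"] and keys & txn["writes"].keys():
                return False  # conflict -> automatic rollback
        self.generation += 1
        self.data.update(txn["writes"])
        self.write_log.append((self.generation, set(txn["writes"])))
        return True

db = GenerationStore()
t1, t2 = db.begin(), db.begin()
db.write(t1, "balance", 100)
db.write(t2, "balance", 200)
print(db.commit(t1))  # True  -- first writer certifies cleanly
print(db.commit(t2))  # False -- overlapping write, rolled back
```

When readers are many, writers are few, and the database is large, certification almost always succeeds, which is exactly the premise stated above.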

Exploration Flexibility with Near-Zero Administration

More than performance, and more than resource-efficiency, it is probably the administrative overhead that makes RDBMS engines most unsuitable for the exploration warehouse. The exploration warehouse workload swamps an RDBMS in administration. Together, index maintenance, data partitioning and striping, management of various database spaces (index space, table space, and temporary space), and locking make the high-change, ask-anything proposition impossible.

What most clearly distinguishes Nucleus from RDBMS products is its nearly complete absence of administration. Consider management of the database space.

Nucleus database space and file management require only one administrative step. A database configuration utility is used to indicate where, in terms of file system or directory path, database files should be stored and how large they can become. A newly created Nucleus customer database consists of a single file positioned in the first of four designated locations (file systems or directory paths): /ndb1, /ndb2, /ndb3, and /ndb4.

Nucleus will store data in that first file, cdb.n00, until it reaches the maximum prescribed size for files in the /ndb1 location. At that point, Nucleus will start a new file in the next location, /ndb2. This continues round-robin-style, without administrative intervention, across Nucleus locations. After a considerable amount of data has been loaded, such a database might consist of eight database files, each 1GB in size, with the exception of the last file, /ndb4/cdb.n07, which stands at less than its maximum size of 1GB.
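The round-robin growth just described can be sketched as follows. The four locations and the 1GB per-file cap come from the example above; the class and naming scheme are invented for illustration and do not reflect the actual configuration utility.

```python
GB = 1024 ** 3

class DatabaseSpace:
    """Round-robin database file growth across designated locations."""
    def __init__(self, locations, max_file_size=1 * GB):
        self.locations = list(locations)
        self.max_file_size = max_file_size
        self.files = []       # [path, bytes_used] per database file
        self.next_index = 0

    def _new_file(self):
        # Rotate to the next location and open a new database file there.
        loc = self.locations[self.next_index % len(self.locations)]
        self.next_index += 1
        self.files.append(["%s/cdb.n%02d" % (loc, len(self.files)), 0])

    def allocate(self, nbytes):
        """Grow the database, spilling to a new file in the next
        location whenever the current file reaches its cap."""
        while nbytes > 0:
            if not self.files or self.files[-1][1] >= self.max_file_size:
                self._new_file()
            current = self.files[-1]
            chunk = min(nbytes, self.max_file_size - current[1])
            current[1] += chunk
            nbytes -= chunk

space = DatabaseSpace(["/ndb1", "/ndb2", "/ndb3", "/ndb4"])
space.allocate(int(7.5 * GB))           # load a large amount of data
print(len(space.files))                 # 8
print(space.files[-1][0])               # /ndb4/cdb.n07
```

After loading 7.5GB, the space holds eight 1GB-capped files, the last one half full, matching the example in the text.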

Nucleus uses space on file systems only as needed. When space is exhausted on a particular file system, Nucleus automatically takes it out of the rotation. Nucleus also uses the database space for temporary and permanent storage. This takes yet another administrative concern out of the overall scheme. Since Nucleus databases are 100% indexed by virtue of the storage architecture, there is no index space to maintain.

It is also important to note that the Nucleus memory manager uses full 64-bit addressing to address the pages contained in the collection of files that make up a given database. This allows Nucleus databases to get very large with little administrative attention: a very large Nucleus database can be grown simply by providing enough space on the host file systems. This quality also allows Nucleus to make full use of a very large memory (VLM) configuration.

Another common concern with large databases is backup and recovery. First, Nucleus provides automatic database recovery after a system failure by incorporating a checkpoint mechanism. Recovery proceeds automatically upon mounting a database following the failure.
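The document does not describe how the checkpoint mechanism works internally. The sketch below shows one conventional checkpoint-and-replay scheme, purely as an assumption about the general technique (the file layout and function names are invented for illustration):

```python
import json
import os


def checkpoint(snapshot_path, state, log_path):
    """Atomically write a consistent snapshot, then truncate the redo log."""
    tmp = snapshot_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())          # snapshot is durable before it is visible
    os.replace(tmp, snapshot_path)    # atomic rename on POSIX
    open(log_path, "w").close()       # entries before this point are redundant


def recover(snapshot_path, log_path):
    """On mount after a failure: load the last checkpoint, replay the log."""
    state = {}
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            state = json.load(f)
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:            # each line is one logged (key, value) change
                key, value = json.loads(line)
                state[key] = value
    return state
```

Because recovery is just "load snapshot, replay log", it can run automatically at mount time with no operator involvement, which is the property the paragraph above claims.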

Second, because Nucleus permits the OS to handle file storage, backup is absolutely straightforward. Live Nucleus databases, that is databases being actively queried, can be backed up by simply copying the database files. Commercially available file system backup utilities can be used if preferred. Database backup can even be performed in parallel. The fact that Nucleus databases are made up of operating system files also makes them readily transportable from one system to another.
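Since a live database is just a set of ordinary OS files, "backup" reduces to copying them — and the copies can proceed in parallel, as the paragraph notes. A minimal sketch with standard-library tools (the `cdb.n*` naming follows the earlier example; the helper itself is hypothetical):

```python
import glob
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def backup_database(locations, backup_dir, pattern="cdb.n*"):
    """Back up a live database by copying its OS files, one thread per file."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    files = [f for loc in locations for f in sorted(glob.glob(f"{loc}/{pattern}"))]
    with ThreadPoolExecutor() as pool:
        # copy2 preserves timestamps; list() forces completion of all copies
        list(pool.map(lambda f: shutil.copy2(f, backup_dir), files))
    return files
```

Any commercial file-system backup utility would do the same job; the point is that no database-specific export step is involved.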

Ultimately, the system takes the details of storage, at best a science and at worst a black art in any RDBMS, out of the hands of the user. This allows users to proceed with completely freeform exploration (ad hoc queries, transformations, bulk loads, create and destroy steps, and so on) without the administrative intervention that would otherwise be required at each turn.

Nucleus removes two additional administrative activities from the exploration warehouse equation. First, there is the lockless, optimistic concurrency control scheme that Nucleus employs. It is notable for its complete absence of administration: the management of RDBMS resource locks is yet another activity with a voracious appetite for DBA resources.

Second, there is no performance tuning. Performance tuning, in fact, is antithetical to the exploration warehouse proposition. The largely manual discipline of performance tuning, as it is normally conducted in data warehouses and decision-support applications, makes the overall solution too sluggish to negotiate the twists and tight turns of aggressive data exploration. Nucleus provides exceptional performance, from the simplest queries to the queries from hell, without requiring manual intervention, and it is this agility that masters the exploration warehouse.

Fast and Simple Bulk Data Load

Nucleus integrates seamlessly with existing data warehouses and data marts. That is, Nucleus can be brought into play whenever serious exploration on some subset of data from the data warehouse or data mart is to be conducted. All that is required is appropriate meta-data, i.e., a description of the tables involved, including table name, columns and their respective data types.

Nucleus will accept data directly from legacy applications, from the data warehouse or mart, or even from external sources. It uses a high-speed bulk loader that loads data wholly or incrementally; there is no index-build step. Once data is loaded, it is query-ready.
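The metadata requirement above — table name, columns, and data types — can be pictured concretely. The sketch below is not the Nucleus loader interface; it is an assumed illustration of the kind of table description and typed extract a bulk loader would consume:

```python
import csv

# Hypothetical table description; the actual Nucleus metadata format may differ.
SALES_TABLE = {
    "name": "sales",
    "columns": [("order_id", int), ("region", str), ("amount", float)],
}


def parse_extract(table, csv_path):
    """Parse a CSV extract into typed rows, ready to hand to a bulk loader.

    Each value is cast using the type declared in the table metadata, so a
    load failure surfaces immediately as a type error rather than bad data.
    """
    rows = []
    with open(csv_path, newline="") as f:
        for record in csv.reader(f):
            rows.append(tuple(cast(value)
                              for (_, cast), value in zip(table["columns"], record)))
    return rows
```

In this picture, "no index-build step" means the typed rows are query-ready as soon as they are stored, with no separate post-load phase.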

Nucleus connects to data warehouse systems over TCP/IP and SNA networks. Blasting data out to a Nucleus Exploration Warehouse is made as seamless as possible to encourage the data explorer to incorporate new data sources into the exploration process as needed. The simplicity of the Nucleus data load embodies a particular philosophy, one that dictates the removal of all barriers to data exploration.

Cost of Ownership

All of the exploration-friendly qualities of Nucleus come together in cost of ownership. Together, the across-the-board resource efficiency and administrative economy of Nucleus yield a cost-of-ownership bill that is a small fraction of what an RDBMS incurs. Nucleus allows the exploration warehouse to proceed in stealth fashion, without coming up on the radar of budget cutters.

In addition to the economies already described, Nucleus provides access to the exploration warehouse through ODBC-enabled front ends. Knowledge workers can therefore use their front end of choice to query and manipulate the exploration warehouse without introducing a large-scale retraining cost.
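Because access is through standard ODBC, any ODBC-capable tool or library can reach the exploration warehouse once a data source is configured. A sketch using the pyodbc library (the DSN name and credentials are placeholders, and pyodbc itself is just one example of an ODBC front end):

```python
def odbc_connection_string(dsn, uid, pwd):
    """Build a standard ODBC connection string for a configured data source."""
    return f"DSN={dsn};UID={uid};PWD={pwd}"


def query_warehouse(dsn, uid, pwd, sql):
    """Run a query through an ODBC front end (here, the pyodbc library)."""
    import pyodbc  # optional dependency; any ODBC-capable tool works the same way
    with pyodbc.connect(odbc_connection_string(dsn, uid, pwd)) as conn:
        return conn.cursor().execute(sql).fetchall()
```

The retraining point follows directly: the knowledge worker's existing spreadsheet or query tool speaks ODBC already, so nothing new needs to be learned on the client side.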

The final cost-of-ownership evaluation of Nucleus (the most important yardstick) has to be measured against the knowledge worker's ability to ask anything and get an answer.


The Nucleus Exploration Warehouse is designed to achieve one goal: to turn operational data or warehoused data into business intelligence as rapidly, efficiently, and simply as possible. Nucleus and the exploration warehouse address the next step in the evolution of data warehouse processes and practices: progress from an information-delivery orientation to more results-oriented business intelligence delivery.

Nucleus brings together a family of unique and effective technologies designed specifically to breathe life into the exploration warehouse. The exploration warehouse, in turn, puts an edge on strategic and tactical business decisions.


Sand (NASDAQ: SNDCF) provides high-performance, scalable software solutions for data mining, data marts, data warehouses, and online analytical processing (OLAP). Sand's product suite, The Nucleus Series, brings patented technology to the business user, allowing for more timely and accurate decision processing within disconnected-client, desktop, workgroup, departmental, and enterprise computing environments.

Copyright 1996-98. All Rights Reserved. Nucleus, Nucleus Server, and N-Vector are registered trademarks of Sand Technology Systems. Nucleus Exploration Mart, Nucleus Exploration Warehouse, Nucleus Virtual Database (VDB), and Nucleus Scalable Systems Server (S3) are trademarks of Sand Technology Systems. Other trademarks are the property of their respective owners.