Data Warehousing Concepts for AS/400

Table of Contents

Introduction
What is a data warehouse?

  • Benefits
  • A data warehouse customer example
  • Data warehousing concepts and terms
  • Operational versus informational data
  • Transforming operational data into informational data
  • Multi-dimensional analysis and OLAP
  • Data marts
  • Metadata
  • Business intelligence software and data mining
  • Data mining
  • AS/400 information warehouse architecture and components
  • Components
  • Transformation/propagation tools
  • Product examples
  • Data warehouse database servers
  • Product examples
  • Data analysis and end-user tools
  • Product examples
  • Managing information about the warehouse--metadata
  • Product examples
  • Putting it all together (or where to start)
  •  
  • Product examples
  • Consulting services
  • Summary
  • What customers say
  • For more information

    Introduction

    Data warehousing is an increasingly popular and powerful concept of applying information technology to solve business problems. Meta Group, Inc., a leading consultant in the data warehousing environment, suggests that over 900f the Fortune 2000 businesses will put into place a data warehouse by the end of 1996.

    But what is data warehousing, and why is it so important to so many businesses? How can IBM help move your company into the data warehousing environment? Why does AS/400 Advanced Series make an excellent data warehousing platform? This paper focuses on answers to these questions and describes the products and components that make up AS/400 data warehousing solutions.


    What is a data warehouse?

    Data warehousing is a concept. It is not a product that you can buy off the shelf. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating to make better business decisions. The data you use to operate your business represents a wealth of knowledge that you may not be fully tapping into. It is an asset that is probably not being used to its fullest potential. Data warehousing can help you take advantage of the knowledge base you've created over time--your data!

    Benefits

    The data warehouse can be used to:

    • Understand business trends and make better forecasting decisions
    • Bring better products to market in a more timely manner
    • Analyze daily sales information and make quick decisions that can significantly affect your company's performance

    In short, data warehousing can be a solution for maintaining your company's competitive edge.

    A data warehouse customer example

    One AS/400 customer, a large supplier of consumer goods, found that their sales force lacked the kind of information needed to expand their business. Monthly sales reports simply were not timely or detailed enough to be very useful. To help make strategic business decisions, the company needed to analyze historical information to be able to spot trends. Furthermore, the company needed this analysis to be fast and on-demand. Their existing applications could not meet either of these requirements.

    With their new AS/400 data warehouse, the company's salespeople can formulate sales strategies based on the information the warehouse provides. This includes information about the success of previous promotions, regional trends, product profitability, and the effect of product packaging. Salespeople have access to immediate information in order to make on-the-spot decisions that are customized to particular clients.

    Other company departments, such as finance and operations, are also using the warehouse to identify and analyze product successes across regions, salespeople, and time. Personnel and machinery resources can be shifted based on demand or projections, to ensure that those resources are optimized for production purposes.

    Data warehousing can be a key differentiator in many different industries. Data warehouse applications include:

    • sales and marketing analysis across all industries
    • inventory turn and product tracking in manufacturing
    • category management, vendor analysis, and marketing program effectiveness analysis in retail
    • profitable lane or driver risk analysis in transportation
    • profitability analysis or risk assessment in banking
    • claims analysis or fraud detection in insurance

    Data warehousing concepts and terms

    Operational versus informational data

    Operational data is the data you use to run your business. This data is what is typically stored, retrieved, and updated by your Online Transactional Processing (OLTP) system. An OLTP system may be, for example, a reservations system, an accounting application, or an order entry application.

    Operational data is typically stored in a relational database, but may be stored in legacy hierarchical or flat file formats as well. Some of the characteristics of operational data include:

    • Updated often and through online transactions
    • Non-historical data (not more than three to six months old)
    • Optimized for transactional processing
    • Highly normalized(1) in the relational database for easy update, maintenance, and integrity

    Informational data is typically stored in a format that makes analysis much easier. Analysis can be in the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis.

    Informational data is created from the wealth of operational data that exists in your business. Informational data is what makes up a data warehouse.

    Informational data is typically:

    • Summarized operational data
    • De-normalized and replicated data
    • Infrequently updated from the operational systems
    • Optimized for decision support applications
    • Possibly "read only" (no updates allowed)
    • Stored on separate systems to lessen impact on operational systems

    Transforming operational data into informational data

    Creating the informational data, that is, the data warehouse, from the operational systems is a key part of the overall data warehousing solution. Building the informational database is done with the use of transformation or propagation tools. These tools not only move the data from multiple operational systems, but often manipulate the data into a more appropriate format for the warehouse. This could mean:

    • The creation of new fields that are derived from existing operational data
    • Summarizing data to the most appropriate level needed for analysis
    • Denormalizing the data for performance purposes
    • Cleansing (2) of the data to ensure that integrity is preserved.


    Why is this transformation necessary? Consider operational data that contains every detail about every purchase made in a retail outlet. This data holds the answers to the questions:

    • what items were purchased?
    • what store did the transaction take place in?
    • how was it paid for?
    • and who was the consumer?

    But because the data is stored in a format appropriate for recording these transactions, querying this data to analyze it can be a very time consuming effort and can adversely affect the performance of the transactional system.

    A business analyst tracking sales may want to review data collected from every store by product to forecast inventories, determine profit margins, or track revenues compared to last year. The ability to generate summary level informational data from the operational data provides the performance benefits the analyst can take advantage of to quickly view the trends and problem areas affecting the business.

    Multi-dimensional analysis and OLAP

    Relational databases store data in a two dimensional format: tables of data represented by rows and columns. Multi-dimensional analysis solutions, commonly referred to as On-Line Analytical Processing (OLAP) (3) solutions, offer an extension to the relational model to provide a multi-dimensional view of the data. For example, in multi-dimensional analysis, data entities such as products, geographies, time periods, and sales channels may all represent different dimensions. Actual sales revenues and forecasted revenues may represent measurements to analyze.

    Multi-dimensional data structures provide both a mechanism to store the data and a way for a business analyst to, for example, view actual sales versus forecast numbers across the different dimensions in a very timely, very powerful fashion. The power of the data is given to the users who need it most, without the need for users to wait for complex reports to be generated.

    Multi-dimensional solutions provide the ability to:

    • analyze potentially large amounts of data with very fast response times
    • "slice and dice" through the data, and drill down or roll up through various dimensions as defined by the data structure
    • quickly identify trends or problem areas that would otherwise be missed

    Multi-dimensional data structures can be implemented in many different manners. Relational databases can support this structure through:

    • specific database designs (schema), such as "star-schema", intended for multi-dimensional analysis
    • highly indexed or summarized designs
    These structures are sometimes referred to as relational OLAP (ROLAP)-based structures.

    Another method of providing multi-dimensional analysis is through the use of specialized databases that form a "cube"-like structure for achieving high performance analysis.

    Data marts

    Data marts are workgroup or departmental warehouses, which are small in size, typically 1-10GB. The data mart contains informational data that is departmentalized, tailored to the needs of the specific departmental work group. Multi-dimensional data structures are one way of defining a data structure that can be used for data marts.

    Metadata

    Information about the data warehouse and the data that is contained in the data warehouse comes in two basic parts. There is technical data the warehouse administrator uses, and there is business data that is of use to the warehouse users. All of this data is referred to as metadata, data about the data.

    The technical data contains a description of the operational database and a description of the data warehouse. From these two descriptions, or schema, the data movement operation can be implemented. This data helps the data warehouse administrators maintain the data warehouse and know where all of the data is coming from.

    The business data helps users find information in the data warehouse without knowing the underlying implementation of the database. This information is presented in business terms, instead of the terms used by the programmers when the database was built. The business data also gives the user information about:

    • When the data was moved into the warehouse (how current it is)
    • Where the data came from (which operational database)
    • Other information that lets the user know how reliable the data is

    Business intelligence software and data mining

    Business intelligence software is a fairly new term referring to the tools that are used to analyze the data. This software can consist of:

    • Decision support systems (DSS) tools, which allow you to build ad hoc queries and generate reports
    • Executive information systems (EIS), which combine decision support with extended analysis capabilities and access to outside resources (such as Dow Jones News services)
    • Data mining tools, which allow automation of the analysis of your data to find patterns or rules that you can use to tailor business operations

    These tools are typically easy to use and graphically oriented, with point-and-click functionality. The tools often can present the data to the user in tabular or graphical format, in a report, or with key performance indicators highlighted. A significant benefit of today's analysis tools is the ability to quickly and easily put the power of the data in the hands of the people who can best use it.

    Data mining

    Data mining tools, such as Intelligent Miner from IBM, can find potential gold mines of valuable information in the massive amounts of data that a company accumulates. These tools take decision support and executive information systems to a new level. With traditional DSS and EIS systems, the user forms a hypothesis and uses the query tools to verify or reject the hypothesis. With data mining, the system researches the data and determines patterns, classifications, and associations, while the analyst determines what to do with the results.

    Data mining has been quite useful in the retail industry to analyze consumer buying patterns and form marketing programs to take advantage of the analysis results. For instance, data mining can find patterns in your data to answer questions like:

    • what item purchased in a given transaction triggers the purchase of additional related items?
    • how do purchasing patterns change with store location?
    • what items tend to be purchased using credit cards, cash, or check?
    • how would the typical customer likely to purchase these items be described?
    • did the same customer purchase related items at another time?

    Once the buying patterns have been discovered, the retailer can use this information to tailor a marketing strategy that appeals to each type of buyer, thereby maximizing profits or minimizing costs by optimizing inventory management.

    But the retail industry is not the only industry to take advantage of data mining. Other uses for data mining include: risk assessment and portfolio management in the finance industry; fraud detection and policy assessments in health and insurance; and optimization, scheduling, and visual inspection systems in manufacturing and process control.

    AS/400 information warehouse architecture and components

    IBM's approach to data warehousing is to provide solutions for building the warehouse and to assist the decision-making process through a set of products and services. This set of products and services, or components, defines the AS/400 information warehouse architecture.

    The information warehouse architecture provides the means to change raw data into information for making effective business decisions--the emphasis on information, not data. This architecture provides the overall structure for a comprehensive, flexible, and integrated data warehouse solution.



    Components

    There are four main components of the AS/400 information warehouse architecture. These are:

    • Transformation/propagation tools to load the data warehouse
    • Data warehouse database server
    • Analysis/end-user tools
    • Tools to manage information about the warehouse (metadata)

    Transformation/propagation tools

    A key component of the warehouse implementation is the loading of the warehouse. This component consists of building the informational database from the operational data in the format optimized for decision support. It also includes regularly updating the informational database as additional transactions to the operational system(s) occur.

    The complexity of loading the warehouse can vary depending on a number of factors, including compatibility of operational and informational systems, amount of data massaging that must occur, and timing requirements of the data. Moving data between homogeneous systems is always easier than moving data between heterogeneous systems, because of different database support, differences in (or lack of) support of industry standards for database and/or networks, different security implementations, and so on. Special programming may be required to ensure that the integrity of data is maintained in a heterogeneous environment.

    Product examples

    • DataPropagator Relational Capture and Apply for OS/400 is an IBM solution for moving data into the data warehouse. DataPropagator moves data from DB2 for OS/400 operational systems as well as from any other DB2 family database. DataPropagator allows you full SQL capability as you move the data--an important feature for creating summarized data, derived fields, and multiple summarized data marts.

      Being able to capture the data from the operational systems and to propagate it on a timed basis is another benefit of DataPropagator. Your business will dictate how often the data warehouse needs to be refreshed: once a day, once a week, or every five minutes. The automated apply process handles this so you need only configure DataPropagator once, and the data warehouse is refreshed automatically.

    • Other IBM products that can extract data from non-DB2 data sources include DataJoiner, DataPropagator Non-Relational, and Data Refresher. These products can be used in conjunction with DataPropagator to pull data from IMS, VSAM, and non-IBM relational databases.
    • The DB2 for OS/400 SQL client integration exit program interface can be used to provide seamless access to many different database systems. Vendor-supplied application requestor driver programs can be snapped into OS/400. Then, an AS/400 program (such as DataPropagator) that uses SQL for database access can read from or write directly to databases supported by the driver code. With this function, direct access to non-IBM relational databases can be achieved without the need for an intermediate database gateway. Benefits include application portability, higher reliability in distributed data environments, and data access in a heterogeneous network.
    • Several IBM Business Partners offer data transformation/propagation products as well. Specific requirements will dictate which product best fits any particular situation. See Table 1 for a list of Business Partner solutions.

    Data warehouse database servers--the heart of the warehouse

    There are at least five requirements of a data warehouse server:

    1. Performance

      The performance of the database server must be good enough to support the user's performance expectations during analysis. The warehouse may be very big; it is not uncommon that a warehouse could be over 100GB of data, with fairly complex analysis tasks being executed against the database.

      Parallel computing technologies, although not a requirement, become quite important once the warehouse solution grows to a large size.

    2. Capacity

      The capacity of the server and hardware must be enough to store your entire data warehouse. Data warehouses may be partitioned or split over multiple systems, but the server must support a view of the data that hides where the data is physically stored. In addition to the overall storage capacity requirements of the warehouse, the server and hardware must be able to support very large file or table sizes.

    3. Scalability

      The scalability of the server must be such that your choice of server can handle the requirements of an ever-growing amount of data in the data warehouse. The typical data warehouse implementation starts out at the data mart level--that is, a small, departmental data warehouse that will grow over time into an enterprise-wide warehouse.

    4. Open interfaces

      Availability of open interfaces to support the analysis tool of choice is another requirement. Support of open industry standard interfaces allows the solution to utilize the end-user analysis tool of choice. Without this, you may be locking yourself into proprietary interfaces that could cause major expenses to be incurred to support new analytical tools as the business changes.

    5. Multiple data structures

      Multiple data structures, both relational and multi-dimensional, must be supported.

    Product examples

    The AS/400 Advanced Server models, with 64-bit PowerPC technology, provide an excellent data warehouse solution. This line of AS/400 servers combined with the most widely installed multi-user database in the industry, DB2 for OS/400, provide the basis for meeting the five requirements noted above. Let's take a look at some of these individually.

    Server models

    The AS/400 Advanced Server models are specifically tuned for the type of work loads used in data warehousing applications. For example, the DSS or EIS tools used to access the warehouse utilize a client/server database serving workload. The loading of the data warehouse, massaging of the data, and additional analysis tools take advantage of the high performance batch mode on the server models.

    Scalability within the server models provides data mart type systems, low-end servers that can handle 10 to 25 users, all the way to enterprise-wide warehouses supporting very large databases and large numbers of users. The PowerPC based server models, combined with the clustering technologies described in reference #1, provide an almost unlimited growth path for data warehouse implementations.

    Parallel processing techniques

    Storing and analyzing very large amounts of data require the database to take full advantage of the hardware it runs on, in order to get the most performance out of the system and meet the user's expectations. Several advances in the area of parallel computing have been announced for AS/400 and DB2 for OS/400, available in three stages.

    Parallel I/O processing: The first stage of the parallel technologies is parallel I/O processing. This feature, available in Version 3 Release 1 and subsequent releases of OS/400, provides parallel computing to occur at the I/O processor level for a single job.

    AS/400 business computing systems consist of multiple processors: the main CPU(s) and multiple I/O processors (microprocessor-based adapters) on which to offload the main processors and create a more efficient system. This has been an inherent feature of the AS/400 architecture since its inception.

    Parallel I/O processing also takes advantage of another inherent architectural feature of the AS/400: single level storage. Single level storage allows the system to automatically spread data across all available disk drives in the system for more efficient processing. No manual "data striping" is required on an AS/400 system.


    Parallel I/O processing means that a single user can submit a query operation against the database, and multiple I/O processor tasks are initiated internally to process the request. This capability to have multiple resources in the system working to retrieve the query result data has produced, in some cases, up to a 700 0mprovement in I/O bound query performance.

    Symmetric multiprocessing: The second stage is symmetric multiprocessing, or SMP. Similar to parallel I/O processing, SMP will take advantage of multiple main CPUs for a single job/query. DB2 Symmetric Multiprocessing for OS/400 is required for this function.

    AS/400 business computing systems today can contain up to four CPUs, in addition to the multiple I/O processors. With SMP, all four CPUs and the I/O processors can work together to retrieve the query result data for a single job. DB2 for OS/400 uses industry-leading techniques in its symmetric multiprocessing implementation to actually break down the query (or SQL) statement across the multiple processors. SMP has resulted in performance increases of over 400 0.000000or certain queries. Even single processor configurations can make use of the SMP feature, and have shown significant performance improvements through enhanced optimization techniques.

    Loosely Coupled systems: (Ref #1.) The third stage provides an almost unlimited scalability option. Loosely coupled systems will provide the capability to attach multiple (up to 32) systems together in a "Shared Nothing" (4) cluster. Once connected, those loosely coupled systems appear as a single database to the users; the database can be spread across the storage unit on each coupled system. Whereas there is a limit of 520 gigabytes on the high end server models, with the loosely coupled environment you can expand your disk capacity to 16 terabytes. DB2 Multisystem for OS/400 must be installed to support the loosely coupled technology.

    These multiple systems can be coupled together using any form of communications transport. A typical environment might be through the AS/400 high-speed, fiber-optic connection product, OptiConnect for OS/400. The high transfer rate provided by OptiConnect (1 gigabit per second on PowerPC systems) prevents the connection method from being a bottleneck when passing large amounts of data across a coupled set of systems.

    The combination of SMP with loosely coupled systems effectively provides a massively parallel processing (MPP) environment. Extremely large data warehouse implementations can now be achieved. In a single query, it is feasible that you could have as many as 128 (32 coupled systems times four CPUs in each system) CPUs all working together with potentially hundreds of I/O processors to retrieve the query result data.

    Openness

    A major focus area for AS/400 development has been to ensure compliance with industry and de facto standards for database access. Providing support for open standards allows AS/400 customers the most flexibility in choosing tools and inter-operating with non-AS/400 platforms and databases. Some of these standards include:

    • ANSI X.3.135.1992, ISO 9075-1992 and FIPS 127-2 Structured Query Language
    • IBM Distributed Relational Database Architecture (DRDA) Distributed Unit of Work--Application Directed
    • Microsoft Open Database Connection (ODBC)
    • X/Open SQL Call Level Interface (CLI)
    • Apple Data Access Language (DAL)

    The benefit of this is that client/server tools that also support these standards can very easily use DB2 for OS/400 as their database server. Many decision support and EIS front-end tools have been successfully tested with the new AS/400 ODBC driver that is included in the Client Access product. For an extensive list of client/server tools, refer to the AS/400 Client/Server Applications Directory, G325-6202.

    Internet access and Lotus Notes: The Internet and collaborative computing are important components of today's information technology departments. AS/400 support of the Internet and the leading collaborative groupware solution, Lotus Notes, enhances AS/400 data warehouse capabilities.

    All that is needed to access the data warehouse by way of the Internet is a standard web browser on the client PC to allow the AS/400 HTTP World Wide Web server to access your data.

    One way to access the data warehouse using the Internet is through server-based applications. Any AS/400 application can be accessed through the Internet using the 5250/HTML Workstation Gateway. The Workstation Gateway enables the web browser to act as an AS/400 workstation by converting text-based data streams into HTML (hyper-text markup language) documents as the application runs. Therefore, an existing AS/400 application that accesses your data warehouse can be accessed through the Internet, and new applications can be created quickly with existing tools, libraries, and procedures.

    A second way to access your data through the Internet is by using the DB2 World Wide Web (WWW) offering that is part of Internet Connection for AS/400. DB2 WWW is a program that works from SQL and HTML macro files to set up DB2 for OS/400 queries, with no programming required. You use HTML forms on web browser displays to specify input values for the query. The input values are sent back to the server to query the data warehouse using SQL statements.


    As a third option, you can write a program, using the Internet standard Common Gateway Interface (CGI), to collect input from the user with a web browser and to run a DB2 for OS/400 query. This provides the maximum flexibility for creating dialogs with the user to gather input and to display the results, perhaps using graphs and charts to show the information.

    The AS/400 World Wide Web server simplifies the accessibility of your data warehouse data to your company, customers, and potential markets.

    Lotus Notes is an industry leading groupware product. Its powerful collaborative computing and data replication features can be seamlessly integrated with the AS/400 data warehouse. Built-in replication technology is provided with the Lotus Notes implementation on AS/400, providing the ability to share information from a DB2 for OS/400-based warehouse and Notes applications.

    Support for multiple data structures

    AS/400 supports many different data structures appropriate for analytical processing. DB2 for OS/400 provides a leading edge relational database management system, with multiple interfaces for defining, loading, accessing, and managing the data warehouse. OLAP data structures are supported through IBM Business Partner products that take advantage of the scalability and performance that AS/400 provides. See Table 1 for a list of Business Partner solutions.

    Data analysis and end-user tools

    The open interfaces of DB2 for OS/400, combined with the high performance programming interfaces in Client Access, give you the base with which you can select any number of tools for analysis purposes. These are typically PC-based tools that access the AS/400 data warehouse through the Client Access product. The choice of tools includes report writers, query tools, EIS tools, spreadsheets, and client/server programming tools for highly customized implementations. The wealth of tools does not seem to be a problem; picking one from the hundreds on the market is not as easy!

    Product examples

    IBM and its Business Partners offer a wide range of end-user products to choose from for analyzing the data warehouse from the desktop. IBM can assist you in determining the right tool for you, based on your specific requirements. The IBM Client Series program defines a selected set of leading-edge products to help you through the selection process. These products include IBM Visualizer and Lotus Approach. The IBM Client Series end-user certified products are included in Table 1.

    • IBM and its Business Partners offer a wide range of client/server development tools specifically designed for building your own executive information system. For papers and a directory differentiating the many development tools in the market, refer to the AS/400 Application Development Handbook, G325-6249.
    • Intelligent Miner, from IBM, provides a powerful set of data mining tools for finding patterns and relationships in your data. The Intelligent Miner tools include neural network and statistical algorithms to create leading-edge data mining solutions. Data mining techniques included in this product are:
      • Association

        This technique allows you to discover which items are related. For example, you can discover what items are purchased in a retail store at the same time. The output from this type of data mining is a rule such as: "If a customer buys product X, then he will also buy product Y 75 percent of the time."

      • Sequential pattern

        This technique allows you to discover associations over some period of time. For example, it can be used to determine the likelihood that a person will buy a product next week if he buys a related one today. This information could be used to determine what items a retail store should put on sale together in order to make both related sales, or which items should be stocked in the same aisle. The output from this type of data mining is a rule such as: "If a customer buys product X, then 75 percent of the time he will also buy product Y within a week."

      • Classification

        Historical data frequently consists of a set of values and a classification for those values. Mining data that has already been classified allows you to discover which attributes and values contributed to the classification. For example, you could mine a database of currently insured customers and previous customers who allowed their insurance to lapse. You could then use the resulting classification model to predict which current customers are likely to lapse in the future. Results can also be viewed in a decision tree format.

      • Clustering

        This technique places records into groups with similar characteristics. For example, the customer database has information about customers and how much money they have spent in the past. Clustering techniques be applied to this information to find out which customers have similar attributes and would be likely to spend more money if they became targeted with a marketing campaign. Another important use of clustering is to learn why the items (in this case, customers) fall into particular groups.

      • Prediction

        Also known as scoring, this technique involves mapping a set of input values to a single output value. An example is an application that determines real estate market values. The application receives a set of input values such as location, number of bedrooms, square footage, and so on. The output of the application is the value of the property.

      Combined with these state-of-the-art techniques are data preprocessing functions for transformation and cleansing, and a graphical administration facility that makes data mining easy to set up and run. Each technique provided includes appropriate graphical visualization to make it easy to understand the mining results.

      In addition to allowing you to discover new information, the models developed from your data can be used in applications using an extensive set of APIs. For example, the Intelligent Miner can tell you what attributes make a loan applicant a credit risk. Discovered from past loan history, this information can then be imbedded in an application that can screen future applicants.

    Managing information about the warehouse--metadata

    Factors such as the number of operational systems, diversity of platforms, and consistency and size of data across the platform can affect the need for metadata. Metadata can be found in database catalogs, data dictionaries, and repositories. Anything that contains information about the operational and data warehouse databases can be used to supplement this information. By far the largest source of this information, especially the business information, is the person who created it. The correlations between business terms and the data warehouse structure do not exist in any online facility and must be created by hand.

    Product examples

    With Version 3 of OS/400, the enhancements to the system-wide catalog may mean that all the metadata you need is already in the AS/400. The DB2 for OS/400 system-wide catalog contains information about your files--independent of whether they were created with data description specifications or using SQL. Information about tables, fields, referential integrity constraints, and cross-reference information are kept in the system-wide catalog.

    The IBM DataGuide set of products provide a more extensive information catalog that can include data from multiple platforms. Extraction utilities allow object descriptions of tables, columns, queries, spreadsheets, and so on to be gathered from a wide range of sources. The DataGuide administrator function provides a common point of control for gathering and organizing information for the catalog for all end-user environments. Business views allow users to access the data warehouse while shielding them from the complexities of the database.

    Many IBM Business Partner products offer their own layer of metadata for use with their product sets. For example, IBM Visualizer includes an administrative function that can provide descriptions of files or tables and fields or columns, and can even control join processing.


    Putting it all together (or where to start)

    Up until now we've addressed the various components that make up a data warehouse implementation. Particular requirements will dictate which components are appropriate for each individual situation. Do you need a multi-dimensional data structure? Is 80 percent of your operational data in legacy flat files? How sophisticated should the desktop presentation be for the users of the warehouse? Has someone already built the transformation programs to extract data from the software package that you run your business with today? Do you have the tools you need to help you manage the warehouse?

    Selecting the data warehouse platform is just one piece to the puzzle. Defining requirements, selecting the right components, and implementing the solution completes the process. Whether you build your own solution, or choose to have the solution built for you, IBM can help.

    Product examples

    Following is a list of IBM and Business Partner solutions and their appropriate solution areas.

    Table 1. Data warehouse solutions

    Solution area Product Provider Features
    Warehouse management RODIN Coglin Mill Build, administer, secure and describe the data warehouse. Extract data to data marts or presentation tools.
    Warehouse management ShowCase Strategy ShowCase Corp. Data distribution, administration, and tightly integrated analysis tools.
    Data mart management Visual Warehouse IBM Data mart administration and analysis tools in a packaged offering
    Data propagation and transformation DataPropagator IBM Data replication across all IBM DB2 platforms
    Data propagation and transformation DataMirror DataMirror AS/400 to AS/400 data propagation including S/36 files and bidirectional support.
    Metadata Data Guide IBM Information catalog tool for database administrator and end users.
    Parallel database DB2 Symmetric Multiprocessing for OS/400 IBM Turbo-query feature to significantly enhance performance of query applications.
    Parallel database DB2 Multisystem for OS/400 IBM Database functionality to take advantage of coupled AS/400 systems providing almost unlimited growth.
    OLAP, data mart AMIS/400 Hoskyns Group PLC, IBM Multi-dimensional analysis tools.
    OLAP, data mart MIT/400 SAMAC Multi-dimensional analysis tools.
    OLAP, data mart ShowCase Strategy with ESSBASE/400 ShowCase Corp. Multi-dimensional analysis tools.
    OLAP, data mart Sales Tracker Silvon Software, Inc. Multi-dimensional analysis applications for manufacturing and distribution.
    Executive information system InfoManager Ferguson Information System (U.S.)
    InfoManager O.Y. (Europe)
    Health and banking industry multi-dimensional executive information system applications.
    Decision support tools Visualizer IBM Query, report writing, graphics.
    Decision support tools ShowCase Strategy with VistaPro ShowCase Corp. Integrated query, report writing, graphics.
    Decision support tools Impromptu, PowerPlay Cognos Query, report writing, and desktop OLAP
    Decision support tools BrioQuery Brio Technologies Query, report writing, and desktop OLAP
    Decision support tools Lotus Approach Lotus PC-based database management
    Data mining tools Neural Network Utility IBM Visual development environment for mining data with neural networks; includes graphical training monitor, training language, data translation.
    Data mining tools Intelligent Miner IBM Decision support tool for mining data; includes neural network, decision tree, statistical algorithms, graphical interface, and data translation functions.

    Consulting services

    IBM offers an extensive set of consulting services specializing in data warehouse and data mining solutions. Expert consultants can help in the planning, design, systems integration, product installation, and testing of a data warehouse implementation, as well as provide the follow-on support for managing and automating the data warehouse processes.


    Summary

    Data warehousing implementations need not be complex. The AS/400 Advanced Server models can meet the demands of data warehousing while letting you focus your energies on running your business, and not on running your computer. These models build on the strengths of AS/400 business computing systems: ease of use, advanced architecture, open interfaces, scalability, and investment protection.

    IDC has completed a number of studies on cost of ownership showing AS/400 business computing systems as the least expensive system compared to LANs and UNIX environments. (5) Recent benchmarks by Client/Server Labs, an independent benchmarking firm, shows AS/400 beating 15 other competitive systems by at least 52%. (6)

    The affinity you have with your operational systems make the AS/400 Advanced Server models the most logical choice for the data warehouse. Not only is data transformation made simpler, but systems administration is significantly easier as well. You'll have the same backup procedures, security, and operations as your existing system, but also the ability to use the AS/400 server for operations such as batch offload, application development, or communications.

    What customers say

    One customer had this to say about an AS/400 warehouse solution: "It has allowed our users to analyze and spot trends we simply could not do previously. By implementing a data warehousing solution with the AS/400, we projected an improvement in our merchandising and category management areas by ten percent."

    Another: "I believe that it has changed our style of management throughout the whole company, since managers at all levels are getting the information they need at the right time and in the right format!"

    And still another: "We measure the performance of each SKU relative to plans, historical trends, and forecasts. This in turn helps us make management decisions--taking advantage of the more profitable business and controlling the less profitable."


    For more information

    For more information about products or getting started on your data warehousing solution:


    Footnotes:

    (1) Normalized data refers to data defined according to one of the key rules of a relational model. Multiple tables are built to represent one-to-many, many-to-many, or one-to-one relationships within the data. Normalization minimizes duplication of data and preserves integrity of the database by ensuring that changes to data fields are made once, and that all applications and views of the data are unaffected.

    (2) Cleansing the data in data warehousing terms means to ensure the integrity of data when it is loaded into the warehouse. For example, if data were pulled from two operational systems that use different field names for the same data, cleansing techniques would ensure that the data fields conform to the warehouse definition upon loading.

    (3) E.F. Codd, who defined the 12 rules of a relational database model, has developed 12 rules that define an OLAP model. Products that conform to a portion or all of Codd's rules label themselves OLAP products. Some of the key rules are multi-dimensionality and intuitive data manipulation (for example, drilling down to new levels of data).

    (4) Shared Nothing refers to each system in the coupled network owning and managing its own main memory and disk storage. In other words, the memory and disk are not shared.

    (5) "Cost-to-Use of Midrange and PC LAN Systems in the Networked Enterprise", International Data Corporation.

    (6) "Performance Profile", Aberdeen Group, April 1996.


    International Business Machines Corporation 1996

    Printed in the United States of America
    08-96
    All rights reserved.

    Mail comments to:
    IBM Corporation
    Attention Department HTP
    3605 Highway 52 North
    Rochester, MN USA 55901-7829

    Fax comments to:
    1-800-937-3430 (U.S. or Canada)
    1-507-253-5192 (outside the U.S. or Canada)

    This publication may refer to products that are not currently available in your country. IBM makes no commitment to make available any products referred to herein.

    IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594, USA

    This publication could contain technical inaccuracies or typographical errors.

    AIX, AS/400, BrioQuery, Client Series, DataGuide, DataJoiner, DataPropagator, DB2, Distributed Relational Database Architecture, DRDA, IBM, IMS, OS/2, OS/400, PowerPC, and 400 are trademarks of the IBM Corporation in the United States or other countries or both.

    Apple is a registered trademark of Apple Computer, Inc.

    DAL is a trademark of Independent Technologies, Incorporated.

    Impromptu is a registered trademark of Cognos, Incorporated.

    Microsoft is a registered trademark of Microsoft Corporation.

    PowerPlay is a registered trademark of Cognos, Incorporated.

    ShowCase is a registered trademark of Rochester ShowCase Corporation.

    UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.

    X/Open is a trademark of X/Open Company Limited.

    Other company, product and service names may be trademarks or service marks of others.

    All AS/400 manufacturing and development sites meet the ISO 9000 standard for quality assurance.

    Source:http://www.as400.ibm.com/db2/dataware.htm