Building the Data Warehouse

Author: William H. Inmon

Publisher: QED Information Sciences

ISBN: N/A

Category: Data warehousing

Page: 272

"Data warehouses provide a much-needed strategy for organizations to collect, store, and analyze vast amounts of business data. As businesses expand both brick-and-mortar and online activities, the field of data warehousing has become increasingly important. Since it was first published in 1990, W. H. Inmon's Building the Data Warehouse has been the bible of data warehousing - it is the book that launched the data warehousing industry and it remains the preeminent introduction to the subject. This new edition covers the latest developments with this technology, many of which have been pioneered by Inmon himself."--BOOK JACKET.

Building the Data Warehouse

Author: W. H. Inmon

Publisher: John Wiley & Sons

ISBN: 0471774235

Category: Computers

Page: 576

The new edition of the classic bestseller that launched the data warehousing industry covers new approaches and technologies, many of which have been pioneered by Inmon himself. In addition to explaining the fundamentals of data warehouse systems, the book covers new topics such as methods for handling unstructured data in a data warehouse and storing data across multiple storage media. It discusses the pros and cons of relational versus multidimensional design and how to measure return on investment in planning data warehouse projects, and it covers advanced topics, including data monitoring and testing. Although the book includes an extra 100 pages' worth of valuable content, the price has actually been reduced from $65 to $55.

Building the Unstructured Data Warehouse

Architecture, Analysis, and Design

Author: W. H. Inmon, Krish Krishnan

Publisher: Technics Publications

ISBN: 1935504045

Category: Computers

Page: 216

Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now! Answers to many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to implement an unstructured data warehouse successfully; through clear explanations, examples, and case studies, you will learn new techniques and tips for obtaining and analyzing text.
Master these ten objectives:
• Build an unstructured data warehouse using the 11-step approach
• Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
• Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
• Avoid the Data Junkyard and combat the “Spider's Web”
• Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
• Apply essential techniques for textual Extract, Transform, and Load (ETL), such as phrase recognition, stop word filtering, and synonym replacement
• Design the Document Inventory system and link unstructured text to structured data
• Leverage indexes for efficient text analysis and taxonomies for useful external categorization
• Manage large volumes of data using advanced techniques such as backward pointers
• Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances
The following outline briefly describes each chapter's content:
• Chapter 1 defines unstructured data and explains why text is the main focus of this book. The sources for text, including documents, email, and spreadsheets, are described in terms of factors such as homogeneity, relevance, and structure.
• Chapter 2 addresses the challenges of managing unstructured data. These challenges include volume, blather, the Tower of Babel, spelling, and lack of natural relationships. Learn how to avoid a data junkyard, which occurs when unstructured data is not properly integrated into the data warehouse. This chapter emphasizes the importance of storing integrated unstructured data in a relational structure. It also cautions about how common paper-based text is and the dangers associated with it.
• Chapter 3 begins with a timeline of applications, highlighting their evolution over the decades. Eventually, powerful yet siloed applications created a “spider's web” environment. This chapter describes how data warehouses solved many problems, including the creation of corporate data, escape from the maintenance backlog conundrum, and greater data integrity and data accessibility. The data warehouse had problems of its own, however, such as the inevitable data lifecycle, which were addressed in Data Warehouse 2.0 (DW 2.0). This chapter discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and its benefits are given. Several features of the conventional data warehouse can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
• Chapter 4 focuses on the heart of the unstructured data warehouse: textual Extract, Transform, and Load (ETL). This chapter has separate sections on extracting, transforming, and loading text, and it emphasizes the issues around source data. There is a wide variety of sources, and each has its own set of considerations. Pointers on extraction are provided, such as reading documents only once and recognizing common and different file types. Transforming text requires addressing many considerations discussed in this chapter, including phrase recognition, stop word filtering, and synonym replacement. Loading text is the final step, and the chapter explains important points here too, such as the importance of the thematic approach and knowing how to handle large volumes of data. Two ETL examples are provided, one on email and one on spreadsheets.
• Chapter 5 describes the 11 steps required to develop the unstructured data warehouse. The methodology explained in this chapter combines the traditional system development lifecycle and spiral approaches.
• Chapter 6 describes how to inventory documents for maximum analysis value and how to link the unstructured text to structured data for even greater value. It discusses the Document Inventory, which is similar to a library card catalog and is used for organizing corporate documents. The chapter explores ways of linking unstructured text to structured data; the emphasis is on reducing unstructured data to a structured form. Related linking concepts, such as probabilistic linkages and dynamic linkages, are also discussed.
• Chapter 7 goes through each of the types of indexes needed to make text analysis efficient. Indexes range from simple indexes, which are fast to create and work well if the analyst knows what needs to be analyzed before indexing begins, to complex combined indexes, which can be made up of any and all of the other kinds of indexes.
• Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse. Both simple and complicated taxonomies are discussed. Techniques to help the reader leverage taxonomies, including using preferred taxonomies, external categorization, and cluster analysis, are described. Real-world problems are raised, including the possibility of encountering hierarchies, multiple types, and recursion. The chapter ends with a discussion comparing a taxonomy with a data model.
• Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed, and the chapter explains why iterative development is so important. Ways of reducing the amount of data are presented, including screening and removing extraneous data, as well as parallelizing the workload.
• Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. The traditional data warehouse processing technology is reviewed, and the data warehouse appliance is discussed.
• Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies: the Ablatz Medical Group, the Eastern Hills Oil Company, and the Amber Oil Company.
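The Chapter 4 summary names three textual ETL transformations: phrase recognition, stop word filtering, and synonym replacement. The sketch below is not Inmon's implementation; it is a minimal Python illustration of those three steps, assuming a tiny hypothetical stop-word list, synonym table, and two-word phrase dictionary (`STOP_WORDS`, `SYNONYMS`, `PHRASES` are invented for this example).

```python
import re

# Hypothetical, deliberately tiny lookup tables; a real textual ETL pipeline
# would use much larger, domain-specific lists.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "was", "for", "on"}
SYNONYMS = {"auto": "car", "automobile": "car", "physician": "doctor"}
# Two-word phrases to recognize and keep together as a single term.
PHRASES = {("heart", "attack"): "heart_attack", ("oil", "well"): "oil_well"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize_phrases(tokens: list[str]) -> list[str]:
    """Merge adjacent token pairs that form a known phrase."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            out.append(PHRASES[pair])
            i += 2  # skip both words of the recognized phrase
        else:
            out.append(tokens[i])
            i += 1
    return out


def transform(text: str) -> list[str]:
    """Phrase recognition, then stop-word filtering, then synonym replacement."""
    tokens = recognize_phrases(tokenize(text))
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in kept]


print(transform("The physician treated a heart attack on the way to the auto shop."))
# → ['doctor', 'treated', 'heart_attack', 'way', 'car', 'shop']
```

Phrases are recognized before stop words are removed so that a multi-word term is matched on its original word sequence; the order of steps in an actual textual ETL tool may differ.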

Building the Operational Data Store

Author: W. H. Inmon

Publisher: Wiley

ISBN: 9780471328889

Category: Computers

Page: 336

The most comprehensive guide to building, using, and managing the operational data store. Building the Operational Data Store, Second Edition. In the five years since the publication of the first edition of this book, the operational data store (ODS) has grown from an intriguing concept to an exciting reality at enterprise organizations worldwide. Still the only guide on the subject, this revised and expanded edition of Bill Inmon's classic goes beyond the theory of the first edition to provide detailed, practical guidance on designing, building, managing, and getting the most out of an ODS. With the help of fascinating and instructive case studies, Inmon shares what he knows about:
* How the ODS fits with the corporate information factory
* Different types of ODS and how to choose the right one for your organization
* Designing and building an ODS from scratch
* Managing and fine-tuning an ODS for peak efficiency
* ODS support technology
* The pros and cons of competing off-the-shelf ODS products
* The advantages and disadvantages of various hardware and software platforms
* Integrating the ODS with data marts
* Distributed metadata using the ODS
* Data aggregation within the ODS
* Business process reengineering and the ODS
* The role of standards in the ODS
Visit our Web site at www.wiley.com/compbooks/

Building a Scalable Data Warehouse with Data Vault 2.0

Author: Dan Linstedt, Michael Olschimke

Publisher: Morgan Kaufmann

ISBN: 0128026480

Category: Computers

Page: 684

The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations for creating a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss:
• How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes
• Important data warehouse technologies and practices
• Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture
The book:
• Provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast
• Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse
• Demystifies Data Vault modeling with beginning, intermediate, and advanced techniques
• Discusses the advantages of the Data Vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0

Building the Unstructured Data Warehouse

Architecture, Analysis, and Design

Author: Bill Inmon, Krish Krishnan

Publisher: Technics Publications

ISBN: 1634620348

Category: Computers

Page: 216

Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now! Answers to many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to implement an unstructured data warehouse successfully; through clear explanations, examples, and case studies, you will learn new techniques and tips for obtaining and analyzing text.
Master these ten objectives:
• Build an unstructured data warehouse using the 11-step approach
• Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
• Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
• Avoid the Data Junkyard and combat the “Spider’s Web”
• Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
• Apply essential techniques for textual Extract, Transform, and Load (ETL), such as phrase recognition, stop word filtering, and synonym replacement
• Design the Document Inventory system and link unstructured text to structured data
• Leverage indexes for efficient text analysis and taxonomies for useful external categorization
• Manage large volumes of data using advanced techniques such as backward pointers
• Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances
The following outline briefly describes each chapter’s content:
• Chapter 1 defines unstructured data and explains why text is the main focus of this book. The sources for text, including documents, email, and spreadsheets, are described in terms of factors such as homogeneity, relevance, and structure.
• Chapter 2 addresses the challenges of managing unstructured data. These challenges include volume, blather, the Tower of Babel, spelling, and lack of natural relationships. Learn how to avoid a data junkyard, which occurs when unstructured data is not properly integrated into the data warehouse. This chapter emphasizes the importance of storing integrated unstructured data in a relational structure. It also cautions about how common paper-based text is and the dangers associated with it.
• Chapter 3 begins with a timeline of applications, highlighting their evolution over the decades. Eventually, powerful yet siloed applications created a “spider’s web” environment. This chapter describes how data warehouses solved many problems, including the creation of corporate data, escape from the maintenance backlog conundrum, and greater data integrity and data accessibility. The data warehouse had problems of its own, however, such as the inevitable data lifecycle, which were addressed in Data Warehouse 2.0 (DW 2.0). This chapter discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and its benefits are given. Several features of the conventional data warehouse can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
• Chapter 4 focuses on the heart of the unstructured data warehouse: textual Extract, Transform, and Load (ETL). This chapter has separate sections on extracting, transforming, and loading text, and it emphasizes the issues around source data. There is a wide variety of sources, and each has its own set of considerations. Pointers on extraction are provided, such as reading documents only once and recognizing common and different file types. Transforming text requires addressing many considerations discussed in this chapter, including phrase recognition, stop word filtering, and synonym replacement. Loading text is the final step, and the chapter explains important points here too, such as the importance of the thematic approach and knowing how to handle large volumes of data. Two ETL examples are provided, one on email and one on spreadsheets.
• Chapter 5 describes the 11 steps required to develop the unstructured data warehouse. The methodology explained in this chapter combines the traditional system development lifecycle and spiral approaches.
• Chapter 6 describes how to inventory documents for maximum analysis value and how to link the unstructured text to structured data for even greater value. It discusses the Document Inventory, which is similar to a library card catalog and is used for organizing corporate documents. The chapter explores ways of linking unstructured text to structured data; the emphasis is on reducing unstructured data to a structured form. Related linking concepts, such as probabilistic linkages and dynamic linkages, are also discussed.
• Chapter 7 goes through each of the types of indexes needed to make text analysis efficient. Indexes range from simple indexes, which are fast to create and work well if the analyst knows what needs to be analyzed before indexing begins, to complex combined indexes, which can be made up of any and all of the other kinds of indexes.
• Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse. Both simple and complicated taxonomies are discussed. Techniques to help the reader leverage taxonomies, including using preferred taxonomies, external categorization, and cluster analysis, are described. Real-world problems are raised, including the possibility of encountering hierarchies, multiple types, and recursion. The chapter ends with a discussion comparing a taxonomy with a data model.
• Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed, and the chapter explains why iterative development is so important. Ways of reducing the amount of data are presented, including screening and removing extraneous data, as well as parallelizing the workload.
• Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. The traditional data warehouse processing technology is reviewed, and the data warehouse appliance is discussed.
• Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies: the Ablatz Medical Group, the Eastern Hills Oil Company, and the Amber Oil Company.

Building a Data Warehouse

With Examples in SQL Server

Author: Vincent Rainardi

Publisher: Apress

ISBN: 9781590599310

Category: Computers

Page: 523

Building a Data Warehouse: With Examples in SQL Server describes how to build a data warehouse completely from scratch and shows practical examples of how to do it. Author Vincent Rainardi also describes some practical issues he has experienced that developers are likely to encounter in their first data warehousing project, along with solutions and advice. The relational database management system (RDBMS) used in the examples is SQL Server; the version will not be an issue as long as the user has SQL Server 2005 or later. The book is organized as follows. In the beginning of this book (chapters 1 through 6), you learn how to build a data warehouse, including defining the architecture, understanding the methodology, gathering the requirements, designing the data models, and creating the databases. Then in chapters 7 through 10, you learn how to populate the data warehouse, including extracting from source systems, loading the data stores, maintaining data quality, and utilizing the metadata. After you populate the data warehouse, in chapters 11 through 15, you explore how to present data to users using reports and multidimensional databases and how to use the data in the data warehouse for business intelligence, customer relationship management, and other purposes. Chapters 16 and 17 wrap up the book: after you have built your data warehouse, you need to test it thoroughly before it can be released to production, and once your application is in production, you need to understand how to administer data warehouse operations.
What you’ll learn:
• A detailed understanding of what it takes to build a data warehouse
• The implementation code in SQL Server to build the data warehouse
• Dimensional modeling, data extraction methods, data warehouse loading, populating dimension and fact tables, data quality, data warehouse architecture, and database design
• Practical data warehousing applications such as business intelligence reports, analytics applications, and customer relationship management
Who this book is for:
There are three audiences for the book. The first is the people who implement the data warehouse; the book could be considered a field guide for them. The second is database users and administrators who want a good understanding of what it would take to build a data warehouse. Finally, the third audience is managers who must make decisions about aspects of the data warehousing task before them and can use the book to learn about these issues.

Data Warehousing

Using the Wal-Mart Model

Author: Paul Westerman

Publisher: Morgan Kaufmann

ISBN: 9781558606845

Category: Computers

Page: 297

What is data warehousing? -- Project planning -- Business exploration -- Business case study and ROI analysis -- Organizational integration -- Technology -- Database maintenance -- Technical construction of the Wal-Mart data warehouse -- Postimplementation of the Wal-Mart data warehouse -- Store operations sample analyses -- Merchandising sample analyses.

The Microsoft Data Warehouse Toolkit

With SQL Server 2008 R2 and the Microsoft Business Intelligence Toolset

Author: Joy Mundy, Warren Thornthwaite

Publisher: John Wiley & Sons

ISBN: 9781118067956

Category: Computers

Page: 704

Best practices and invaluable advice from world-renowned data warehouse experts. In this book, leading data warehouse experts from the Kimball Group share best practices for using the upcoming “Business Intelligence release” of SQL Server, referred to as SQL Server 2008 R2. In this new edition, the authors explain how SQL Server 2008 R2 provides a collection of powerful new tools that extend the power of its BI toolset to Excel and SharePoint users, and they show how to use SQL Server to build a successful data warehouse that supports the business intelligence requirements common to most organizations. Covering the complete suite of data warehousing and BI tools that are part of SQL Server 2008 R2, as well as Microsoft Office, the authors walk you through a full project lifecycle, including design, development, deployment, and maintenance.
• Features more than 50 percent new and revised material that covers the rich new feature set of the SQL Server 2008 R2 release, as well as the Office 2010 release
• Includes brand-new content that focuses on PowerPivot for Excel and SharePoint and on Master Data Services, and discusses updated capabilities of SQL Server Analysis, Integration, and Reporting Services
• Shares detailed case examples that clearly illustrate how best to apply the techniques described in the book
• The accompanying Web site contains all code samples as well as the sample database used throughout the case studies
The Microsoft Data Warehouse Toolkit, Second Edition provides you with the knowledge of how and when to use BI tools such as Analysis Services and Integration Services to accomplish your most essential data warehousing tasks.

Data Warehouse

From Architecture to Implementation

Author: Barry Devlin

Publisher: Addison-Wesley Professional

ISBN: N/A

Category: Computers

Page: 432

Data warehousing is one of the hottest topics in the computing industry. Written by Barry Devlin, one of the world's leading experts on data warehousing, this book gives you the insights and experiences gained over 10 years and offers the most comprehensive, practical guide to designing, building, and implementing a successful data warehouse. Included in this vital information is an explanation of the optimal three-tiered architecture for the data warehouse, with a clear division between data and information. Information systems managers will appreciate the full description of the functions needed to implement such an architecture, including reconciling existing, diverse data and deriving consistent, valuable business information.