DW 2.0: The Architecture for the Next Generation of Data Warehousing

Author: W.H. Inmon, Derek Strauss, Genia Neushloss

Publisher: Elsevier

ISBN: 9780080558332

Category: Computers

Page: 400

DW 2.0: The Architecture for the Next Generation of Data Warehousing is the first book on the new generation of data warehouse architecture, DW 2.0, by the father of the data warehouse. The book describes the future of data warehousing that is technologically possible today, at both an architectural level and technology level. The perspective of the book is from the top down: looking at the overall architecture and then delving into the issues underlying the components. This allows people who are building or using a data warehouse to see what lies ahead and determine what new technology to buy, how to plan extensions to the data warehouse, what can be salvaged from the current system, and how to justify the expense at the most practical level. This book gives experienced data warehouse professionals everything they need in order to implement the new generation DW 2.0. It is designed for professionals in the IT organization, including data architects, DBAs, systems design and development professionals, as well as data warehouse and knowledge management professionals.

* First book on the new generation of data warehouse architecture, DW 2.0.
* Written by the "father of the data warehouse", Bill Inmon, a columnist and newsletter editor of The Bill Inmon Channel on the Business Intelligence Network.
* Long overdue comprehensive coverage of the implementation of technology and tools that enable the new generation of the DW: metadata, temporal data, ETL, unstructured data, and data quality control.

Building the Unstructured Data Warehouse

Architecture, Analysis, and Design

Author: Bill Inmon, Krish Krishnan

Publisher: Technics Publications

ISBN: 1634620348

Category: Computers

Page: 216

Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now! Answers for many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to successfully implement an unstructured data warehouse and, through clear explanations, examples, and case studies, you will learn new techniques and tips to successfully obtain and analyze text.

Master these ten objectives:

• Build an unstructured data warehouse using the 11-step approach
• Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
• Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
• Avoid the Data Junkyard and combat the “Spider’s Web”
• Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
• Apply essential techniques for textual Extract, Transform, and Load (ETL) such as phrase recognition, stop word filtering, and synonym replacement
• Design the Document Inventory system and link unstructured text to structured data
• Leverage indexes for efficient text analysis and taxonomies for useful external categorization
• Manage large volumes of data using advanced techniques such as backward pointers
• Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances

The following outline briefly describes each chapter’s content:

• Chapter 1 defines unstructured data and explains why text is the main focus of this book. The sources for text, including documents, email, and spreadsheets, are described in terms of factors such as homogeneity, relevance, and structure.
• Chapter 2 addresses the challenges one faces when managing unstructured data. These challenges include volume, blather, the Tower of Babel, spelling, and lack of natural relationships. Learn how to avoid a data junkyard, which occurs when unstructured data is not properly integrated into the data warehouse. This chapter emphasizes the importance of storing integrated unstructured data in a relational structure. It also cautions readers about how common paper-based text is and the dangers associated with it.
• Chapter 3 begins with a timeline of applications, highlighting their evolution over the decades. Eventually, powerful yet siloed applications created a “spider’s web” environment. This chapter describes how data warehouses solved many problems, including the creation of corporate data, the ability to get out of the maintenance backlog conundrum, and greater data integrity and data accessibility. There were problems, however, with the data warehouse that were addressed in Data Warehouse 2.0 (DW 2.0), such as the inevitable data lifecycle. This chapter discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and its benefits are given. Several features of the conventional data warehouse can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
• Chapter 4 focuses on the heart of the unstructured data warehouse: Textual Extract, Transform, and Load (ETL). This chapter has separate sections on extracting text, transforming text, and loading text. The chapter emphasizes the issues around source data. There is a wide variety of sources, and each source has its own set of considerations. Pointers on extraction are provided, such as reading documents only once and recognizing common and different file types. Transforming text requires addressing many considerations discussed in this chapter, including phrase recognition, stop word filtering, and synonym replacement. Loading text is the final step. The chapter also explains important points about loading, such as the importance of the thematic approach and knowing how to handle large volumes of data. Two ETL examples are provided, one on email and one on spreadsheets.
• Chapter 5 describes the 11 steps required to develop the unstructured data warehouse. The methodology explained in this chapter is a combination of both traditional system development lifecycle and spiral approaches.
• Chapter 6 describes how to inventory documents for maximum analysis value, as well as link the unstructured text to structured data for even greater value. The Document Inventory is discussed, which is similar to a library card catalog used for organizing corporate documents. This chapter explores ways of linking unstructured text to structured data. The emphasis is on taking unstructured data and reducing it into a form of data that is structured. Concepts related to linking, such as probabilistic linkages and dynamic linkages, are discussed.
• Chapter 7 goes through each of the different types of indexes necessary to make text analysis efficient. Indexes range from simple indexes, which are fast to create and are good if the analyst really knows what needs to be analyzed before the indexing process begins, to complex combined indexes, which can be made up of any and all of the other kinds of indexes.
• Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse. Both simple and complicated taxonomies are discussed. Techniques to help the reader leverage taxonomies, including using preferred taxonomies, external categorization, and cluster analysis, are described. Real world problems are raised, including the possibilities of encountering hierarchies, multiple types, and recursion. The chapter ends with a discussion comparing a taxonomy with a data model.
• Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed. The chapter explains why iterative development is so important. Ways of reducing the amount of data are presented, including screening and removing extraneous data, as well as parallelizing the workload.
• Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. The traditional data warehouse processing technology is reviewed. In addition, the data warehouse appliance is discussed.
• Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies: the Ablatz Medical Group, the Eastern Hills Oil Company, and the Amber Oil Company.
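
The three textual ETL transformations named in the Chapter 4 summary (phrase recognition, stop word filtering, and synonym replacement) can be pictured with a minimal Python sketch. It is not taken from the book: the function name `transform_text` and the small lookup tables are hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical lookup tables for illustration; a real project would load
# curated stop word lists, synonym lists, and phrase lists instead.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "for", "in"}
SYNONYMS = {"physician": "doctor", "doc": "doctor", "rx": "prescription"}
PHRASES = {("heart", "attack"): "heart_attack"}


def transform_text(raw: str) -> list[str]:
    """Turn raw text into a list of normalized terms."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", raw.lower())

    # Phrase recognition: merge known two-word phrases into a single term.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Stop word filtering: drop words that carry little analytical meaning.
    kept = [t for t in merged if t not in STOP_WORDS]

    # Synonym replacement: map variant terms onto one preferred term.
    return [SYNONYMS.get(t, t) for t in kept]


print(transform_text("The physician noted a heart attack and wrote an Rx."))
```

Phrases are merged before stop words are removed in this sketch so that a phrase containing a stop word would still be recognized; other orderings are possible.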

Bitemporal Data

Theory and Practice

Author: Tom Johnston

Publisher: Newnes

ISBN: 0124080553

Category: Computers

Page: 400

Bitemporal data has always been important. But it was not until 2011 that the ISO released a SQL standard that supported it. Among major DBMS vendors, Oracle, IBM and Teradata now provide at least some bitemporal functionality in their flagship products. But to use these products effectively, someone in your IT organization needs to know more than how to code bitemporal SQL statements. Perhaps, in your organization, that person is you. To correctly interpret business requests for temporal data, to correctly specify requirements to your IT development staff, and to correctly design bitemporal databases and applications, someone in your enterprise needs a deep understanding of both the theory and the practice of managing bitemporal data. Someone also needs to understand what the future may bring in the way of additional temporal functionality, so their enterprise can plan for it. Perhaps, in your organization, that person is you. This is the book that will show the do-it-yourself IT professional how to design and build bitemporal databases and how to write bitemporal transactions and queries, and will show those who will direct the use of vendor-provided bitemporal DBMSs exactly what is going on "under the covers" of that software.

• Explains the business value of bitemporal data in terms of the information that can be provided by bitemporal tables and not by any other form of temporal data, including history tables, version tables, snapshot tables, or slowly-changing dimensions.
• Provides an integrated account of the mathematics, logic, ontology and semantics of relational theory and relational databases, in terms of which current relational theory and practice can be seen as unnecessarily constrained to the management of nontemporal and incompletely temporal data.
• Explains how bitemporal tables can provide the time-variance and nonvolatility hitherto lacking in Inmon historical data warehouses.
• Explains how bitemporal dimensions can replace slowly-changing dimensions in Kimball star schemas, and why they should do so.
• Describes several extensions to the current theory and practice of bitemporal data, including the use of episodes, "whenever" temporal transactions and queries, and future transaction time.
• Points out a basic error in the ISO’s bitemporal SQL standard, and warns practitioners against the use of that faulty functionality.
• Recommends six extensions to the ISO standard which will increase the business value of bitemporal data.
• Points towards a tritemporal future for bitemporal data, in which an Aristotelian ontology and a speech-act semantics support the direct management of the statements inscribed in the rows of relational tables, and add the ability to track the provenance of database content to existing bitemporal databases.

This book also provides the background needed to become a business ontologist, and explains why an IT data management person, deeply familiar with corporate databases, is best suited to play that role. Perhaps, in your organization, that person is you.
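
For readers new to the term, a bitemporal row carries two time periods: one for when a fact holds in the real world (valid time) and one for when the database asserted it (transaction time). The Python sketch below illustrates that general idea only. It is not the book's design or the ISO SQL syntax, and the `Row` fields, the `as_of` helper, and the sample policy data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # open-ended period marker (an assumption of this sketch)


@dataclass(frozen=True)
class Row:
    policy_id: str
    premium: int
    valid_from: date  # when the fact starts being true in the real world
    valid_to: date    # exclusive end of valid time
    tx_from: date     # when the database started asserting this row
    tx_to: date       # exclusive end of transaction time


def as_of(rows, policy_id, valid_on, known_on):
    """Return the row true on `valid_on` as the database knew it on `known_on`."""
    for r in rows:
        if (r.policy_id == policy_id
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r
    return None


rows = [
    # Recorded on 2020-01-01: premium is 100 from 2020-01-01 onward.
    Row("P1", 100, date(2020, 1, 1), FOREVER, date(2020, 1, 1), date(2020, 3, 1)),
    # Correction recorded on 2020-03-01: the premium was really 120 all along.
    Row("P1", 120, date(2020, 1, 1), FOREVER, date(2020, 3, 1), FOREVER),
]

# What did we believe in mid-February about the premium on 2020-02-01?
print(as_of(rows, "P1", date(2020, 2, 1), date(2020, 2, 15)).premium)
# What do we believe after the correction about that same day?
print(as_of(rows, "P1", date(2020, 2, 1), date(2020, 4, 1)).premium)
```

The corrected row does not overwrite the earlier one. The earlier row is closed in transaction time instead, so both the original belief and the correction remain queryable.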

Analytische Informationssysteme

Business Intelligence-Technologien und -Anwendungen

Author: Peter Gluchowski, Peter Chamoni

Publisher: Springer-Verlag

ISBN: 3662477637

Category: Business & Economics

Page: 354

Information systems for the analytical tasks of specialists and managers are increasingly coming to the fore. This established book discusses and evaluates terms and concepts such as business intelligence and big data. The updated and expanded fifth edition provides a current overview of technologies, products, and trends in the field of analytical information systems. Contributions from industry and academia give a comprehensive overview and serve as a sound basis for decisions when setting up and deploying such technologies.

Data Virtualization for Business Intelligence Systems

Revolutionizing Data Integration for Data Warehouses

Author: Rick van der Lans

Publisher: Elsevier

ISBN: 0123978173

Category: Computers

Page: 296

Data virtualization can help you accomplish your goals with more flexibility and agility. Learn what it is and how and why it should be used with Data Virtualization for Business Intelligence Systems. In this book, expert author Rick van der Lans explains how data virtualization servers work, what techniques to use to optimize access to various data sources, and how these products can be applied in different projects. You’ll learn the difference between this new form of data integration and older forms, such as ETL and replication, and gain a clear understanding of how data virtualization really works. Data Virtualization for Business Intelligence Systems outlines the advantages and disadvantages of data virtualization and illustrates how data virtualization should be applied in data warehouse environments. You’ll come away with a comprehensive understanding of how data virtualization will make data warehouse environments more flexible and how it makes developing operational BI applications easier. Van der Lans also describes the relationship between data virtualization and related topics, such as master data management, governance, and information management, so you come away with a big-picture understanding as well as all the practical know-how you need to virtualize your data.

• First independent book on data virtualization that explains in a product-independent way how data virtualization technology works.
• Illustrates concepts using examples developed with commercially available products.
• Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization.
• Apply data virtualization right away with three chapters full of practical implementation guidance.
• Understand the big picture of data virtualization and its relationship with data governance and information management.

Data Warehouse & Data Mining

Author: Roland Gabriel, Peter Gluchowski, Alexander Pastwa

Publisher: W3l GmbH

ISBN: 3937137661

Page: 234

Tapping into Unstructured Data

Integrating Unstructured Data and Textual Analytics into Business Intelligence

Author: William H. Inmon, Anthony Nesavich

Publisher: Pearson Education

ISBN: 9780132712910

Category: Business & Economics

Page: 288

The Definitive Guide to Unstructured Data Management and Analysis--From the World’s Leading Information Management Expert

A wealth of invaluable information exists in unstructured textual form, but organizations have found it difficult or impossible to access and utilize it. This is changing rapidly: new approaches finally make it possible to glean useful knowledge from virtually any collection of unstructured data. William H. Inmon--the father of data warehousing--and Anthony Nesavich introduce the next data revolution: unstructured data management. Inmon and Nesavich cover all you need to know to make unstructured data work for your organization. You’ll learn how to bring it into your existing structured data environment, leverage existing analytical infrastructure, and implement textual analytic processing technologies to solve new problems and uncover new opportunities. Inmon and Nesavich introduce breakthrough techniques covered in no other book--including the powerful role of textual integration, new ways to integrate textual data into data warehouses, and new SQL techniques for reading and analyzing text. They also present five chapter-length, real-world case studies--demonstrating unstructured data at work in medical research, insurance, chemical manufacturing, contracting, and beyond. This book will be indispensable to every business and technical professional trying to make sense of a large body of unstructured text: managers, database designers, data modelers, DBAs, researchers, and end users alike.

Coverage includes

• What unstructured data is, and how it differs from structured data
• First generation technology for handling unstructured data, from search engines to ECM--and its limitations
• Integrating text so it can be analyzed with a common, colloquial vocabulary: integration engines, ontologies, glossaries, and taxonomies
• Processing semistructured data: uncovering patterns, words, identifiers, and conflicts
• Novel processing opportunities that arise when text is freed from context
• Architecture and unstructured data: Data Warehousing 2.0
• Building unstructured relational databases and linking them to structured data
• Visualizations and Self-Organizing Maps (SOMs), including Compudigm and Raptor solutions
• Capturing knowledge from spreadsheet data and email
• Implementing and managing metadata: data models, data quality, and more

Data Warehousing Strategie

Erfahrungen, Methoden, Visionen

Author: Reinhard Jung, Robert Winter

Publisher: Springer-Verlag

ISBN: 3642583504

Category: Business & Economics

Page: 284

Data warehousing has been a central topic in many industries for some years. The initial euphoria, however, obscured the fact that proven methods and process models for practical implementation were lacking. This book is a contribution to closing this gap between aspiration and reality. The first part gives an overview of current results in the field of data warehousing, with a focus on methodological and business aspects. It includes, among others, contributions on economic efficiency analysis, the organizational embedding of data warehousing, data quality management, integrated metadata management, and data protection law, as well as a contribution on possible future directions of data warehousing. In the second part, project managers of large data warehousing projects report on their experiences and best practices.

Data Warehouse Blueprints

Business Intelligence in der Praxis

Author: Dani Schnider, Claus Jordan, Peter Welker, Joachim Wehner

Publisher: Carl Hanser Verlag GmbH Co KG

ISBN: 3446451455

Category: Computers

Page: 283

Successfully implementing data warehouse solutions with blueprints

This book gives you an overview of a typical data warehouse architecture and uses numerous best-practice examples to show how you can build and operate the individual components of a data warehouse. Scalability, performance, and integration are the most important success factors.

The compact and competent guide for your project

Why is a staging area needed? How should missing or incorrect data be handled during the load process? Is it more practical to create one data mart or several? Where is the data from different sources integrated, and how should its history be kept? You will get answers to these and many other questions, along with tips and tricks from practice.

Valuable know-how from practice

Benefit from the authors' many years of experience. The concepts and procedures presented have already proven themselves in numerous projects.

EXTRA: E-book inside

FROM THE CONTENTS

• Introduction
• Architecture
• Data modeling
• Data integration
• Design of the DWH layers
• Physical database design
• BI applications
• Operations

Proceedings of the 1999 Congress on Evolutionary Computation

CEC99: July 6-9, 1999, Mayflower Hotel, Washington, D.C., USA

Author: Congress on Evolutionary Computation, IEEE Neural Networks Council

Publisher: Institute of Electrical & Electronics Engineers (IEEE)

ISBN: 9780780355361

Category: Computers

Page: 2348

Data-Warehouse-Systeme

Architektur, Entwicklung, Anwendung

Author: Andreas Bauer

Publisher: N.A

ISBN: 9783898647854

Category: Data warehousing

Page: 690

This textbook gives a well-founded insight both into the architecture and development of a data warehouse system and into the entire course of the data warehouse process, from loading the data through to analyzing it. The focus is on the databases and their conception, modeling, and optimization. Among other things, the authors present business areas of use as well as scientific and technical fields of application, and give guidance on building and maintaining a data warehouse system. Definitions of terms and a continuous ...

Data Warehousing 2000

Methoden, Anwendungen, Strategien

Author: Reinhard Jung, Robert Winter

Publisher: Springer-Verlag

ISBN: 3642576818

Category: Business & Economics

Page: 393

Data warehousing has gained considerably in importance in many companies in recent years and has become one of the central challenges in information management. For many applications, such as customer relationship management or executive information systems, data warehouse architectures form an essential foundation. The proceedings of the conference "Data Warehousing 2000 - Methoden, Anwendungen, Strategien" give an overview of the state of the art, both in development from a technical and an organizational or business perspective and in the many possible applications of a data warehouse architecture. Alongside papers from the academic field, there are also reports from ongoing and completed projects in the data warehousing environment.

Big Data

Die Revolution, die unser Leben verändern wird

Author: Viktor Mayer-Schönberger, Kenneth Cukier

Publisher: Redline Wirtschaft

ISBN: 3864144590

Category: Political Science

Page: 288

Whether it is purchasing behavior, flu outbreaks, or which color most reliably reveals whether a used car is in good condition: never before has there been such a volume of data, and never before has there been the chance to decode connections in a flash by researching and combining within the flood of data. Big Data means nothing less than a revolution for society, business, and politics. It will completely overturn the way we think about health, education, innovation, and much more, and it will make possible predictions that were previously unthinkable. In their book, the experts Viktor Mayer-Schönberger and Kenneth Cukier describe what Big Data is, what opportunities it opens up, and what upheavals we all face, and they do not conceal the dark side, such as the spying on personal data and the impending loss of privacy.