Programming Hive

Author: Edward Capriolo, Dean Wampler, Jason Rutherglen

Publisher: O'Reilly Media, Inc.

ISBN: 1449326986

Category: Computers

Page: 350

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data. Use Hive to create, alter, and drop databases, tables, views, functions, and indexes. Customize data formats and storage options, from files to external databases. Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods. Gain best practices for creating user-defined functions (UDFs). Learn Hive patterns you should use and anti-patterns you should avoid. Integrate Hive with other data processing programs. Use storage handlers for NoSQL databases and other datastores. Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce.

Network Data Analytics

A Hands-On Approach for Application Development

Author: K. G. Srinivasa, Siddesh G. M., Srinidhi H.

Publisher: Springer

ISBN: 3319778005

Category: Computers

Page: 398

In order to carry out data analytics, we need powerful and flexible computing software. However, the software available for data analytics is often proprietary and can be expensive. This book reviews Apache tools, which are open source and easy to use. After providing an overview of the background of data analytics, covering the different types of analysis and the basics of using Hadoop as a tool, it focuses on Hadoop ecosystem tools such as Apache Flume, Apache Spark, Apache Storm, and Apache Hive, as well as R and Python, which can be used for different types of analysis. It then examines the different machine learning techniques that are useful for data analytics, and how to visualize data with different graphs and charts. Presenting data analytics from a practice-oriented viewpoint, the book discusses useful tools and approaches for data analytics, supported by concrete code examples. The book is a valuable reference resource for graduate students and professionals in related fields, and is also of interest to general readers with an understanding of data analytics.

Handbook of Research on Big Data Storage and Visualization Techniques

Author: Richard S. Segall, Jeffrey S. Cook

Publisher: IGI Global

ISBN: 1522531432

Category: Computers

Page: 917

The digital age has brought exponential growth in the amount of data available to individuals looking to draw conclusions based on given or collected information across industries. Challenges associated with the analysis, security, sharing, storage, and visualization of large and complex data sets continue to plague data scientists and analysts alike as traditional data processing applications struggle to adequately manage big data. The Handbook of Research on Big Data Storage and Visualization Techniques is a critical scholarly resource that explores big data analytics and technologies and their role in developing a broad understanding of issues pertaining to the use of big data in multidisciplinary fields. Featuring coverage on a broad range of topics, such as architecture patterns, programming systems, and computational energy, this publication is geared towards professionals, researchers, and students seeking current research and application topics on the subject.

IBM Information Server: Integration and Governance for Emerging Data Warehouse Demands

Author: Chuck Ballard, Manish Bhide, Holger Kache, Bob Kitzberger, Beate Porst, Yeh-Heng Sheng, Harald C. Smith, IBM Redbooks

Publisher: IBM Redbooks

ISBN: 0738438499

Category: Computers

Page: 194

This IBM® Redbooks® publication is intended for business leaders and IT architects who are responsible for building and extending their data warehouse and Business Intelligence infrastructure. It provides an overview of powerful new capabilities of Information Server in the areas of big data, statistical models, data governance and data quality. The book also provides key technical details that IT professionals can use in solution planning, design, and implementation.

Hadoop: The Definitive Guide

Author: Tom White

Publisher: O'Reilly Media, Inc.

ISBN: 1449311520

Category: Computers

Page: 657

Counsels programmers and administrators for big and small organizations on how to work with large-scale application datasets using Apache Hadoop, discussing its capacity for storing and processing large amounts of data while demonstrating best practices for building reliable and scalable distributed systems.

Planning for Big Data

Author: Edd Wilder-James

Publisher: O'Reilly Media, Inc.

ISBN: 1449329640

Category: Computers

Page: 83

In an age where everything is measurable, understanding big data is essential. From creating new data-driven products through to increasing operational efficiency, big data has the potential to make your organization both more competitive and more innovative. As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but also the organizational and cultural demands of being data-driven. Written by O'Reilly Radar's experts on big data, this anthology describes: the broad industry changes heralded by the big data era; what big data is, what it means to your business, and how to start solving data problems; the software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions; the landscape of NoSQL databases and their relative merits; and how visualization plays an important part in data work.

Practical Hive

A Guide to Hadoop's Data Warehouse System

Author: Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard

Publisher: Apress

ISBN: 1484202716

Category: Computers

Page: 265

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas François Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data. What You Will Learn: install and configure Hive for new and existing datasets; perform DDL operations; execute efficient DML operations; use tables, partitions, buckets, and user-defined functions; and discover performance tuning tips and Hive best practices. Who This Book Is For: developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

Hadoop For Dummies

Author: Dirk deRoos

Publisher: John Wiley & Sons

ISBN: 1118607554

Category: Computers

Page: 416

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. It explains the origins of Hadoop, its economic benefits, and its functionality and practical applications. It helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily. It details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving. It also shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster. From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.

Oracle Essentials

Oracle Database 12c

Author: Rick Greenwald, Robert Stackowiak, Jonathan Stern

Publisher: O'Reilly Media, Inc.

ISBN: 1449343171

Category: Computers

Page: 432

Written by Oracle insiders, this indispensable guide distills an enormous amount of information about the Oracle Database into one compact volume. Ideal for novice and experienced DBAs, developers, managers, and users, Oracle Essentials walks you through technologies and features in Oracle’s product line, including its architecture, data structures, networking, concurrency, and tuning. Complete with illustrations and helpful hints, this fifth edition provides a valuable one-stop overview of Oracle Database 12c, including an introduction to Oracle and cloud computing. Oracle Essentials provides the conceptual background you need to understand how Oracle truly works. Topics include: a complete overview of Oracle databases and data stores, and Fusion Middleware products and features; core concepts and structures in Oracle’s architecture, including pluggable databases; Oracle objects and the various datatypes Oracle supports; system and database management, including Oracle Enterprise Manager 12c; security options, basic auditing capabilities, and options for meeting compliance needs; performance characteristics of disk, memory, and CPU tuning; basic principles of multiuser concurrency; Oracle’s online transaction processing (OLTP); data warehouses, Big Data, and Oracle’s business intelligence tools; and backup and recovery, and high availability and failover solutions.

Professional Hadoop Solutions

Author: Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich

Publisher: John Wiley & Sons

ISBN: 1118824180

Category: Computers

Page: 504

The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them. It is the ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications: it covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions; includes detailed, real-world examples and code-level guidelines; and explains when, why, and how to use these tools effectively. Written by a team of Hadoop experts in the programmer-to-programmer Wrox style, Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.