Cloudera Administration Handbook

Author: Rohit Menon

Publisher: Packt Publishing Ltd

ISBN: 1783558970

Category: Computers

Page: 254

View: 6811

An easy-to-follow Apache Hadoop administrator’s guide filled with practical screenshots and explanations for each step and configuration. This book is great for administrators interested in setting up and managing a large Hadoop cluster. If you are an administrator, or want to be an administrator, and you are ready to build and maintain a production-level cluster running CDH5, then this book is for you.
Release

Expert Hadoop Administration

Managing, Tuning, and Securing Spark, YARN, and HDFS

Author: Sam R. Alapati

Publisher: Addison-Wesley Professional

ISBN: 0134703383

Category: Computers

Page: 848

View: 3912

This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book. The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference “Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.” —Paul Dix, Series Editor In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run. Understand Hadoop’s architecture from an administrator’s standpoint Create simple and fully distributed clusters Run MapReduce and Spark applications in a Hadoop cluster Manage and protect Hadoop data and high availability Work with HDFS commands, file permissions, and storage management Move data, and use YARN to allocate resources and schedule jobs Manage job workflows with Oozie and Hue Secure, monitor, log, and optimize Hadoop Benchmark and troubleshoot Hadoop
Release

Hadoop Operations

A Guide for Developers and Administrators

Author: Eric Sammer

Publisher: "O'Reilly Media, Inc."

ISBN: 144932729X

Category: Computers

Page: 298

View: 4214

If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance. Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments. Get a high-level overview of HDFS and MapReduce: why they exist and how they work Plan a Hadoop deployment, from hardware and OS selection to network requirements Learn setup and configuration details with a list of critical properties Manage resources by sharing a cluster across multiple groups Get a runbook of the most common cluster maintenance tasks Monitor Hadoop clusters—and learn troubleshooting with the help of real-world war stories Use basic tools and techniques to handle backup and catastrophic failure
Release

Learning Cloudera Impala

Author: Avkash Chauhan

Publisher: Packt Publishing Ltd

ISBN: 1783281286

Category: Computers

Page: 150

View: 558

This book is an easy-to-follow, step-by-step tutorial where each chapter takes your knowledge to the next level. The book covers practical knowledge with tips to implement this knowledge in real-world scenarios. A chapter with a real-life example is included to help you understand the concepts in full.Using Cloudera Impala is for those who really want to take advantage of their Hadoop cluster by processing extremely large amounts of raw data in Hadoop at real-time speed. Prior knowledge of Hadoop and some exposure to HIVE and MapReduce is expected.
Release

Hadoop: The Definitive Guide

Author: Tom White

Publisher: "O'Reilly Media, Inc."

ISBN: 1449338771

Category: Computers

Page: 688

View: 8035

Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the Hadoop Distributed File System (HDFS) Run distributed computations with MapReduce Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud Load data from relational databases into HDFS, using Sqoop Perform large-scale data processing with the Pig query language Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
Release

The Python Apprentice

Author: Robert Smallshire,Austin Bingham

Publisher: Packt Publishing Ltd

ISBN: 1788298667

Category: Computers

Page: 352

View: 3269

Learn the Python skills and culture you need to become a productive member of any Python project. About This Book Taking a practical approach to studying Python A clear appreciation of the sequence-oriented parts of Python Emphasis on the way in which Python code is structured Learn how to produce bug-free code by using testing tools Who This Book Is For The Python Apprentice is for anyone who wants to start building, creating and contributing towards a Python project. No previous knowledge of Python is required, although at least some familiarity with programming in another language is helpful. What You Will Learn Learn the language of Python itself Get a start on the Python standard library Learn how to integrate 3rd party libraries Develop libraries on your own Become familiar with the basics of Python testing In Detail Experienced programmers want to know how to enhance their craft and we want to help them start as apprentices with Python. We know that before mastering Python you need to learn the culture and the tools to become a productive member of any Python project. Our goal with this book is to give you a practical and thorough introduction to Python programming, providing you with the insight and technical craftsmanship you need to be a productive member of any Python project. Python is a big language, and it's not our intention with this book to cover everything there is to know. We just want to make sure that you, as the developer, know the tools, basic idioms and of course the ins and outs of the language, the standard library and other modules to be able to jump into most projects. Style and approach We introduce topics gently and then revisit them on multiple occasions to add the depth required to support your progression as a Python developer. We've worked hard to structure the syllabus to avoid forward references. On only a few occasions do we require you to accept techniques on trust, before explaining them later; where we do, it's to deliberately establish good habits.
Release

Hadoop 2.x Administration Cookbook

Author: Gurmukh Singh

Publisher: Packt Publishing Ltd

ISBN: 1787126870

Category: Computers

Page: 348

View: 3449

Over 100 practical recipes to help you become an expert Hadoop administrator About This Book Become an expert Hadoop administrator and perform tasks to optimize your Hadoop Cluster Import and export data into Hive and use Oozie to manage workflow. Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available Who This Book Is For If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems What You Will Learn Set up the Hadoop architecture to run a Hadoop cluster smoothly Maintain a Hadoop cluster on HDFS, YARN, and MapReduce Understand high availability with Zookeeper and Journal Node Configure Flume for data ingestion and Oozie to run various workflows Tune the Hadoop cluster for optimal performance Schedule jobs on a Hadoop cluster using the Fair and Capacity scheduler Secure your cluster and troubleshoot it for various common pain points In Detail Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploit its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration. The book begins with laying the foundation by showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain Hadoop cluster, especially on the HDFS layer and using YARN and MapReduce. Further on, you will explore durability and high availability of a Hadoop cluster. You'll get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks. You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration. By the end of this book, you will have a proper understanding of working with Hadoop clusters and will also be able to secure, encrypt it, and configure auditing for your Hadoop clusters. Style and approach This book contains short recipes that will help you run a Hadoop cluster efficiently. The recipes are solutions to real-life problems that administrators encounter while working with a Hadoop cluster
Release

Oracle Big Data Handbook

Author: Tom Plunkett,Brian Macdonald,Bruce Nelson,Mark Hornick,Helen Sun,Keith Laker,Khader Mohiuddin,Debra Harding,Gokula Mishra,David Segleau,Robert Stackowiak

Publisher: McGraw Hill Professional

ISBN: 0071827269

Category: Computers

Page: 464

View: 9644

"Cowritten by members of Oracle's big data team, [this book] provides complete coverage of Oracle's comprehensive, integrated set of products for acquiring, organizing, analyzing, and leveraging unstructured data. The book discusses the strategies and technologies essential for a successful big data implementation, including Apache Hadoop, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle NoSQL Database, Oracle Endeca, Oracle Advanced Analytics, and Oracle's open source R offerings"--Page 4 of cover.
Release

Big Data For Dummies

Author: Judith Hurwitz,Alan Nugent,Fern Halper,Marcia Kaufman

Publisher: John Wiley & Sons

ISBN: 1118644174

Category: Computers

Page: 336

View: 4757

Find the right big data solution for your business or organization Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work. Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals Authors are experts in information management, big data, and a variety of solutions Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more Provides essential information in a no-nonsense, easy-to-understand style that is empowering Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.
Release

HBase: The Definitive Guide

Random Access to Your Planet-Size Data

Author: Lars George

Publisher: "O'Reilly Media, Inc."

ISBN: 1449315224

Category: Computers

Page: 556

View: 5519

If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away. Discover how tight integration with Hadoop makes scalability with HBase easier Distribute large datasets across an inexpensive cluster of commodity servers Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks
Release

Executing Data Quality Projects

Ten Steps to Quality Data and Trusted Information (TM)

Author: Danette McGilvray

Publisher: Elsevier

ISBN: 0080558399

Category: Computers

Page: 352

View: 3230

Information is currency. Recent studies show that data quality problems are costing businesses billions of dollars each year, with poor data linked to waste and inefficiency, damaged credibility among customers and suppliers, and an organizational inability to make sound decisions. In this important and timely new book, Danette McGilvray presents her “Ten Steps approach to information quality, a proven method for both understanding and creating information quality in the enterprise. Her trademarked approach—in which she has trained Fortune 500 clients and hundreds of workshop attendees—applies to all types of data and to all types of organizations. * Includes numerous templates, detailed examples, and practical advice for executing every step of the “Ten Steps approach. * Allows for quick reference with an easy-to-use format highlighting key concepts and definitions, important checkpoints, communication activities, and best practices. * A companion Web site includes links to numerous data quality resources, including many of the planning and information-gathering templates featured in the text, quick summaries of key ideas from the Ten Step methodology, and other tools and information available online.
Release

Hadoop Backup and Recovery Solutions

Author: Gaurav Barot,Chintan Mehta,Amij Patel

Publisher: Packt Publishing Ltd

ISBN: 1783289058

Category: Computers

Page: 206

View: 4637

Hadoop offers distributed processing of large datasets across clusters and is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, flexible, and fault tolerant to back up very large data sets from hardware failures. Starting off with the basics of Hadoop administration, this book becomes increasingly exciting with the best strategies of backing up distributed storage databases. You will gradually learn about the backup and recovery principles, discover the common failure points in Hadoop, and facts about backing up Hive metadata. A deep dive into the interesting world of Apache HBase will show you different ways of backing up data and will compare them. Going forward, you'll learn the methods of defining recovery strategies for various causes of failures, failover recoveries, corruption, working drives, and metadata. Also covered are the concepts of Hadoop matrix and MapReduce. Finally, you'll explore troubleshooting strategies and techniques to resolve failures.
Release

Django 2 by Example

Build powerful and reliable Python web applications from scratch

Author: Antonio Mele

Publisher: Packt Publishing Ltd

ISBN: 1788472004

Category: Computers

Page: 526

View: 4251

Learn Django 2.0 with four end-to-end projects Key Features Learn Django by building real-world web applications from scratch Develop powerful web applications quickly using the best coding practices Integrate other technologies into your application with clear, step-by-step explanations and comprehensive example code Book Description If you want to learn about the entire process of developing professional web applications with Django, then this book is for you. This book will walk you through the creation of four professional Django projects, teaching you how to solve common problems and implement best practices. You will learn how to build a blog application, a social image-bookmarking website, an online shop, and an e-learning platform. The book will teach you how to enhance your applications with AJAX, create RESTful APIs, and set up a production environment for your Django projects. The book walks you through the creation of real-world applications, while solving common problems and implementing best practices. By the end of this book, you will have a deep understanding of Django and how to build advanced web applications What you will learn Build practical, real-world web applications with Django Use Django with other technologies, such as Redis and Celery Develop pluggable Django applications Create advanced features, optimize your code, and use the cache framework Add internationalization to your Django projects Enhance your user experience using JavaScript and AJAX Add social features to your projects Build RESTful APIs for your applications Who this book is for If you are a web developer who wants to see how to build professional sites with Django, this book is for you. You will need a basic knowledge of Python, HTML, and JavaScript, but you don't need to have worked with Django before.
Release

Hadoop Security

Protecting Your Big Data Platform

Author: Ben Spivey,Joey Echeverria

Publisher: "O'Reilly Media, Inc."

ISBN: 1491901349

Category: Computers

Page: 340

View: 7349

As more corporations turn to Hadoop to store and process their most valuable data, the risk of a potential breach of those systems increases exponentially. This practical book not only shows Hadoop administrators and security architects how to protect Hadoop data from unauthorized access, it also shows how to limit the ability of an attacker to corrupt or modify data in the event of a security breach. Authors Ben Spivey and Joey Echeverria provide in-depth information about the security features available in Hadoop, and organize them according to common computer security concepts. You’ll also get real-world examples that demonstrate how you can apply these concepts to your use cases. Understand the challenges of securing distributed systems, particularly Hadoop Use best practices for preparing Hadoop cluster hardware as securely as possible Get an overview of the Kerberos network authentication protocol Delve into authorization and accounting principles as they apply to Hadoop Learn how to use mechanisms to protect data in a Hadoop cluster, both in transit and at rest Integrate Hadoop data ingest into enterprise-wide security architecture Ensure that security architecture reaches all the way to end-user access
Release

Advanced Analytics with Spark

Patterns for Learning from Data at Scale

Author: Sandy Ryza,Uri Laserson,Sean Owen,Josh Wills

Publisher: "O'Reilly Media, Inc."

ISBN: 1491972904

Category: Computers

Page: 280

View: 8600

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will: Familiarize yourself with the Spark programming model Become comfortable within the Spark ecosystem Learn general approaches in data science Examine complete implementations that analyze large public data sets Discover which machine learning tools make sense for particular problems Acquire code that can be adapted to many uses
Release

Parallel R

Data Analysis in the Distributed World

Author: Q. Ethan McCallum,Stephen Weston

Publisher: "O'Reilly Media, Inc."

ISBN: 1449320333

Category: Computers

Page: 126

View: 9911

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets, including three chapters on using R and Hadoop together. You’ll learn the basics of Snow, Multicore, Parallel, Segue, RHIPE, and Hadoop Streaming, including how to find them, how to use them, when they work well, and when they don’t. With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier. Snow: works well in a traditional cluster environment Multicore: popular for multiprocessor and multicore computers Parallel: part of the upcoming R 2.14.0 release R+Hadoop: provides low-level access to a popular form of cluster computing RHIPE: uses Hadoop’s power with R’s language and interactive shell Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
Release

Monitoring Hadoop

Author: Gurmukh Singh

Publisher: Packt Publishing Ltd

ISBN: 1783281561

Category: Computers

Page: 100

View: 783

This book is useful for Hadoop administrators who need to learn how to monitor and diagnose their clusters. Also, the book will prove useful for new users of the technology, as the language used is simple and easy to grasp.
Release

Hadoop Operations and Cluster Management Cookbook

Author: Shumin Guo

Publisher: Packt Publishing Ltd

ISBN: 1782165177

Category: Computers

Page: 368

View: 3543

Solve specific problems using individual self-contained code recipes, or work through the book to develop your capabilities. This book is packed with easy-to-follow code and commands used for illustration, which makes your learning curve easy and quick.If you are a Hadoop cluster system administrator with Unix/Linux system management experience and you are looking to get a good grounding in how to set up and manage a Hadoop cluster, then this book is for you. It's assumed that you will have some experience in Unix/Linux command line already, as well as being familiar with network communication basics.
Release

Mobile Forensics – Advanced Investigative Strategies

Author: Oleg Afonin,Vladimir Katalov

Publisher: Packt Publishing Ltd

ISBN: 178646408X

Category: Computers

Page: 412

View: 819

Master powerful strategies to acquire and analyze evidence from real-life scenarios About This Book A straightforward guide to address the roadblocks face when doing mobile forensics Simplify mobile forensics using the right mix of methods, techniques, and tools Get valuable advice to put you in the mindset of a forensic professional, regardless of your career level or experience Who This Book Is For This book is for forensic analysts and law enforcement and IT security officers who have to deal with digital evidence as part of their daily job. Some basic familiarity with digital forensics is assumed, but no experience with mobile forensics is required. What You Will Learn Understand the challenges of mobile forensics Grasp how to properly deal with digital evidence Explore the types of evidence available on iOS, Android, Windows, and BlackBerry mobile devices Know what forensic outcome to expect under given circumstances Deduce when and how to apply physical, logical, over-the-air, or low-level (advanced) acquisition methods Get in-depth knowledge of the different acquisition methods for all major mobile platforms Discover important mobile acquisition tools and techniques for all of the major platforms In Detail Investigating digital media is impossible without forensic tools. Dealing with complex forensic problems requires the use of dedicated tools, and even more importantly, the right strategies. In this book, you'll learn strategies and methods to deal with information stored on smartphones and tablets and see how to put the right tools to work. We begin by helping you understand the concept of mobile devices as a source of valuable evidence. Throughout this book, you will explore strategies and "plays" and decide when to use each technique. We cover important techniques such as seizing techniques to shield the device, and acquisition techniques including physical acquisition (via a USB connection), logical acquisition via data backups, over-the-air acquisition. We also explore cloud analysis, evidence discovery and data analysis, tools for mobile forensics, and tools to help you discover and analyze evidence. By the end of the book, you will have a better understanding of the tools and methods used to deal with the challenges of acquiring, preserving, and extracting evidence stored on smartphones, tablets, and the cloud. Style and approach This book takes a unique strategy-based approach, executing them on real-world scenarios. You will be introduced to thinking in terms of "game plans," which are essential to succeeding in analyzing evidence and conducting investigations.
Release

Talend Open Studio Cookbook

Author: Rick Barton

Publisher: Packt Publishing Ltd

ISBN: 1782167277

Category: Computers

Page: 270

View: 3399

Primarily designed as a reference book, simple and effective exercises based upon genuine real-world tasks enable the developer to reduce the time to deliver the results. Presentation of the activities in a recipe format will enable the readers to grasp even the complex concepts with consummate ease.Talend Open Studio Cookbook is principally aimed at relative beginners and intermediate Talend Developers who have used the product to perform some simple integration tasks, possibly via a training course or beginner's tutorials.
Release