Hadoop in Action

Author: Chuck Lam, Mark Davis, Ajit Gaddam

Publisher: Manning Publications

ISBN: 9781617291227

Page: 525

The massive datasets required for most modern businesses are too large to safely store and efficiently process on a single server. Hadoop is an open source data processing framework that provides a distributed file system that can manage data stored across clusters of servers and implements the MapReduce data processing model so that users can effectively query and utilize big data. The new Hadoop 2.0 is a stable, enterprise-ready platform supported by a rich ecosystem of tools and related technologies such as Pig, Hive, YARN, Spark, Tez, and many more. Hadoop in Action, Second Edition, provides a comprehensive introduction to Hadoop and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show how Hadoop can be used in more complex data analysis tasks. It covers how YARN, new in Hadoop 2, simplifies and supercharges resource management to make streaming and real-time applications more feasible. Included are best practices and design patterns of MapReduce programming. The book expands on the first edition by enhancing coverage of important Hadoop 2 concepts and systems, and by providing new chapters on data management and data science that reinforce a practical understanding of Hadoop. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
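The MapReduce data processing model mentioned above can be illustrated without a cluster. The following is a minimal, single-process Python sketch of the classic word-count example, not Hadoop's Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory grouping only stands in for the shuffle-and-sort step that Hadoop performs across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort stand-in: group all intermediate values by key,
    # then order the groups by key before they reach the reducer.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine every value collected for a single key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases: map each line, group by word, reduce each group.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Keeping map, shuffle, and reduce as separate functions mirrors the model's separation of concerns: the mapper and reducer see only one record or one key at a time, which is what lets a framework like Hadoop distribute them.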

Big Data Analytics with R and Hadoop

Author: Vignesh Prajapati

Publisher: Packt Publishing Ltd

ISBN: 1782163298

Category: Computers

Page: 238

Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on the powerful big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. Basic knowledge of R is helpful.

Professional Hadoop Solutions

Author: Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich

Publisher: John Wiley & Sons

ISBN: 1118824180

Category: Computers

Page: 504

The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them. The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications. Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions. Includes detailed, real-world examples and code-level guidelines. Explains when, why, and how to use these tools effectively. Written by a team of Hadoop experts in the programmer-to-programmer Wrox style. Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Hadoop Operations

Author: Eric Sammer

Publisher: O'Reilly Media, Inc.

ISBN: 1449327052

Category: Computers

Page: 282

For system administrators tasked with the job of maintaining large and complex Hadoop clusters, this book explains the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.

MongoDB Cookbook

Author: Amol Nayak

Publisher: Packt Publishing Ltd

ISBN: 1782161953

Category: Computers

Page: 388

If you want a reference to show you practical solutions, or just want to satisfy your need for more knowledge of this fantastic NoSQL database, then this book is ideal for you. To get the most out of this book, you should know the basics of MongoDB.

Pig in Action

Munging Big Data

Author: M. Tim Jones

Publisher: Manning Publications Company

ISBN: 9781617291586

Category: Computers

Page: 325

It's notoriously difficult to query Hadoop data using standard Map/Reduce programming techniques. Pig and the Pig Latin scripting language provide a SQL-like platform that simplifies query construction against data sets in Hadoop, lowers the barrier posed by hand-written Map/Reduce code, and opens the door to processing large data sets for casual users, including experimentation on data sets. And it stands up well under stress: Yahoo uses Pig for over half the queries it runs on the world's largest Hadoop cluster. Pig in Action introduces Pig and the Pig Latin language while teaching the fundamentals of big data processing. Readers will explore the intersection of business and data science as they walk through practical questions like executing standard queries, establishing automated data management processes and policies, and developing useful reports. Most importantly, they'll learn techniques to extract valuable insights from data while mastering the features of Pig.

Hadoop Real-World Solutions Cookbook

Author: Jonathan R. Owens, Brian Femiano, Jon Lentz

Publisher: Packt Publishing Ltd

ISBN: 1849519137

Category: Computers

Page: 316

Realistic, simple code examples to solve problems at scale with Hadoop and related technologies.

Big Data in der Praxis

Beispiellösungen mit Hadoop und NoSQL. Daten speichern, aufbereiten, visualisieren

Author: Jonas Freiknecht

Publisher: Carl Hanser Verlag GmbH Co KG

ISBN: 3446441778

Category: Computers

Page: 448

BIG DATA IN DER PRAXIS // - For analysts, BI managers, data scientists, and consultants - On the DVD: 18 complete projects that are developed step by step in the book; video tutorials on topics including installing Hadoop, Hive, and HBase (total running time: 80 min.); test datasets for the knowledge base. This book introduces the topic of big data in a very practical way. You will get to know technologies, tools, and methods, develop sample solutions, and see how to prepare existing systems in advance for the challenges that come with the big data trend. In addition to well-known Apache projects such as Hadoop, Hive, and HBase, you will also get to know some lesser-known frameworks such as Apache UIMA and Apache OpenNLP, which are used specifically to process unstructured data. All software components used here are freely available on the Internet in their full versions. Together with the author, you will build many smaller projects step by step, leading up to a finished, working implementation. The aim of the book is to make you aware of the impact and added value of these new possibilities, so that you can bring them constructively into your company and thereby build awareness of the value of your data for yourself and your colleagues.
CONTENTS // Introduction to big data // Installing, configuring, and operating Hadoop // HDFS, MapReduce, and YARN: storing and processing data // The Hadoop ecosystem: an overview of its components // Introduction to NoSQL // Installing and setting up HBase and accessing data // Data warehousing with Apache Hive // HiveQL as a query language, Hive security, Hive and JDBC // Importing data from relational databases with Sqoop // Big data visualization: chart types, tips, and trends // Visualization frameworks compared // D3.js: developing several example charts // Developing a final big data analysis solution // Troubleshooting when working with Hadoop, Hive, and HBase

Spring im Einsatz

Author: Craig Walls

Publisher: Carl Hanser Verlag GmbH Co KG

ISBN: 3446429468

Category: Computers

Page: 428

SPRING IM EINSATZ // - Spring 3.0 in a nutshell: the central concepts explained clearly and entertainingly. - Practical know-how for real projects: learn Spring hands-on with the help of numerous code examples. - Online: the complete source code for the applications in this book. The Spring Framework is essential basic knowledge for every Java developer. Spring 3 introduces powerful new features such as the Spring Expression Language (SpEL), new annotations for the IoC container, and long-awaited support for REST. There is no better way to learn Spring than this book, whether you are just discovering Spring or want to get familiar with the new 3.0 features. In this thoroughly revised 2nd edition, Craig Walls continues the clear, practice-oriented style of the previous edition. As an author, he brings his talent for apt and entertaining examples that focus directly on the features and techniques you really need. This edition highlights the most important aspects of Spring 3.0: REST, remote services, messaging, security, MVC, Web Flow, and much more. What you will find in this book: - Using annotations to reduce configuration - Working with RESTful resources - Spring Expression Language (SpEL) - Security, Web Flow, and more. CONTENTS: Diving into Spring, Wiring beans, Minimizing XML configuration in Spring, Aspect orientation, Accessing the database, Managing transactions, Building web applications with Spring MVC, Working with Spring Web Flow, Securing Spring, Working with remote services, Spring and REST, Messaging in Spring, Managing Spring beans with JMX

Big Data

Die Revolution, die unser Leben verändern wird

Author: Viktor Mayer-Schönberger, Kenneth Cukier

Publisher: Redline Wirtschaft

ISBN: 3864144590

Category: Political Science

Page: 288

Whether it is purchasing behavior, flu outbreaks, or which color is the most reliable sign that a used car is in good condition: never before has there been such a quantity of data, and never before has there been the chance to uncover connections in this flood of data at lightning speed through research and combination. Big data means nothing less than a revolution for society, business, and politics. It will completely overturn the way we think about health, education, innovation, and much more, and it will make possible predictions that were previously unthinkable. The experts Viktor Mayer-Schönberger and Kenneth Cukier describe in their book what big data is, what possibilities are opening up, and what upheavals we all face, and they do not conceal its dark side either, such as the spying on personal data and the looming loss of privacy.

Hadoop Beginner's Guide

Author: Garry Turkington

Publisher: Packt Publishing Ltd

ISBN: 1849517304

Category: Computers

Page: 398

Data is arriving faster than you can process it, and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop, however, requires a mixture of programming, design, and system administration skills. "Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to use Hadoop effectively to solve real-world problems. Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and use additional products to integrate with other systems. Alongside different ways to develop applications that run on Hadoop, the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection. In addition to examples on Hadoop clusters on Ubuntu, uses of cloud services such as Amazon EC2 and Elastic MapReduce are covered.

Web Scalability for Startup Engineers

Author: Artur Ejsmont

Publisher: McGraw Hill Professional

ISBN: 0071843663

Category: Computers

Page: 432

This invaluable roadmap for startup engineers reveals how to successfully handle web application scalability challenges to meet increasing product and traffic demands. Web Scalability for Startup Engineers shows engineers working at startups and small companies how to plan and implement a comprehensive scalability strategy. It presents a broad and holistic view of the infrastructure and architecture of a scalable web application. Successful startups often face the challenge of scalability, and the core concepts driving a scalable architecture are language- and platform-agnostic. The book covers scalability of HTTP-based systems (websites, REST APIs, SaaS, and mobile application backends), starting with a high-level perspective before taking a deep dive into common challenges and issues. This approach builds a holistic view of the problem, helping you see the big picture, and then introduces different technologies and best practices for solving the problem at hand. The book is enriched with the author's real-world experience and expert advice, saving you precious time and effort by letting you learn from others' mistakes and successes. Its language-agnostic approach addresses universally challenging concepts in web development and scalability and does not require knowledge of a particular language. It fills the gap for engineers in startups and smaller companies who have limited means of reaching the next level of scalability. The strategies presented help decrease time to market and increase the efficiency of web applications.

Hadoop Application Architectures

Designing Real-World Big Data Applications

Author: Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira

Publisher: O'Reilly Media, Inc.

ISBN: 1491900059

Category: Computers

Page: 400

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through the architectural considerations necessary to tie those components together into a complete, tailored application based on your particular use case. To reinforce those lessons, the book's second section provides detailed examples of architectures used in some of the most common Hadoop applications. Whether you're designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process. This book covers: factors to consider when using Hadoop to store and model data; best practices for moving data in and out of the system; data processing frameworks, including MapReduce, Spark, and Hive; common Hadoop processing patterns, such as removing duplicate records and using windowing analytics; Giraph, GraphX, and other tools for large graph processing on Hadoop; using workflow orchestration and scheduling tools such as Apache Oozie; near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume; and architecture examples for clickstream analysis, fraud detection, and data warehousing.
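The duplicate-removal pattern listed above is commonly expressed in MapReduce by using each record itself as the intermediate key, so that the framework's grouping step collapses all copies of a record into one reduce call. The sketch below simulates that idea in plain Python under that assumption; the helper names (`dedup_map`, `dedup_reduce`, `remove_duplicates`) are hypothetical, whitespace stripping is an assumed normalization, and this is not code from the book.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def dedup_map(record: str) -> Iterator[tuple[str, None]]:
    # Mapper: the (whitespace-stripped) record becomes the key; no value is needed.
    yield record.strip(), None


def dedup_reduce(key: str, values: list[None]) -> str:
    # Reducer: each distinct key arrives exactly once, so emitting the key
    # once drops every duplicate, however many copies were mapped.
    return key


def remove_duplicates(records: Iterable[str]) -> list[str]:
    # In-memory stand-in for the shuffle: group mapper output by key.
    groups: dict[str, list[None]] = defaultdict(list)
    for record in records:
        for key, value in dedup_map(record):
            groups[key].append(value)
    # Sort keys to mimic the sort that precedes the reduce phase.
    return [dedup_reduce(k, v) for k, v in sorted(groups.items())]


if __name__ == "__main__":
    print(remove_duplicates(["a,1", "b,2", "a,1 "]))
```

The design relies on the grouping step rather than on any per-record state, which is why the same idea scales out: no single mapper needs to remember which records it has already seen.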

Mehr Hacking mit Python

Eigene Tools entwickeln für Hacker und Pentester

Author: Justin Seitz

Publisher: dpunkt.verlag

ISBN: 3864917530

Category: Computers

Page: 182

When it comes to developing powerful and efficient hacking tools, Python is the language of choice for most security analysts. But how exactly does it work? In the latest book by Justin Seitz, author of the bestseller "Hacking mit Python", you will discover Python's dark side. You will develop network sniffers, manipulate packets, infect virtual machines, create stealthy trojans, and much more. You will learn hands-on how to • create a command-and-control trojan using GitHub • detect sandboxing and automate common malware tasks such as keylogging and taking screenshots • escalate Windows privileges through creative process control • use offensive memory-forensics tricks to retrieve password hashes and inject shellcode into virtual machines • extend the popular web-hacking tool Burp • use Windows COM automation to carry out a man-in-the-middle attack • exfiltrate data from a network as stealthily as possible. A range of insider techniques and creative challenges show you how to extend the hacks and develop your own exploits.

Data Science für Dummies

Author: Lillian Pierson

Publisher: John Wiley & Sons

ISBN: 352780675X

Category: Mathematics

Page: 382

Data, data, data? You already know Excel and statistics, but you don't yet know how all these datasets can help you make better decisions? Lillian Pierson gives you the tools you need: build up your knowledge of statistics, programming, and visualization, and use Python, R, SQL, Excel, and KNIME. Numerous examples illustrate the methods and techniques presented, so you can apply the insights from this book to your own data and draw immediate conclusions and consequences from your analysis.

Hadoop MapReduce v2 Cookbook - Second Edition

Author: Thilina Gunarathne

Publisher: Packt Publishing Ltd

ISBN: 1783285486

Category: Computers

Page: 322

If you are a Big Data enthusiast and wish to use Hadoop v2 to solve your problems, then this book is for you. This book is for Java programmers with little to moderate knowledge of Hadoop MapReduce. This is also a one-stop reference for developers and system admins who want to quickly get up to speed with using Hadoop v2. It would be helpful to have a basic knowledge of software development using Java and a basic working knowledge of Linux.

Learning Hadoop 2

Author: Garry Turkington, Gabriele Modena

Publisher: Packt Publishing Ltd

ISBN: 1783285524

Category: Computers

Page: 382

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Author: Steve Hoffman

Publisher: Packt Publishing Ltd

ISBN: 1784399140

Category: Computers

Page: 178

If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.

Hadoop: The Definitive Guide

Author: Tom White

Publisher: O'Reilly Media, Inc.

ISBN: 9780596551360

Category: Computers

Page: 528

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you: use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce; become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence; discover common pitfalls and advanced features for writing real-world MapReduce programs; design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud; use Pig, a high-level query language for large-scale data processing; take advantage of HBase, Hadoop's database for structured and semi-structured data; and learn ZooKeeper, a toolkit of coordination primitives for building distributed systems. If you have lots of data, whether it's gigabytes or petabytes, Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." (Doug Cutting, Hadoop Founder, Yahoo!)

Big Data

Architettura, tecnologie e metodi per l’utilizzo di grandi basi di dati

Author: Alessandro Rezzani

Publisher: Maggioli Editore

ISBN: 8838789894

Category: Business & Economics

Page: 320

Every day, billions of pieces of digital data are created around the world. This mass of information comes from the considerable increase in devices that automate numerous operations (purchase transaction records and mobile phone GPS signals, for example) and from the Web: photos, videos, posts, articles, and other digital content generated and shared by users through social media. Processing this "big data" requires high computing power, technologies, and resources that go well beyond conventional data management and storage systems. The book explores the world of "big data", offers a description and classification of it, and presents the opportunities that can arise from its use. It describes dedicated software and hardware solutions, devoting ample space to open source implementations and the main cloud offerings. It therefore serves as an in-depth guide to the tools and technologies that enable the analysis and management of large amounts of data. The volume is intended for those in universities and companies (database administrators, IT managers, business intelligence professionals) who want to explore big data topics in depth. It is also a valuable aid for corporate management in understanding how to obtain information usable in decision-making processes. Alessandro Rezzani teaches at Bocconi University in Milan. He is an expert in the design and implementation of data warehouses, ETL processes, multidimensional databases, and reporting solutions. He currently works on the design and implementation of business intelligence solutions at Factory Software. With Apogeo Education he has published "Business Intelligence. Processi, metodi, utilizzo in azienda" (2012).