Kafka: The Definitive Guide

Real-Time Data and Stream Processing at Scale

Author: Neha Narkhede, Gwen Shapira, Todd Palino

Publisher: O'Reilly Media, Inc.

ISBN: 1491936118

Category: Computers

Page: 322

Every enterprise application creates data, whether it’s log messages, metrics, user activity, outgoing messages, or something else. How to move all of this data becomes nearly as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds. Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. Understand publish-subscribe messaging and how it fits in the big data ecosystem. Explore Kafka producers and consumers for writing and reading messages. Understand Kafka patterns and use-case requirements to ensure reliable data delivery. Get best practices for building data pipelines and applications with Kafka. Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks. Learn the most critical metrics among Kafka’s operational measurements. Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems.

Kafka Streams in Action

Real-time apps and microservices with the Kafka Streams API

Author: Bill Bejeck

Publisher: Manning Publications

ISBN: 9781617294471

Category: Computers

Page: 280

Summary
Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort.

Foreword by Neha Narkhede, Cocreator of Apache Kafka.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology
Not all stream-based applications require a dedicated processing cluster. The lightweight Kafka Streams library provides exactly the power and simplicity you need for message handling in microservices and real-time event processing. With the Kafka Streams API, you filter and transform data streams with just Kafka and your application.

About the Book
Kafka Streams in Action teaches you to implement stream processing within the Kafka platform. In this easy-to-follow book, you'll explore real-world examples to collect, transform, and aggregate data, work with multiple processors, and handle real-time events. You'll even dive into streaming SQL with KSQL! Practical to the very end, it finishes with testing and operational aspects, such as monitoring and debugging.

What's inside
Using the KStreams API
Filtering, transforming, and splitting data
Working with the Processor API
Integrating with external systems

About the Reader
Assumes some experience with distributed systems. No knowledge of Kafka or streaming applications required.

About the Author
Bill Bejeck is a Kafka Streams contributor and Confluent engineer with over 15 years of software development experience.

Table of Contents
PART 1 - GETTING STARTED WITH KAFKA STREAMS
Welcome to Kafka Streams
Kafka quickly
PART 2 - KAFKA STREAMS DEVELOPMENT
Developing Kafka Streams
Streams and state
The KTable API
The Processor API
PART 3 - ADMINISTERING KAFKA STREAMS
Monitoring and performance
Testing a Kafka Streams application
PART 4 - ADVANCED CONCEPTS WITH KAFKA STREAMS
Advanced applications with Kafka Streams
APPENDIXES
Appendix A - Additional configuration information
Appendix B - Exactly once semantics

Spark: The Definitive Guide

Big Data Processing Made Simple

Author: Bill Chambers, Matei Zaharia

Publisher: O'Reilly Media, Inc.

ISBN: 1491912308


Page: 608

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library. Get a gentle overview of big data and Spark. Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples. Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames. Understand how Spark runs on a cluster. Debug, monitor, and tune Spark clusters and applications. Learn the power of Structured Streaming, Spark’s stream-processing engine. Learn how you can apply MLlib to a variety of problems, including classification or recommendation.

Architecting Modern Data Platforms

A Guide to Enterprise Hadoop at Scale

Author: Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Publisher: O'Reilly Media

ISBN: 1491969245

Category: Computers

Page: 636

There’s a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task that is not widely covered. With this practical book, you’ll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform. Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You’ll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into:
Infrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise.
Platform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT.
Taking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability.

Practical Hadoop Ecosystem

A Definitive Guide to Hadoop-Related Frameworks and Tools

Author: Deepak Vohra

Publisher: Apress

ISBN: 1484221990

Category: Computers

Page: 421

Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What You Will Learn
Set up the environment in Linux for Hadoop projects using Cloudera Hadoop Distribution CDH 5
Run a MapReduce job
Store data with Apache Hive and Apache HBase
Index data in HDFS with Apache Solr
Develop a Kafka messaging system
Stream logs to HDFS with Apache Flume
Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
Create a Hive table over Apache Solr
Develop a Mahout user recommender system

Who This Book Is For
Apache Hadoop developers. Knowledge of Linux and some knowledge of Hadoop are prerequisites.

The Metamorphosis Thrift Study Edition

Author: Franz Kafka

Publisher: Courier Corporation

ISBN: 0486112683

Category: Fiction

Page: 128

Includes the unabridged text of Kafka's classic novella plus a complete study guide that features chapter-by-chapter summaries, explanations and discussions of the plot, question-and-answer sections, author biography, historical background, and more.

Through the Lens of the Reader

Explorations of European Narrative

Author: Lilian R. Furst

Publisher: SUNY Press

ISBN: 9780791408087

Category: Literary Criticism

Page: 186

Through the Lens of the Reader is a sequence of ten essays exploring European narrative from the eighteenth to the twentieth century. It covers a wide spectrum of authors ranging from Goethe through Balzac, Flaubert, Zola, George Eliot, Henry James to Rilke, Thomas Mann, and Kafka. The essays are unified by a particular mode of reading, in which the lens of the reader becomes the filter through which texts are constructed in accordance with the signals emitted by their narrational and linguistic strategies.

The Book Buyer's Advisor

The Definitive Guide to Discovering the Year's Best Books

Author: American Library Association's Booklist Magazine, Bill Ott

Publisher: Triumph Books (IL)


Category: Best books

Page: 397

The Rough Guide to Berlin

Author: John Gawthrop

Publisher: N/A

ISBN: 9781843532439

Category: Travel

Page: 368

Fully revised and updated, this 7th edition provides entertaining coverage of all the city's attractions, from the powerful Reichstag and world-class museums to cutting-edge galleries and the latest on the lively club scene. With critical listings of the best places to eat, drink, sleep, and party (for all budgets), the guide gets under the skin of this dynamic city. There is practical advice on trips out of the city, including Potsdam and Park Sanssouci. Finally, the contexts section includes informed coverage of the city's turbulent history.

Learning Apache Flink

Author: Tanmay Deshpande

Publisher: Packt Publishing Ltd

ISBN: 1786467267

Category: Computers

Page: 280

Discover the definitive guide to crafting lightning-fast data processing for distributed systems with Apache Flink.

About This Book
Build your expertise in processing real-time data with Apache Flink and its ecosystem
Gain insights into the working of all components of Apache Flink, such as FlinkML, Gelly, and the Table API, with real-world use cases
Exploit Apache Flink's capabilities, such as distributed data streaming, in-memory processing, pipelining, and iteration operators, to improve performance
Solve real-world big data problems with the real-time in-memory and disk-based processing capabilities of Apache Flink

Who This Book Is For
Big data developers who are looking to process batch and real-time data on distributed systems. Basic knowledge of Hadoop and big data is assumed, and reasonable knowledge of Java or Scala is expected.

What You Will Learn
Learn how to build end-to-end real-time analytics projects
Integrate with the existing big data stack and utilize existing infrastructure
Build predictive analytics applications using FlinkML
Use the graph library to perform graph querying and search
Understand Flink's "streaming first" architecture for implementing real streaming applications
Learn Flink logging and monitoring best practices in order to design your data pipelines efficiently
Explore the detailed processes for deploying a Flink cluster on Amazon Web Services (AWS) and Google Cloud Platform (GCP)

In Detail
With the advent of massive computer systems, organizations in different domains generate large amounts of data in real time. The latest entrant to big data processing, Apache Flink, is designed to process continuous streams of data at a lightning-fast pace. This book will be your definitive guide to batch and stream data processing with Apache Flink. The book begins by introducing the Apache Flink ecosystem, setting it up, and using the DataSet and DataStream APIs for processing batch and streaming datasets.
Bringing the power of SQL to Flink, this book will then explore the Table API for querying and manipulating data. In the latter half of the book, readers will learn the rest of the Apache Flink ecosystem and use it for complex tasks such as event processing, machine learning, and graph processing. The final part of the book covers topics such as scaling Flink solutions, performance optimization, and integrating Flink with other tools such as Elasticsearch. Whether you want to dive deeper into Apache Flink or want to investigate how to get more out of this powerful technology, you'll find everything you need inside.

Style and approach
This book is a comprehensive guide that covers advanced features of Apache Flink and communicates them with a practical understanding of the underlying concepts for how, when, and why to use them.