Programming Pig
by Alan F Gates
Publisher: O'Reilly Media 2011
Number of pages: 222
Description:
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Download or read it online for free here:
Download link
(6.4MB, PDF)
Similar books
Graph Databases
by Ian Robinson, Jim Webber, Emil Eifrem - O'Reilly Media
Graph Databases, published by O'Reilly Media, discusses the problems that are well aligned with graph databases, with examples drawn from practical, real-world use cases. This book also looks at the ecosystem of complementary technologies.
(10508 views)
Text Mining with R: A Tidy Approach
by Julia Silge, David Robinson - O'Reilly Media
With this practical book, you'll explore text-mining techniques with tidytext, a package that the authors developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext can make text analysis easy and effective.
(5326 views)
A Little Riak Book
by Eric Redmond - GitBook
This is a free little book about Riak, a scalable, high-availability NoSQL datastore. Riak is an open-source, distributed key/value database designed for high availability and near-linear scalability. Riak has remarkably high uptime and grows with you.
(8447 views)
CouchDB: The Definitive Guide
by J. C. Anderson, J. Lehnardt, N. Slater - O'Reilly Media
CouchDB's creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications. CouchDB is ideal for web applications that handle huge amounts of loosely structured data.
(10986 views)