by Alan F Gates
Publisher: O'Reilly Media 2011
Number of pages: 344
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
by Ian Robinson, Jim Webber, Emil Eifrem - O'Reilly Media
Graph Databases, published by O'Reilly Media, discusses the problems that are well aligned with graph databases, with examples drawn from practical, real-world use cases. This book also looks at the ecosystem of complementary technologies.
by Lars George - O'Reilly Media
If you are looking for a solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can fulfill your needs. HBase scales to billions of rows and columns, while ensuring that performance remains constant.
by Eric Redmond - GitBook
This is a free little book about Riak, an open-source, distributed key/value NoSQL datastore built for high availability and near-linear scalability. Riak has remarkably high uptime and grows with you.
by Open Knowledge Foundation - School of Data
The Data Wrangling Handbook is a companion text to the School of Data. It functions much like a traditional textbook: it provides the detail and background theory to support the School of Data courses and challenges.