Archive for March, 2013

Hive + Splout SQL for a social media reporting webapp: A Big Data love story

(This is the second post of a series of three posts presenting Splout SQL 0.2.2 native integration with main Hadoop processing tools: Cascading, Hive and Pig).

In this post we’ll present an example Big Data use case for analyzing tweets and reporting consolidated, meaningful statistics to Twitter users through an interactive, low-latency webapp. For that we will marry Hive (an open-source warehousing solution for Hadoop that enables easy analysis and summarization over Hadoop) with Splout SQL, a highly-performant, low-latency partitioned SQL for Hadoop. We will build a very simple – yet scalable – analysis tool such as the Tweet archivist, and we will do it without even coding. The tool will provide historical mentions, summarized hashtags and popular tweets for every actor in the input dataset.

Read more…


Cascading + Splout SQL for log analysis and serving: A Big Data love story

(This is the first post of a series of three posts presenting Splout SQL 0.2.2 native integration with main Hadoop processing tools: Cascading, Hive and Pig).

In this post we’ll present an example Big Data use case for analyzing and indexing a large amount of Apache logs from an e-Commerce website, and being able to serve them in a low-latency “customer service” web application that needs fine-grained, detailed per-user information for troubleshooting and performing “loyalty campaigns”. For that we will marry Cascading, an agile Java high-level Hadoop framework with Splout SQL, a highly-performant, low-latency partitioned SQL for Hadoop. We’ll see how to develop a solution which is totally scalable both in processing and serving and develop it in barely 200 lines. We’ll also provide a simple JavaScript frontend that will use Splout SQL’s REST API via jQuery.

Read more…