Archive for April, 2013

A practical Storm’s Trident API Overview

On the 10th of April Pere gave a Trident hackaton at Berlin’s Big Data Beers. There was also a parallel Disco hackaton by Dave from Continuum Analytics. Es war viel spaß! The people who came had the chance to learn the basics of Trident in the Storm session while trying it right away. The hackaton covered the most basic aspects of the API, the philosophy and typical use cases for Storm and included a simple exercise that manipulated a stream of fake tweets. The project with the session guideline, some runnable examples and the tweet generator can be found on github.


In this post we will see an introductory overview of Trident’s API that can be followed with the help of the aforementioned github project.

Read more…

Presenting Splout Cloud: a managed web-latency SQL querying engine in the cloud

We have created Splout Cloud, a web-latency managed service in the AWS cloud. Simply put, Splout Cloud converts any data files – regardless of their size – into a scalable, partitioned SQL querying engine. And unlike offline analytics engines, it is highly performant, fast enough so as to be able to provide sub-second queries and real-time aggregations to feed arbitrary web or mobile applications.


Splout Cloud is based on Splout SQL, an open-source SQL database for Hadoop.

Read more…

Pig + Splout SQL for a retail coupon generator: A Big Data love story

(This is the last post of a series of three posts presenting Splout SQL 0.2.2 native integration with main Hadoop processing tools: Cascading, Hive and Pig).

In this post we’ll present an example Big Data use case for a big food retail store. We will summarize individual client’s purchases using Apache Pig and we will dump the analysis into Splout SQL for being able to query it in real-time. Then, we will be able to combine the summarized information with a list of promotional products and suggest discounts on particular products for every client, in real-time. This information could be easily used by a coupon printing system for increasing client loyalty.

Combining an agile Big Data processing tool such as Pig with a flexible, low-latency SQL querying system such as Splout SQL provides a simple yet effective solution to this problem, which we will be able to simulate through this post almost with no effort.

Read more…