Archive for October, 2012

A richer database spout for Big Data

In this post we will review the current approaches to serving Big Data: that is, processing an arbitrary number of queries over a huge dataset with sub-second latency, under high load, on a scalable cluster of machines.

Think Twitter, Facebook, LinkedIn. Think serving Hadoop-generated datasets.

What does the open-source world offer for building a website whose queries hit such a huge dataset? What are the most common problems we might encounter in such a scenario, and how well do these tools solve them?

Then, we will propose a new architecture that provides a scalable yet rich solution for this problem.