Splout provides a SQL view over your Big Data with sub-second latencies and high throughput.
Big Data Serving
There are many Big Data problems whose output is also Big Data. Splout allows serving an arbitrarily big dataset by partitioning it.
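To make the idea of partitioning concrete, here is a minimal sketch of routing rows to partitions by a key. It is a generic hash-partitioning illustration, not a description of how Splout itself assigns keys to partitions; the function names and the choice of MD5 as a stable hash are assumptions made for the example.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition index in [0, num_partitions)."""
    # A stable hash is used on purpose: Python's built-in hash() is salted
    # per process, so it would route the same key differently on each run.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def split_rows(rows, key_field, num_partitions):
    """Group rows (dicts) into one list per partition, by their key field."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = partition_for(str(row[key_field]), num_partitions)
        partitions[index].append(row)
    return partitions
```

Because a given key always lands in the same partition, a query that carries its key only needs to touch one partition, which is what lets the dataset grow beyond a single machine.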
Many databases, such as NoSQL solutions, can serve Big Data, but they lack a rich query language like SQL. You generally can’t aggregate data in real time as you would with a GROUP BY clause. Because you can’t precompute everything, SQL is a very convenient feature to have in a Big Data serving solution.
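As an illustration of the kind of query-time aggregation meant here, the sketch below runs a GROUP BY over a tiny in-memory table. It uses Python's built-in sqlite3 module as a stand-in SQL engine; the `visits` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up visits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user_id TEXT, page TEXT, hits INTEGER)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [("u1", "home", 3), ("u1", "cart", 1), ("u2", "home", 5), ("u1", "home", 2)],
    )
    return conn


def hits_per_page(conn, user_id):
    """Aggregate one user's hits per page at query time, with no precomputation."""
    cursor = conn.execute(
        "SELECT page, SUM(hits) AS total FROM visits "
        "WHERE user_id = ? GROUP BY page ORDER BY page",
        (user_id,),
    )
    return cursor.fetchall()
```

The same question asked of a plain key-value store would require either precomputing every per-user, per-page total or fetching all rows and summing them in the application.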
Hadoop is nowadays the de facto open-source solution for Big Data batch processing. When the output of a Hadoop process is big, there isn’t a satisfactory solution for serving it. Think of pre-computed recommendations, for example, where the whole dataset may change from one day to the next. Splout decouples database creation from database serving and makes it efficient and safe to deploy Hadoop-generated datasets.
Splout is not a “fast analytics” engine. Splout is made for demanding web or mobile applications where query performance is critical. Arbitrary real-time aggregations should be done in less than 200 milliseconds under high traffic load.
Splout scales horizontally. By adding more machines you can increase throughput linearly. Splout coordinates a cluster of machines to provide fail-over in case of network splits or hardware corruption.
Even though Splout is relational, it is very flexible. Because data is deployed atomically, you can change your data model from one day to another without pain.
Splout serves tablespaces. Each tablespace may have one or more tables. Each table is either partitioned or replicated in full to every partition. By using command-line tools you can index and deploy any dataset stored in HDFS, the local file system or a remote S3 file system. You can also use the advanced Java API for fine-tuning the whole process.
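The difference between partitioned and replicated tables can be sketched as follows. This is only a toy model of the layout described above, under the assumption that partitioned tables are split by a key while replicated tables are copied whole; the function name, the CRC32-based routing, and the dict-of-lists representation are all hypothetical.

```python
import zlib


def build_partitions(partitioned, replicated, key_field, num_partitions):
    """Lay out a toy tablespace as a list of partitions.

    partitioned: {table_name: [row dicts]} split across partitions by key_field.
    replicated:  {table_name: [row dicts]} copied in full into every partition.
    """
    partitions = []
    for _ in range(num_partitions):
        part = {name: [] for name in partitioned}
        for name, rows in replicated.items():
            part[name] = list(rows)  # full copy of each replicated table
        partitions.append(part)

    for name, rows in partitioned.items():
        for row in rows:
            # CRC32 is a stable hash, so a key always maps to the same partition.
            key_bytes = str(row[key_field]).encode("utf-8")
            index = zlib.crc32(key_bytes) % num_partitions
            partitions[index][name].append(row)
    return partitions
```

With this layout, a query for one customer can join that customer's rows against a small replicated lookup table without leaving its partition.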
Splout provides a REST interface that returns JSON for any SQL query.
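A client for such an interface could look roughly like the sketch below. The text above does not specify the endpoint layout, so the `/api/query/<tablespace>` path, the `key` and `sql` parameter names, and the example host are assumptions for illustration only; check the Splout documentation for the actual URL scheme and response format.

```python
import json
from urllib.parse import quote, urlencode


def build_query_url(base_url, tablespace, key, sql):
    """Build a GET URL for a SQL query (path and parameter names are assumed)."""
    query = urlencode({"key": key, "sql": sql})
    return f"{base_url}/api/query/{quote(tablespace, safe='')}?{query}"


def parse_response(body):
    """Decode the JSON body returned by the server into Python objects."""
    return json.loads(body)
```

For example, `build_query_url("http://qnode.example:8080", "visits", "u1", "SELECT 1")` yields a URL that any HTTP client can fetch, and the returned body can be passed to `parse_response`.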