Today we are celebrating the opening of the Datasalt blog and its Spanish version. Let’s start by talking a little bit about Datasalt and what we do.
First, some context. We are living through an explosion of information. New devices equipped with sensors (mobile phones, antennas, telescopes, cameras, etc.) are constantly being created, and they generate massive amounts of data. Moreover, companies strive to keep track of every interaction and operation they have with their customers, monitoring their behavior as closely as possible. Clicks on the web are recorded, as are all the transactions and purchases made by customers or users. In short, companies have more data than ever, and they are aware that they must use that information to become more competitive.
However, the usual approaches to data analysis (Data Mining), such as Enterprise Data Warehouse (EDW) and Business Intelligence (BI), are inadequate for analyzing and extracting value from such enormous amounts of data. First, they cannot process all of the data, which makes “sampling” necessary in order to reduce the size of the problem. Second, these solutions process structured information efficiently, but not unstructured information, and most of the data generated today is unstructured.
This has given rise to the concept of Big Data: data that, because of its size and nature, is beyond the scope of the usual data analysis techniques (EDW and BI). Companies have a growing need to extract value from information to remain competitive, which is why interest in Big Data is growing dramatically. The articles in “Data, data everywhere”, a special report that The Economist recently devoted to this issue, show as much (the one entitled “A different game” is also interesting).
New techniques have emerged that are capable of exploiting Big Data and transforming it into value for companies. Among them we should highlight Hadoop, a platform for distributed data analysis that processes large amounts of data by spreading the work across a set of machines running in parallel. Other solutions worth noting are NoSQL distributed databases. These databases can handle larger amounts of data in exchange for relaxing some of the requirements that an RDBMS imposes (hence the name “no SQL”). Here we could highlight Cassandra, HBase and MongoDB. Solr, a search engine, is also very useful. All these techniques pair very well with the new cloud computing platforms (Amazon, Rackspace, etc.). If you want to find out more about this subject, you can check the Gartner report entitled “Hadoop and MapReduce: Big Data Analytics.”
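To give a feel for the MapReduce model that Hadoop is built around, here is a minimal single-machine sketch in Python. It is only an illustration of the idea, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a local thread pool stands in for the cluster of parallel machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all the values collected for one key."""
    return key, sum(values)


def word_count(lines, workers=2):
    # The thread pool is an assumption standing in for a cluster:
    # each line is mapped independently, so the work can be split up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped_chunks = list(pool.map(map_phase, lines))
    groups = shuffle(mapped_chunks)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big value from data"]))
```

Because each line is mapped on its own and each key is reduced on its own, both steps can in principle be distributed over as many machines as are available, which is what makes the model suitable for very large data sets.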
And this is where Datasalt comes in: a company that specializes in the field of Big Data, developing new products and providing services. We offer solutions for extracting value (Data Mining) from large data sets, such as records of interaction with customers, logs generated by applications, information captured from the web and social networks, data from mobile devices, etc. We also offer systems for aggregation and search within large data sets.
We’ll be publishing news, methods and events related to Big Data and Datasalt on this blog. See you soon!