Datasalt has extensive knowledge of the Big Data field, distributed systems, scalability, search engines, web crawling and web mining. Our philosophy is based on continuously renewing our knowledge and on intense research into new techniques, together with the consolidation of the best practices we have learned. The technologies upon which Datasalt’s solutions are based include:
Hadoop and its ecosystem
Google revolutionized the world of massively parallel processing (MPP) systems with the invention of the MapReduce paradigm, developed to manage and process the huge amounts of data gathered by its search engine. Hadoop is an open-source implementation that opens the doors of the MapReduce paradigm to the rest of the industry. Hadoop is a mature technology, used by hundreds of companies including Facebook and Yahoo, with clusters of more than 4000 nodes. Hadoop is the technology behind the concept of Big Data.
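To make the MapReduce paradigm concrete, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. It is plain Python, not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are illustrative assumptions. In a real cluster, the map and reduce calls would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different node; here we do it in a loop.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    # Each key could be reduced on a different node as well.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each map call only sees one document and each reduce call only sees one key, the work can be split across machines without shared state, which is what makes the paradigm scale.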
Spark and its ecosystem
Spark is an innovative Big Data computing framework originally developed at the University of California, Berkeley. It provides a flexible programming model and makes better use of modern hardware, in particular memory. Spark represents a step forward as a technology, and we envision it as the reference Big Data technology for the coming years. It has already been adopted by several companies.
Real-time stream processing
Being able to provide quick answers from Big Data processing is becoming more and more valuable. Real-time processing systems such as Storm or Spark Streaming, together with scalable queuing systems such as Kafka, provide the necessary means to clean, pre-aggregate and unlock the value in massive streams of events that must be handled and processed in a scalable and reliable way. Many companies are already powering their solutions with efficient real-time stream processing using Storm.
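As an illustration of what pre-aggregating a stream of events can mean, the following sketch counts events per key in fixed (tumbling) time windows. It is plain Python and does not use the Storm, Spark Streaming or Kafka APIs; the function name `tumbling_window_counts` and the `(timestamp, key)` event shape are assumptions made for this example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key inside fixed, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, key) tuples.
    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Align the timestamp to the start of the window it belongs to.
        window_start = timestamp - (timestamp % window_seconds)
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical event stream: (seconds since start, event type).
events = [(0, "click"), (3, "view"), (7, "click"),
          (12, "click"), (14, "view"), (19, "view")]
result = tumbling_window_counts(events, window_seconds=10)
```

A production system would additionally distribute the windows across workers and handle late or out-of-order events, which is where dedicated stream processors come in.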
Big Data technologies are gradually closing the gap between large-scale data processing and interactive querying. It is now possible to integrate Big Data applications with Big Data querying systems that deliver quick insights and fast, complex analysis through a familiar interface: SQL.
NoSQL databases emerged in response to the scalability and flexibility limitations of relational databases. They give up certain features, such as ACID guarantees, fixed table schemas, secondary indexes or foreign keys, in exchange for flexibility and scalability. They do not usually support SQL queries, hence the name “NoSQL”. NoSQL databases are useful for developing scalable applications and for storing Big Data.
Powerful yet simple methods for querying and extracting information from Big Data archives are needed. Search engines based on inverted indexes make it possible to search through the deluge of information simply and quickly. At Datasalt, we rely on the strength of Lucene-based search systems such as Elasticsearch and Solr.
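The idea behind an inverted index can be sketched in a few lines: instead of scanning every document for a term, the index maps each term to the set of documents that contain it. The code below is a simplified illustration in plain Python, not Lucene's actual data structure or API; `build_index`, `search` and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def build_index(documents):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing all terms of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents keyed by id.
documents = {
    1: "Hadoop processes big data",
    2: "Spark processes data in memory",
    3: "Lucene indexes text",
}
index = build_index(documents)
matches = search(index, "processes data")
```

Real search engines add tokenization, ranking and compressed on-disk postings on top of this basic structure, but the lookup principle is the same.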