Datasalt has released Pangool, a Java implementation of the new Tuple MapReduce distributed processing paradigm that simplifies the development of Big Data applications for Hadoop. Pangool is free software that makes developing highly efficient Big Data applications fast and simple, thanks to its native support for the most common development patterns: joins, secondary sorting and compound records.

Developing efficient applications for Hadoop is no easy task. Its Java API is complex, especially when it comes to the most common development patterns. Pangool aims to solve this problem with a much more convenient alternative API that supports those patterns natively, and it does so with efficiency comparable to the native Hadoop implementation.

Main features

Tuple as the unit of information

The use of tuples provides developers with a great amount of flexibility in adapting to the unique features of each project. Tuples are managed efficiently by Pangool, thus reducing the overall cost of the project.

Grouping and sorting

Any given processing task in Pangool is governed mainly by two parameters: which fields are used for grouping and which are used for sorting. This simplification is one of Pangool’s strong points.
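To make the two parameters concrete, here is a minimal sketch in plain Java. It does not use Pangool's API; the `Visit` tuple, its `url` and `timestamp` fields, and the `groupAndSort` method are hypothetical names chosen for illustration. It groups tuples by one field and delivers each group ordered by another, which is roughly what a job's group-by and sort-by settings describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: plain in-memory Java, not Pangool's API.
class GroupSortSketch {

    /** Hypothetical tuple with two fields: url and timestamp. */
    static final class Visit {
        final String url;
        final long timestamp;

        Visit(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }
    }

    /**
     * Groups visits by url (the "group by" field) and orders each group
     * by timestamp (the "sort by" field).
     */
    static Map<String, List<Visit>> groupAndSort(List<Visit> visits) {
        List<Visit> sorted = new ArrayList<>(visits);
        // Sort on the grouping field first, then on the secondary field.
        sorted.sort(Comparator.comparing((Visit v) -> v.url)
                .thenComparingLong(v -> v.timestamp));

        // Consecutive tuples with the same url form one group.
        Map<String, List<Visit>> groups = new LinkedHashMap<>();
        for (Visit v : sorted) {
            groups.computeIfAbsent(v.url, k -> new ArrayList<>()).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("b.com", 30L), new Visit("a.com", 20L),
                new Visit("b.com", 10L), new Visit("a.com", 5L));
        for (Map.Entry<String, List<Visit>> group : groupAndSort(visits).entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey()).append(":");
            for (Visit v : group.getValue()) {
                line.append(' ').append(v.timestamp);
            }
            System.out.println(line);
        }
    }
}
```

In a real distributed job the sort happens during the shuffle rather than in memory, but the contract is the same: each group arrives whole, with its tuples already in the requested order.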

Efficient and easy-to-implement Joins

One of the basic patterns that comes up in any Big Data project is joining several data sets. Pangool supports joins natively and efficiently.
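As an illustration of the general technique, the sketch below shows a reduce-side join in plain Java: tuples from both data sets are tagged with their source, grouped by the join key, and sorted so that the "users" tuple arrives before the "purchases" tuples of the same key. This is an assumption about how such a join can be modeled, not a description of Pangool's internals, and the `Tagged` class, its fields and the `join` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch only: plain in-memory Java, not Pangool's API.
class ReduceSideJoinSketch {

    /** Hypothetical tagged tuple: join key, source data set, payload. */
    static final class Tagged {
        final int userId;   // join key
        final int source;   // 0 = users, 1 = purchases
        final String value; // user name or purchased item

        Tagged(int userId, int source, String value) {
            this.userId = userId;
            this.source = source;
            this.value = value;
        }
    }

    /** Inner join of users and purchases on userId. */
    static List<String> join(List<Tagged> tuples) {
        List<Tagged> sorted = new ArrayList<>(tuples);
        // Group by userId; within a group, users (0) come before purchases (1).
        sorted.sort(Comparator.comparingInt((Tagged t) -> t.userId)
                .thenComparingInt(t -> t.source));

        List<String> joined = new ArrayList<>();
        Integer currentKey = null;
        String currentName = null;
        for (Tagged t : sorted) {
            if (currentKey == null || currentKey != t.userId) {
                // A new group starts: forget the previous user.
                currentKey = t.userId;
                currentName = null;
            }
            if (t.source == 0) {
                currentName = t.value;
            } else if (currentName != null) {
                // Purchases with no matching user are dropped (inner join).
                joined.add(currentName + " bought " + t.value);
            }
        }
        return joined;
    }
}
```

Because the user tuple is seen first in each group, only one name has to be held in memory per key, however many purchases follow.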

Multiple inputs and outputs

Pangool’s API offers integrated support for multiple inputs and outputs so that each job can include several input data sets and several output data sets.

Efficiency and flexibility

Pangool is a more convenient alternative to the Hadoop Java API. The same jobs can be written with either, but they are easier to write with Pangool. We therefore recommend Pangool as a way into the Big Data world.

Despite its more powerful API, Pangool is quite similar in efficiency to Hadoop’s API. Pangool merely makes life easier for those who need the efficiency and flexibility of Hadoop’s Java API.

  • Simplifying development in Hadoop
  • Efficiency and flexibility
  • Tuples as the unit of information
  • Simple secondary sorting
  • Native support for joins
  • Instance configuration
  • Multiple inputs and outputs
  • Support for several serializations: Thrift, Avro, ProtoStuff, etc.
  • Fully compatible with Hadoop