About

Faster is a distributed computing framework based on functional programming, designed to be efficient and simple to use. With Faster, turning a single-threaded algorithm into a multi-threaded, distributed problem solver takes just a few lines of code to adapt the source and a single command to deploy it on a cluster. It is computation time for everybody! :D

Because many computing environments are not made up of homogeneous machines, Faster is designed to take advantage of heterogeneous environments. We use dynamic resource allocation to increase resource utilization: if a machine is often overloaded compared with its peers, we migrate data to idle machines to even the score. We also prioritize coarse-grained data processing to favor scalability.
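
As a rough illustration of the rebalancing idea, the sketch below splits a number of data items among machines in proportion to a relative speed weight. The proportional policy, the function name balancedShares and the integer weights are assumptions made for this example; they are not Faster's actual scheduler or API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: decide how many items each machine should hold,
// proportionally to an integer "relative speed" weight per machine.
std::vector<std::size_t> balancedShares(std::size_t totalItems,
                                        const std::vector<unsigned>& weights) {
    assert(!weights.empty());

    std::size_t weightSum = 0;
    for (unsigned w : weights) weightSum += w;
    assert(weightSum > 0);

    std::vector<std::size_t> shares(weights.size());
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        // Integer division rounds down, so we never hand out too many items.
        shares[i] = totalItems * weights[i] / weightSum;
        assigned += shares[i];
    }

    // Give the few items lost to rounding to the first machines, one each.
    for (std::size_t i = 0; assigned < totalItems; i = (i + 1) % shares.size()) {
        ++shares[i];
        ++assigned;
    }
    return shares;
}
```

For example, balancedShares(100, {1, 1, 2}) gives the machine that is twice as fast half of the items (25, 25 and 50).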

Faster uses MPI, a well-established message-passing standard, for process management and message passing, which makes deployment easy. Servers running the Secure Shell (SSH) service are enough to let users run Faster-based jobs. There is no requirement for a management process, sophisticated file systems, or queuing mechanisms (though we support some, of course).

Faster uses distributed datasets called FDDs (Fast Distributed Datasets). An FDD distributes data among Faster-based processes located on computational entities and rebalances it according to load and memory availability in order to avoid performance bottlenecks and data loss. It also supports fault tolerance through data replication and checkpointing.

We use the functional paradigm to process distributed data. Map transforms data items of one type into another, Reduce combines multiple items into one, and CountByKey counts the occurrences of each key in indexed data. These and other functions allow us to implement the behaviour of most algorithms, including iterative ones. We aim to optimize iterative algorithms through exceptionally fast task management and lightweight read-write Map functions called Update. Update functions can perform very fast iterations because they don't generate a new dataset; they only update item values.
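
To make these operations concrete, here is a minimal single-process sketch of their semantics using only the C++ standard library. The helper names mapItems, reduceItems, countByKey and updateItems are hypothetical stand-ins chosen for this example, not Faster's API, and nothing here is distributed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Map: builds a NEW collection whose items may have a different type.
template <typename Out, typename In, typename F>
std::vector<Out> mapItems(const std::vector<In>& in, F f) {
    std::vector<Out> out;
    out.reserve(in.size());
    for (const In& x : in) out.push_back(f(x));
    return out;
}

// Reduce: combines all items into a single value with a binary function.
template <typename T, typename F>
T reduceItems(const std::vector<T>& in, F f) {
    assert(!in.empty());
    return std::accumulate(in.begin() + 1, in.end(), in.front(), f);
}

// CountByKey: counts how many times each key occurs in keyed data.
template <typename K, typename V>
std::map<K, std::size_t> countByKey(const std::vector<std::pair<K, V>>& in) {
    std::map<K, std::size_t> counts;
    for (const auto& kv : in) ++counts[kv.first];
    return counts;
}

// Update: a read-write map. It changes items in place, so no new
// collection is allocated on each iteration.
template <typename T, typename F>
void updateItems(std::vector<T>& data, F f) {
    for (T& x : data) f(x);
}
```

The difference between mapItems and updateItems is the point of the last sentence above: the former allocates a result on every call, while the latter reuses the existing storage, which matters when the same step runs many times in a loop.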

Features

  • High-level and easy-to-use distributed computing framework.
  • Functional style similar to Apache Spark (map, reduce, flatMap, groupByKey, coGroup, etc.).
  • Coarse-grained data processing for scalability.
  • Fault tolerance for safe execution.
  • Implemented in C++ for speed.
  • Optimized for heterogeneous clusters with dynamic balancing of data blocks.
  • Small memory footprint and fast start time. Dynamic library loading loads only the parts of the library that are actually needed.
  • Optimized for iterative algorithms, with support for a read-write Map function (Update).
  • Built-in bulk functions (bulkMap, bulkReduce, etc.) to enable sub-iterations within algorithms.
  • Masterless implementation. If SSH is running, everything is fine.
  • Minimal memory footprint, with an option to free memory manually.
  • Native OpenMP parallelization. Maps and Reduces are already parallelized with OpenMP (Bulk functions are not).
  • Well-established MPI library for message passing and process management.
  • Easy to use with CUDA/OpenCL.
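
The OpenMP item above can be pictured with a plain loop-level sketch: a map writes each result to its own slot, and a reduce uses an OpenMP reduction clause. This is generic OpenMP on a std::vector, with hypothetical function names squareAll and dot; it is not Faster's internal code. With GCC, compile with -fopenmp to enable the pragmas; without it the loops simply run serially.

```cpp
#include <cassert>
#include <vector>

// Map-style loop: every iteration writes a distinct output element,
// so iterations are independent and can run on different threads.
std::vector<double> squareAll(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) out[i] = in[i] * in[i];
    return out;
}

// Reduce-style loop: each thread accumulates a private partial sum,
// and OpenMP combines the partial sums when the loop ends.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

Because the reduction combines partial sums in an unspecified order, floating-point results can differ in the last bits between runs; exact comparisons are only safe for inputs whose sums are exactly representable.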

System Requirements

  • Operating systems supported:
    • Ubuntu 12.04
    • We don't yet know which other operating systems Faster runs on. Please let us know!
  • Compilation:
    • CMake
    • Open MPI 1.5
    • GCC 4.8 or a compatible compiler with:
      • OpenMP
      • C++11:
        • Alias Templates
        • Variadic Templates
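
A quick way to check that a compiler supports the two C++11 features listed above is to compile a small file that uses both. The names KeyedItems, sumOf and checkCpp11Features below are made up for this check and are not part of Faster.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Alias template: a templated "using" declaration (C++11).
template <typename K, typename V>
using KeyedItems = std::vector<std::pair<K, V>>;

// Variadic template: accepts any number of arguments (C++11).
// Base case: a single value is its own sum.
template <typename T>
T sumOf(T last) {
    return last;
}

// Recursive case: peel off the first argument and sum the rest.
template <typename T, typename... Rest>
T sumOf(T first, Rest... rest) {
    return first + sumOf(rest...);
}

// Exercises both features; aborts if either misbehaves.
void checkCpp11Features() {
    KeyedItems<char, int> items;
    items.push_back(std::make_pair('a', 1));
    assert(items.size() == 1);
    assert(sumOf(1, 2, 3) == 6);
}
```

If this file compiles with -std=c++11 (or later), the compiler handles both features; a failure at the "using" line or at "typename..." points to missing C++11 support.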