Development Planning

Engineering

  • Better integration with Big Data environment:
    • HDFS - Hadoop distributed file system
    • Yarn job scheduler
  • Implement module testing:
    • Unit Testing
    • TDD - Test driven development
  • Finish operator congruency
  • Profiling Web Interface
  • Packaging:
    • Ubuntu 14.04
    • Ubuntu 16.04

Research

  • Richer performance analysis:
    • RO operators vs RW operators
      What is the performance difference between running RO and RW operators?
    • MPI vs TCP?
      Try to run MPI in a "compatibility mode" with TCP.
    • Spark Garbage Collection
      Measure the time Spark spends in GC.
    • Spark Execution Overhead
      Measure the time spent in useful computation.
      overhead = [total time] - [useful computing time]
    • More algorithms:
      • KMeans
      • Community Detection
    • Compare algorithms to MPI implementations
    • Bigger Cluster
  • More Special Operators
    • Online File input + parsing
    • Online File input + parsing + operator
    • Filter Input
    • Asynchronous Message Machine
    • Graph partitioning
  • Load balancing for unbalanced algorithms or heterogeneous environments
    • Migrate data between machines at execution time
    • Hashing allocation
  • Integration with OpenMP accelerators (GPGPU)
    • Run a Map in the GPU
    • Compatibility with bulk functions
    • Cache datasets in GPU memory
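
For the Spark Execution Overhead item, a minimal Python sketch of the metric, assuming "useful computing" can be wrapped in explicitly timed sections. `OverheadTimer`, `job`, `useful_section` and `run_example` are hypothetical names, not part of Spark or of this project; everything timed inside `job()` but outside `useful_section()` counts as overhead.

```python
import time
from contextlib import contextmanager


class OverheadTimer:
    """Hypothetical helper: overhead = [total time] - [useful computing time]."""

    def __init__(self):
        self.total = 0.0   # wall-clock seconds for the whole job
        self.useful = 0.0  # seconds spent inside useful computation

    @contextmanager
    def job(self):
        start = time.perf_counter()
        try:
            yield self
        finally:
            self.total += time.perf_counter() - start

    @contextmanager
    def useful_section(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.useful += time.perf_counter() - start

    @property
    def overhead(self):
        return self.total - self.useful


def run_example():
    """Toy job: setup and an artificial delay are counted as overhead."""
    timer = OverheadTimer()
    with timer.job():
        data = list(range(100_000))                  # setup, counted as overhead
        with timer.useful_section():
            squares = sum(x * x for x in data)       # useful computation
        time.sleep(0.01)                             # stand-in for scheduling/serialization cost
    return timer, squares


if __name__ == "__main__":
    timer, _ = run_example()
    print(f"total={timer.total:.4f}s useful={timer.useful:.4f}s overhead={timer.overhead:.4f}s")
```

In a real measurement the useful sections would wrap the user-defined operator bodies, and the total would be the job's end-to-end time.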
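
For the Hashing allocation item, a minimal sketch, assuming it means assigning each record to a worker by hashing its key (similar in spirit to Spark's `HashPartitioner`). `owner` and `allocate` are hypothetical names. BLAKE2b is used instead of Python's built-in `hash()` because `str` hashes are salted per process, so different machines would disagree on which worker owns a key.

```python
import hashlib


def owner(key: str, num_workers: int) -> int:
    """Return the worker index in [0, num_workers) that owns `key`."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    # Interpret the 8-byte digest as an unsigned integer, then reduce modulo.
    return int.from_bytes(digest, "big") % num_workers


def allocate(keys, num_workers: int) -> dict:
    """Group keys by owning worker (hypothetical hash-based allocation)."""
    buckets = {w: [] for w in range(num_workers)}
    for key in keys:
        buckets[owner(key, num_workers)].append(key)
    return buckets


if __name__ == "__main__":
    keys = [f"vertex-{i}" for i in range(10_000)]
    buckets = allocate(keys, 4)
    print({w: len(ks) for w, ks in buckets.items()})
```

Hashing spreads distinct keys evenly, but it does not by itself fix skew caused by a few very heavy keys, and changing `num_workers` remaps most keys, which matters if data is migrated between machines at execution time.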

BOLD - Active development