Development Planning
Engineering
- Better integration with Big Data environment:
- HDFS - Hadoop Distributed File System
- YARN job scheduler
- Implement module testing:
- Unit Testing
- TDD - Test-driven development
- Finish operator congruency
- Profiling Web Interface
- Packaging:
- Ubuntu 14.04
- Ubuntu 16.04
Research
- Richer performance analysis:
- RO operators vs RW operators
What is the performance difference between running RO and RW operators?
- MPI vs TCP?
Try to run MPI in a "compatibility mode" with TCP.
- Spark garbage collection
Try to measure the time Spark spends in GC.
- Spark execution overhead
Measure the time spent on useful computation.
overhead = [total time] - [useful computing]
- More algorithms:
- KMeans
- Community Detection
- Compare algorithms to MPI implementations
- Bigger Cluster
- More Special Operators
- Online File input + parsing
- Online File input + parsing + operator
- Filter Input
- Asynchronous Message Machine
- Graph partitioning
- Load balancing for unbalanced algorithms or heterogeneous environments
- Migrate data between machines at execution time
- Hashing allocation
- Integration with OpenMP accelerators (GPGPU)
- Run a Map in the GPU
- Compatibility with bulk functions
- Cache datasets in GPU memory
BOLD - Active development
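
Sketch for the "Spark execution overhead" item: the subtraction overhead = [total time] - [useful computing] could be measured along the lines below. This is only a single-process illustration in Python, not the project's profiling code. The names measure_overhead, job and useful_work are hypothetical, and on a real cluster the useful time would have to be collected from every worker before subtracting.

```python
import time


def measure_overhead(job, useful_work):
    """Run `job` once and return (total, useful, overhead) in seconds.

    `job` is a hypothetical callable that receives a wrapped version of
    `useful_work` and calls it wherever the real computation happens.
    """
    useful = 0.0

    def timed_work(*args, **kwargs):
        # Accumulate only the time spent inside the user computation.
        nonlocal useful
        start = time.perf_counter()
        try:
            return useful_work(*args, **kwargs)
        finally:
            useful += time.perf_counter() - start

    start = time.perf_counter()
    job(timed_work)
    total = time.perf_counter() - start
    # overhead = [total time] - [useful computing]
    return total, useful, total - useful


def example_job(work):
    # Assumption: the sleep stands in for scheduling and serialization cost.
    for i in range(3):
        time.sleep(0.01)
        work(i)


total, useful, overhead = measure_overhead(example_job, lambda x: sum(range(10000)))
print(f"total={total:.4f}s useful={useful:.4f}s overhead={overhead:.4f}s")
```

Wrapping the computation instead of editing it keeps the measured code unchanged, so the same job can be run with and without timing.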