libfaster API Documentation
Development Version
Super fast distributed computing
Dataset entry exchange between machines.
The groupByKey() and cogroup() operations perform a shuffle of information between the machines in the cluster. They bring together, locally on each machine, every element of a dataset that has the same key. Shuffle operations are usually associated with network traffic because, in order to group elements by key across the cluster, every machine has to send the data it does not own to the proper owner.
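The routing half of a shuffle can be sketched in plain C++. This is an illustration of the idea only, not libfaster code: the helper names `ownerOf`, `routeForShuffle` and `remoteEntries`, and the hash-modulo placement rule, are assumptions. Each machine splits its local entries into one buffer per destination; every buffer other than its own would have to travel over the network.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical placement rule (not libfaster's): a key is owned by the
// machine whose index equals hash(key) modulo the number of machines.
template <typename K>
std::size_t ownerOf(const K& key, std::size_t numMachines) {
    assert(numMachines > 0);
    return std::hash<K>{}(key) % numMachines;
}

// Splits one machine's local entries into one send buffer per destination.
// buffers[m] holds the entries whose keys machine m owns.
template <typename K, typename T>
std::vector<std::vector<std::pair<K, T>>> routeForShuffle(
        const std::vector<std::pair<K, T>>& localPart, std::size_t numMachines) {
    std::vector<std::vector<std::pair<K, T>>> buffers(numMachines);
    for (const auto& kv : localPart)
        buffers[ownerOf(kv.first, numMachines)].push_back(kv);
    return buffers;
}

// Number of entries machine `self` must send over the network, i.e. every
// entry it holds whose key it does not own.
template <typename K, typename T>
std::size_t remoteEntries(
        const std::vector<std::vector<std::pair<K, T>>>& buffers, std::size_t self) {
    std::size_t n = 0;
    for (std::size_t m = 0; m < buffers.size(); ++m)
        if (m != self) n += buffers[m].size();
    return n;
}
```

Because the owner depends only on the key, all entries with the same key end up in the same buffer, which is what lets the receiving machine group them locally afterwards.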
Note that when a dataset is grouped by key, the location of each key is saved for reuse. That way, execution time is saved when cogroup() is called multiple times.
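The reuse of key locations can be pictured as a key-to-machine map that is filled once and consulted by later calls. The sketch below is hypothetical (`KeyLocationCache` and `routeWithCache` are not libfaster names), and how libfaster treats keys that are absent from the first dataset is an assumption here: they are simply counted as unmatched.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key -> machine map, filled once when the first dataset is
// grouped by key.
template <typename K>
class KeyLocationCache {
public:
    void record(const K& key, std::size_t machine) { location_[key] = machine; }

    // Returns a pointer to the recorded machine, or nullptr for unknown keys.
    const std::size_t* find(const K& key) const {
        auto it = location_.find(key);
        return it == location_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return location_.size(); }

private:
    std::unordered_map<K, std::size_t> location_;
};

// Routes another dataset's entries with the cached locations, so a repeated
// cogroup() needs no new placement step. Returns the number of entries whose
// key is not in the cache (assumed to be dropped).
template <typename K, typename U>
std::size_t routeWithCache(const KeyLocationCache<K>& cache,
                           const std::vector<std::pair<K, U>>& entries,
                           std::vector<std::vector<std::pair<K, U>>>& buffers) {
    std::size_t unmatched = 0;
    for (const auto& kv : entries) {
        if (const std::size_t* m = cache.find(kv.first)) {
            assert(*m < buffers.size());
            buffers[*m].push_back(kv);
        } else {
            ++unmatched;
        }
    }
    return unmatched;
}
```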
Functions
template<typename U >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1)
    Groups two datasets together according to the keys of the first dataset.
template<typename U , typename V >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1, iFddCore< K, V > *fdd2)
    Groups three datasets together according to the keys of the first dataset.
indexedFdd< K, T > * faster::iFddCore< K, T >::groupByKey ()
    Groups a distributed dataset by key.
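template<typename U >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1) [inline]
Groups two datasets together according to the keys of the first dataset.
Template Parameters:
    U - Value type of the second dataset
Parameters:
    fdd1 - Second dataset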
Definition at line 95 of file indexedFdd.h.
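On a single machine, the result of a two-way cogroup can be modeled as a map from each key to the values it has in each dataset. This is a local sketch under assumptions: the `CoGroup2` type stands in for `groupedFdd< K >`, whose layout is not shown here, and "according to the keys of the first dataset" is read as "keys that appear only in the second dataset are dropped".

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical result type: for each key, its values in the first and
// second datasets.
template <typename K, typename T, typename U>
using CoGroup2 = std::map<K, std::pair<std::vector<T>, std::vector<U>>>;

// Local sketch of cogroup semantics. Groups are created from the keys of
// `first`; entries of `second` are attached only to keys that already exist.
template <typename K, typename T, typename U>
CoGroup2<K, T, U> cogroupLocal(const std::vector<std::pair<K, T>>& first,
                               const std::vector<std::pair<K, U>>& second) {
    CoGroup2<K, T, U> out;
    for (const auto& kv : first) out[kv.first].first.push_back(kv.second);
    for (const auto& kv : second) {
        auto it = out.find(kv.first);
        if (it != out.end()) it->second.second.push_back(kv.second);
    }
    return out;
}
```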
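template<typename U , typename V >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1, iFddCore< K, V > *fdd2) [inline]
Groups three datasets together according to the keys of the first dataset.
Template Parameters:
    U - Value type of the second dataset
    V - Value type of the third dataset
Parameters:
    fdd1 - Second dataset
    fdd2 - Third dataset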
Definition at line 114 of file indexedFdd.h.
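The three-dataset overload extends the same idea with one more value list per key. In this hypothetical local sketch, `CoGroup3` and `cogroupLocal3` are illustrative names, a `std::tuple` stands in for whatever `groupedFdd< K >` actually stores, and keys missing from the first dataset are assumed to be dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical result type: for each key, its values in all three datasets.
template <typename K, typename T, typename U, typename V>
using CoGroup3 = std::map<K, std::tuple<std::vector<T>, std::vector<U>, std::vector<V>>>;

// Appends the values of `entries` to slot I of every group whose key already
// exists, i.e. only keys that came from the first dataset.
template <std::size_t I, typename Groups, typename K, typename X>
void attach(Groups& groups, const std::vector<std::pair<K, X>>& entries) {
    for (const auto& kv : entries) {
        auto it = groups.find(kv.first);
        if (it != groups.end()) std::get<I>(it->second).push_back(kv.second);
    }
}

template <typename K, typename T, typename U, typename V>
CoGroup3<K, T, U, V> cogroupLocal3(const std::vector<std::pair<K, T>>& first,
                                   const std::vector<std::pair<K, U>>& second,
                                   const std::vector<std::pair<K, V>>& third) {
    CoGroup3<K, T, U, V> out;
    // The first dataset decides which keys exist in the result.
    for (const auto& kv : first) std::get<0>(out[kv.first]).push_back(kv.second);
    attach<1>(out, second);
    attach<2>(out, third);
    return out;
}
```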
indexedFdd< K, T > * faster::iFddCore< K, T >::groupByKey ()
Groups a distributed dataset by key.
Definition at line 853 of file indexedFdd.h.
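After the shuffle has delivered every entry to the machine that owns its key, the remaining work of groupByKey() is a local merge. The sketch below shows that local step only; `groupLocally` is a hypothetical name, and the in-group value order (arrival order here) is an assumption that libfaster may not guarantee.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical local half of groupByKey(): merges the received entries that
// share a key into one group of values.
template <typename K, typename T>
std::unordered_map<K, std::vector<T>> groupLocally(
        const std::vector<std::pair<K, T>>& received) {
    std::unordered_map<K, std::vector<T>> groups;
    for (const auto& kv : received) groups[kv.first].push_back(kv.second);
    return groups;
}
```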