libfaster API Documentation  Development Version
Super fast distributed computing
Shuffle Related Operations

Description

Dataset entry exchange between machines.

The groupByKey() and cogroup() operations perform a shuffle of data between the machines in the cluster: after the exchange, each machine locally groups every element of the dataset that shares the same key. Shuffle operations are usually associated with network traffic because, to group elements by key across the cluster, every machine has to send the entries it does not own to their proper owner.

Note that when a dataset is grouped by key, the location of each key is saved for reuse. As a result, when cogroup() is called several times on the same dataset, the later calls save execution time:

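// ...
// data, data2 and data3 are pointers to datasets that share the key type K
auto g1 = data->cogroup(data2); // first call: key locations are computed (slower)
auto g2 = data->cogroup(data3); // key locations are reused (faster)
// ...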
Returns
pointer to self

Functions

template<typename U >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1)
 Groups two datasets together according to the keys of the first dataset.
 
template<typename U , typename V >
groupedFdd< K > * faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1, iFddCore< K, V > *fdd2)
 Groups three datasets together according to the keys of the first dataset.
 
indexedFdd< K, T > * faster::iFddCore< K, T >::groupByKey ()
 Groups a distributed dataset by key.
 

Function Documentation

§ cogroup() [1/2]

template<typename K, typename T>
template<typename U >
groupedFdd<K>* faster::iFddCore< K, T >::cogroup ( iFddCore< K, U > *  fdd1)
inline

Groups two datasets together according to the keys of the first dataset.

Template Parameters
U - value type of the second dataset
Parameters
fdd1 - second dataset
Returns
pointer to a dataset group (groupedFdd<K>)

Definition at line 95 of file indexedFdd.h.

§ cogroup() [2/2]

template<typename K, typename T>
template<typename U , typename V >
groupedFdd<K>* faster::iFddCore< K, T >::cogroup ( iFddCore< K, U > *  fdd1, iFddCore< K, V > *  fdd2 )
inline

Groups three datasets together according to the keys of the first dataset.

Template Parameters
U - value type of the second dataset
V - value type of the third dataset
Parameters
fdd1 - second dataset
fdd2 - third dataset
Returns
pointer to a dataset group (groupedFdd<K>)

Definition at line 114 of file indexedFdd.h.

§ groupByKey()

template<typename K , typename T >
indexedFdd< K, T > * faster::iFddCore< K, T >::groupByKey ( )

Groups a distributed dataset by key.

Returns
pointer to itself

Definition at line 853 of file indexedFdd.h.