Description

Dataset entry exchange between machines.

The groupByKey() and cogroup() operations perform a shuffle of information between the machines in the cluster: they group, locally on each machine, every element of a dataset that has the same key. Shuffle operations are usually associated with network communication because, to group elements by key across the cluster, every machine has to send the data it does not own to that data's proper owner.
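
To make the data movement concrete, here is a single-process sketch of a shuffle. It assumes, purely for illustration, that the owner of a key is chosen by hashing the key modulo the number of machines; this document does not say how faster actually assigns key ownership, and the names `ownerOf` and `shuffleAndGroup` are hypothetical, not part of the faster API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical ownership rule (an assumption, not faster's documented
// behavior): the machine that owns a key is chosen by hashing the key.
template <typename K>
std::size_t ownerOf(const K& key, std::size_t numMachines) {
    return std::hash<K>{}(key) % numMachines;
}

// Simulates a shuffle in one process. machines[i] holds the (key, value)
// pairs stored on machine i. Every pair is "sent" to the machine that owns
// its key, and each machine groups what it receives by key, so all values
// for a given key end up on exactly one machine.
template <typename K, typename T>
std::vector<std::map<K, std::vector<T>>>
shuffleAndGroup(const std::vector<std::vector<std::pair<K, T>>>& machines) {
    const std::size_t n = machines.size();
    std::vector<std::map<K, std::vector<T>>> grouped(n);
    for (const auto& local : machines) {
        for (const auto& [key, value] : local) {
            // In a real cluster this is a network transfer whenever the
            // owner differs from the machine currently holding the pair.
            grouped[ownerOf(key, n)][key].push_back(value);
        }
    }
    return grouped;
}
```

Each inner vector of the input stands for one machine's local data; after the call, all values for a given key sit in a single machine's map.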

Note that when a dataset is grouped by key, the location of its keys is saved for reuse. As a result, when cogroup() is called multiple times on the same dataset, only the first call pays the full cost; later calls save execution time.

// ...
auto g1 = data->cogroup(data2); // first call: takes longer
auto g2 = data->cogroup(data3); // key locations of data are reused: takes less time
// ...
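
The reuse described above can be pictured with the following hypothetical cache. `KeyLocationCache` is not a faster class; it only illustrates computing a key-to-machine map once (using the same assumed hash-based ownership rule as an illustration) and returning the stored map on later calls, assuming the dataset's keys do not change in between.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: the first grouping computes where each key of this
// dataset lives and stores that map; later groupings reuse it.
template <typename K>
class KeyLocationCache {
public:
    explicit KeyLocationCache(std::size_t numMachines) : numMachines_(numMachines) {}

    // Returns the key -> machine map. It is built only on the first call;
    // later calls ignore `keys` and assume the dataset's keys are unchanged.
    const std::unordered_map<K, std::size_t>& locations(const std::vector<K>& keys) {
        if (!cached_) {
            std::unordered_map<K, std::size_t> map;
            for (const K& k : keys) map[k] = std::hash<K>{}(k) % numMachines_;
            cached_ = std::move(map);
            ++builds_;
        }
        return *cached_;
    }

    // Number of times the map was actually computed.
    std::size_t builds() const { return builds_; }

private:
    std::size_t numMachines_;
    std::optional<std::unordered_map<K, std::size_t>> cached_;
    std::size_t builds_ = 0;
};
```

Calling `locations` twice with the same keys builds the map once; the second call corresponds to the cheaper second cogroup() in the example above.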

Returns: pointer to self

Functions
template<typename U >
groupedFdd< K > *	faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1)
	Groups two datasets together according to the keys of the first dataset.

template<typename U , typename V >
groupedFdd< K > *	faster::iFddCore< K, T >::cogroup (iFddCore< K, U > *fdd1, iFddCore< K, V > *fdd2)
	Groups three datasets together according to the keys of the first dataset.

indexedFdd< K, T > *	faster::iFddCore< K, T >::groupByKey ()
	Groups a distributed dataset by key.

Function Documentation

§ cogroup() [1/2]

template<typename K, typename T>

template<typename U >

groupedFdd<K>* faster::iFddCore< K, T >::cogroup ( iFddCore< K, U > * fdd1 )

inline

Groups two datasets together according to the keys of the first dataset.

Template Parameters

U	- Value type of the second dataset

Parameters

fdd1	- second dataset

Returns: pointer to a dataset group

Definition at line 95 of file indexedFdd.h.
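
The exact layout of a groupedFdd is not described here, so the following is only a local, single-machine model of the two-way cogroup under a stated assumption: for every key of the first dataset, the result collects that key's values from both datasets, and keys found only in the second dataset are dropped. `cogroupLocal` is a hypothetical name.

```cpp
#include <map>
#include <utility>
#include <vector>

// Local model of a two-way cogroup. ASSUMPTION: result keys come from the
// first dataset only; for each such key the result holds its values from
// the first dataset and from the second. The real groupedFdd may differ.
template <typename K, typename T, typename U>
std::map<K, std::pair<std::vector<T>, std::vector<U>>>
cogroupLocal(const std::vector<std::pair<K, T>>& first,
             const std::vector<std::pair<K, U>>& second) {
    std::map<K, std::pair<std::vector<T>, std::vector<U>>> out;
    for (const auto& [k, t] : first) out[k].first.push_back(t);
    for (const auto& [k, u] : second) {
        auto it = out.find(k);
        // Keep values from the second dataset only for keys of the first.
        if (it != out.end()) it->second.second.push_back(u);
    }
    return out;
}
```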

§ cogroup() [2/2]

template<typename K, typename T>

template<typename U , typename V >

groupedFdd<K>* faster::iFddCore< K, T >::cogroup ( iFddCore< K, U > * fdd1, iFddCore< K, V > * fdd2 )

inline

Groups three datasets together according to the keys of the first dataset.

Template Parameters

U	- Value type of the second dataset
V	- Value type of the third dataset

Parameters

fdd1	- second dataset
fdd2	- third dataset

Returns: pointer to a dataset group

Definition at line 114 of file indexedFdd.h.
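
Extending the same single-machine model, and the same assumption that result keys come from the first dataset only, to three inputs gives the sketch below. `cogroupLocal3` is hypothetical and does not reproduce faster's internal representation.

```cpp
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Local model of a three-way cogroup. ASSUMPTION: result keys come from the
// first dataset only; values from the second and third datasets are kept
// only for those keys.
template <typename K, typename T, typename U, typename V>
std::map<K, std::tuple<std::vector<T>, std::vector<U>, std::vector<V>>>
cogroupLocal3(const std::vector<std::pair<K, T>>& first,
              const std::vector<std::pair<K, U>>& second,
              const std::vector<std::pair<K, V>>& third) {
    std::map<K, std::tuple<std::vector<T>, std::vector<U>, std::vector<V>>> out;
    for (const auto& [k, t] : first) std::get<0>(out[k]).push_back(t);
    for (const auto& [k, u] : second)
        if (auto it = out.find(k); it != out.end()) std::get<1>(it->second).push_back(u);
    for (const auto& [k, v] : third)
        if (auto it = out.find(k); it != out.end()) std::get<2>(it->second).push_back(v);
    return out;
}
```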

§ groupByKey()

template<typename K , typename T >

indexedFdd< K, T > * faster::iFddCore< K, T >::groupByKey ( )

Groups a distributed dataset by key.

Returns: pointer to itself

Definition at line 853 of file indexedFdd.h.
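
Because groupByKey() returns an `indexedFdd< K, T >*` rather than a dataset of a different type, the sketch below assumes that grouping changes where elements are placed, not what they contain: elements that share a key are brought together and every (key, value) pair is kept. It models only the local step on one machine by ordering pairs so equal keys are adjacent; `groupByKeyLocal` is a hypothetical name, and requiring `K` to support `<` is an assumption of this sketch.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Local model of groupByKey(). ASSUMPTION: grouping only rearranges
// elements; pairs that share a key become adjacent and none are merged.
template <typename K, typename T>
void groupByKeyLocal(std::vector<std::pair<K, T>>& data) {
    // stable_sort keeps the original relative order of values within a key.
    std::stable_sort(data.begin(), data.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
}
```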
