Available blocks

http

This module contains blocks for performing HTTP operations.

lb.blocks.http.read_http()[source]

Performs an HTTP GET request and returns its result.

Parameters:
  • url (str) – The requested URL.
  • encoding (str) – The character encoding used to decode the response.
Outputs:

result (str) – The content returned by the request.

matplotlib

This module is a wrapper around Matplotlib. You need matplotlib to use it: pip install matplotlib.

lb.blocks.matplotlib.plot_bars()[source]

Generates a bar plot.

Inputs:bar_values (list) – The values to plot, in the form of a list of tuples (value, label).

misc

Contains miscellaneous blocks.

lb.blocks.misc.concatenate()[source]

Joins a list of strings.

Parameters:sep (str) – What will separate the strings in the result.
Inputs:data (List[Any]) – The list of items you want to join.
Outputs:result (Any) – The joined result.
lb.blocks.misc.flatMap()[source]

Applies a map function to every item, and then flattens the result.

[a,b,c] -> [[x,y],[z],[]] -> [x,y,z]

Parameters:func (Callable) – The function to apply.
Inputs:data (List[Any]) – The input list.
Outputs:result (List[Any]) – The mapped list.
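
The transformation can be sketched in plain Python (this shows only the underlying list operation, not the lb block wiring):

```python
from typing import Any, Callable, List

def flat_map(func: Callable[[Any], List[Any]], data: List[Any]) -> List[Any]:
    # Apply func to every item, then flatten the per-item lists into one list.
    return [x for item in data for x in func(item)]

# [a, b, c] -> [[x, y], [z], []] -> [x, y, z]
print(flat_map(lambda s: s.split(), ["one two", "three", ""]))  # → ['one', 'two', 'three']
```
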
lb.blocks.misc.flatten_list()[source]

Flattens a list of lists.

[[a,b],[c,d]] -> [a,b,c,d]

Inputs:data (List[List[Any]]) – The list to flatten.
Outputs:result (List[Any]) – The flattened list.
lb.blocks.misc.group_by_count()[source]

Groups identical items and counts them, similar to SQL’s GROUP BY combined with COUNT().

Inputs:data (List[Any]) – The list to group and count.
Outputs:result (List[Tuple[Any,int]]) – A list of tuples (item, count).
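
A plain-Python sketch of the same grouping, using collections.Counter (whether the block preserves first-seen order, as Counter does, is an assumption here):

```python
from collections import Counter
from typing import Any, List, Tuple

def group_by_count(data: List[Any]) -> List[Tuple[Any, int]]:
    # Count occurrences of each item and return (item, count) pairs.
    return list(Counter(data).items())

print(group_by_count(["a", "b", "a", "a"]))  # → [('a', 3), ('b', 1)]
```
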
lb.blocks.misc.map_list()[source]

Applies a function to every element of a list and returns the resulting list.

Parameters:func (Callable) – The function to apply.
Inputs:data (List[Any]) – The input list.
Outputs:result (List[Any]) – The mapped list.
lb.blocks.misc.show_console()[source]

Pretty-prints the data in green on the console.

Inputs:data (Any) – The field you want to display.
lb.blocks.misc.sort()[source]

Sorts a list.

Parameters:
  • key (Callable) – A function to select the element to sort on, similar to Python’s key argument to list.sort.
  • reverse (bool) – Set to True to sort in descending order.
Inputs:

data (List[Any]) – The list to sort.

Outputs:

result (List[Any]) – The sorted list.
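
The key and reverse parameters behave like those of Python’s built-in sorted; for example:

```python
data = [("pear", 3), ("apple", 10), ("fig", 7)]
# key selects the value to sort on; reverse=True gives descending order.
result = sorted(data, key=lambda pair: pair[1], reverse=True)
print(result)  # → [('apple', 10), ('fig', 7), ('pear', 3)]
```
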

lb.blocks.misc.split()[source]

Splits a string.

Parameters:sep (str) – The separator on which to split.
Inputs:data (str) – The string you want to split.
Outputs:result (List[str]) – The list of splits.
lb.blocks.misc.write_line()[source]

Writes a string to a file.

Parameters:filename (str) – The file you want to write to.
Inputs:data (str) – The value to write in the file.
lb.blocks.misc.write_lines()[source]

Writes strings to a file, one per line.

Parameters:filename (str) – The file you want to write to.
Inputs:data (List[str]) – The values to write in the file.
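
A minimal plain-Python sketch of the same behavior (the newline and encoding handling of the actual block are assumptions):

```python
import os
import tempfile

def write_lines(filename, data):
    # Write each string to the file on its own line.
    with open(filename, "w", encoding="utf-8") as f:
        for line in data:
            f.write(line + "\n")

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_lines(path, ["first", "second"])
```
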

spark

Wrappers around Apache Spark API. To use this module, you need to install Spark and pyspark.

lb.blocks.spark.get_spark_context()[source]

Creates a Spark context. Keeping the creation inside a function prevents the context from being created at module import time, even when it is not used.

Not a block; not for use in a graph.

lb.blocks.spark.spark_add()[source]

Spark’s reduceByKey with addition as the merge function.

Inputs:data (RDD) – The RDD of key/value pairs to reduce.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_aggregateByKey()[source]

Spark’s aggregateByKey

Parameters:
  • zeroValue (Any) – The initial value for each key’s aggregation.
  • seqFunc (Callable) – Merges a value into the aggregated result within a partition.
  • combFunc (Callable) – Merges aggregated results across partitions.
  • numTasks – The number of tasks (partitions) to use.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_cartesian()[source]

Spark’s cartesian

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_coalesce()[source]

Spark’s coalesce

Parameters:numPartitions (int) – The target number of partitions.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_cogroup()[source]

Spark’s cogroup

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_collect()[source]

Spark’s collect

Inputs:data (RDD) – The RDD to collect.
Outputs:result (list) – The collected list.
lb.blocks.spark.spark_count()[source]

Spark’s count

Inputs:data (RDD) – The RDD to count.
Outputs:result (int) – The number of items in the RDD.
lb.blocks.spark.spark_countByKey()[source]

Spark’s countByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Mapping[Any, int]) – The mapping of each key to its number of occurrences.
lb.blocks.spark.spark_distinct()[source]

Spark’s distinct

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_filter()[source]

Spark’s filter

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_first()[source]

Spark’s first

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The first item of the RDD.
lb.blocks.spark.spark_flatMap()[source]

Spark’s flatMap

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_foreach()[source]

Spark’s foreach

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The result.
lb.blocks.spark.spark_groupByKey()[source]

Spark’s groupByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_intersection()[source]

Spark’s intersection

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_join()[source]

Spark’s join

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_map()[source]

Spark’s map

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_mapPartitions()[source]

Spark’s mapPartitions

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_pipe()[source]

Spark’s pipe

Parameters:command (str) – The command to pipe to.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_readfile()[source]

Reads a file and returns an RDD ready to act on it.

Parameters:
  • master (str) – Spark’s master.
  • appname (str) – Spark’s application name.
  • filename (str) – The file to be read.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_reduce()[source]

Spark’s reduce

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_reduceByKey()[source]

Spark’s reduceByKey

Parameters:
  • func (Callable) – The function to apply.
  • numTasks – Number of tasks.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.
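
For reference, reduceByKey merges all values of each key with the given function. A plain-Python sketch of the same semantics (the real block operates on a distributed RDD):

```python
from operator import add
from typing import Any, Callable, List, Tuple

def reduce_by_key(func: Callable, pairs: List[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    # Merge the values for each key using func, like Spark's reduceByKey.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

print(reduce_by_key(add, [("a", 1), ("b", 2), ("a", 3)]))  # → [('a', 4), ('b', 2)]
```
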

lb.blocks.spark.spark_repartition()[source]

Spark’s repartition

Parameters:numPartitions (int) – The number of partitions for the new RDD.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_sample()[source]

Spark’s sample

Parameters:
  • withReplacement (bool) – Whether to sample with replacement. Defaults to False.
  • fraction (float) – The expected fraction of the RDD to sample.
  • seed (int) – The random seed.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_saveAsTextFile()[source]

Spark’s saveAsTextFile

Parameters:path (str) – The file path.
Inputs:data (RDD) – The RDD to save.
lb.blocks.spark.spark_sortByKey()[source]

Spark’s sortByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_swap()[source]

Swaps pairs.

[(a,b),(c,d)] -> [(b,a),(d,c)]

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The result.
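
In plain Python, the swap amounts to:

```python
def swap(pairs):
    # [(a, b), (c, d)] -> [(b, a), (d, c)]
    return [(b, a) for a, b in pairs]

print(swap([("a", 1), ("b", 2)]))  # → [(1, 'a'), (2, 'b')]
```
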
lb.blocks.spark.spark_take()[source]

Spark’s take

Parameters:n (int) – The number of items to take.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The first n items of the RDD.
lb.blocks.spark.spark_takeOrdered()[source]

Spark’s takeOrdered

Parameters:
  • num (int) – The number of items to take.
  • key (Callable) – A function to extract the comparison key.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (list) – The resulting list.

lb.blocks.spark.spark_takeSample()[source]

Spark’s takeSample

Parameters:
  • withReplacement (bool) – Whether to sample with replacement.
  • num (int) – The number of items to sample.
  • seed (int) – The random seed.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (list) – The resulting list.

lb.blocks.spark.spark_text_to_words()[source]

Converts a line of text into a list of words.

Parameters:lowercase (bool) – Whether the text should also be converted to lowercase.
Inputs:line (RDD) – The line to convert.
Outputs:result (RDD) – The resulting RDD.
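
A plain-Python sketch of the conversion; the exact tokenization rule (here, whitespace splitting) is an assumption:

```python
def text_to_words(line, lowercase=True):
    # Optionally lowercase the line, then split it on whitespace.
    if lowercase:
        line = line.lower()
    return line.split()

print(text_to_words("Hello Spark World"))  # → ['hello', 'spark', 'world']
```
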
lb.blocks.spark.spark_union()[source]

Spark’s union

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

twitter

unixlike

This module contains various blocks that behave like their UNIX namesakes.

lb.blocks.unixlike.cat()[source]

Reads a file.

Parameters:filename (str) – The file to read.
Outputs:result (List[str]) – The lines of the file.
lb.blocks.unixlike.cut()[source]

Cuts content.

Parameters:
  • sep (str) – The separator.
  • fields (List[int]) – The fields to extract.
Inputs:

data (List[str]) – The list of strings to cut.

Outputs:

result (List[str]) – The cut list of strings.
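
A plain-Python sketch of the cut; the field indices here are 0-based, and whether the block follows UNIX cut’s 1-based numbering is an assumption:

```python
from typing import List

def cut(sep: str, fields: List[int], data: List[str]) -> List[str]:
    # Split each line on sep and keep only the requested fields.
    out = []
    for line in data:
        parts = line.split(sep)
        out.append(sep.join(parts[i] for i in fields))
    return out

print(cut(":", [0, 2], ["root:x:0:0"]))  # → ['root:0']
```
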

lb.blocks.unixlike.grep()[source]

Greps content.

Parameters:pattern (str) – The pattern to grep for.
Inputs:data (List[str]) – The data to grep.
Outputs:result (List[str]) – The grepped list of strings.
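
A plain-Python sketch; whether the block matches with regular expressions (as below) or plain substrings is an assumption:

```python
import re

def grep(pattern, data):
    # Keep only the lines matching the pattern, like UNIX grep.
    regex = re.compile(pattern)
    return [line for line in data if regex.search(line)]

print(grep(r"err", ["ok", "error: disk", "warn"]))  # → ['error: disk']
```
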
lb.blocks.unixlike.head()[source]

Keeps the beginning.

Inputs:
  • n (int) – How many items to keep.
  • data (List[Any]) – Input data.
Outputs:

result (List[Any]) – Shortened output.

lb.blocks.unixlike.tail()[source]

Keeps the end.

Inputs:
  • n (int) – How many items to keep.
  • data (List[Any]) – Input data.
Outputs:

result (List[Any]) – Shortened output.
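
Both blocks amount to simple list slicing; a plain-Python sketch:

```python
def head(n, data):
    # Keep the first n items.
    return data[:n]

def tail(n, data):
    # Keep the last n items (n == 0 yields an empty list, which data[-0:] would not).
    return data[-n:] if n > 0 else []

print(head(2, [1, 2, 3, 4]))  # → [1, 2]
print(tail(2, [1, 2, 3, 4]))  # → [3, 4]
```
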