Available blocks

http

This module contains blocks for performing HTTP operations.

lb.blocks.http.read_http()[source]

Performs an HTTP GET request and returns its result.

Parameters:
  • url (str) – The requested URL.
  • encoding (str) – The character encoding used to decode the response.
Outputs:

result (str) – The content returned by the request.

matplotlib

This module is a wrapper around Matplotlib. You need matplotlib to use it: pip install matplotlib.

lb.blocks.matplotlib.plot_bars()[source]

Generates a bar plot.

Inputs:bar_values (list) – The values to plot, in the form of a list of tuples (value, label).

misc

Contains miscellaneous blocks.

lb.blocks.misc.concatenate()[source]

Joins a list of strings.

Parameters:sep (str) – What will separate the strings in the result.
Inputs:data (List[Any]) – The list of items you want to join.
Outputs:result (Any) – The joined result.
lb.blocks.misc.flatMap()[source]

Applies a map function to every item, and then flattens the result.

[a,b,c] -> [[x,y],[z],[]] -> [x,y,z]

Parameters:func (Callable) – The function to apply.
Inputs:data (List[Any]) – The input list.
Outputs:result (List[Any]) – The mapped list.
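
The transformation can be sketched in plain Python (this shows only the underlying list operation, not the lb block wiring):

```python
from typing import Any, Callable, List

def flat_map(func: Callable[[Any], List[Any]], data: List[Any]) -> List[Any]:
    # Apply func to every item, then flatten the per-item lists into one list.
    return [x for item in data for x in func(item)]

# [a, b, c] -> [[x, y], [z], []] -> [x, y, z]
print(flat_map(lambda s: s.split(), ["one two", "three", ""]))  # → ['one', 'two', 'three']
```
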
lb.blocks.misc.flatten_list()[source]

Flattens a list of lists.

[[a,b],[c,d]] -> [a,b,c,d]

Inputs:data (List[List[Any]]) – The list to flatten.
Outputs:result (List[Any]) – The flattened list.
lb.blocks.misc.group_by_count()[source]

Groups identical items and counts them, similar to SQL’s GROUP BY combined with COUNT().

Inputs:data (List[Any]) – The list to group and count.
Outputs:result (List[Tuple[Any,int]]) – A list of tuples (item, count).
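
A plain-Python sketch of the same grouping, using collections.Counter (whether the block preserves first-seen order, as Counter does, is an assumption here):

```python
from collections import Counter
from typing import Any, List, Tuple

def group_by_count(data: List[Any]) -> List[Tuple[Any, int]]:
    # Count occurrences of each item and return (item, count) pairs.
    return list(Counter(data).items())

print(group_by_count(["a", "b", "a", "a"]))  # → [('a', 3), ('b', 1)]
```
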
lb.blocks.misc.map_list()[source]

Applies a function to every element of a list and returns the resulting list.

Parameters:func (Callable) – The function to apply.
Inputs:data (List[Any]) – The input list.
Outputs:result (List[Any]) – The mapped list.
lb.blocks.misc.show_console()[source]

Pretty-prints the data in green on the console.

Inputs:data (Any) – The field you want to display.
lb.blocks.misc.sort()[source]

Sorts a list.

Parameters:
  • key (Callable) – A function to select the element to sort on, similar to Python’s key argument to list.sort.
  • reverse (bool) – Set to True to sort in descending order.
Inputs:

data (List[Any]) – The list to sort.

Outputs:

result (List[Any]) – The sorted list.
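
The key and reverse parameters behave like those of Python’s built-in sorted; for example:

```python
data = [("pear", 3), ("apple", 10), ("fig", 7)]
# key selects the value to sort on; reverse=True gives descending order.
result = sorted(data, key=lambda pair: pair[1], reverse=True)
print(result)  # → [('apple', 10), ('fig', 7), ('pear', 3)]
```
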

lb.blocks.misc.split()[source]

Splits a string.

Parameters:sep (str) – The separator on which to split.
Inputs:data (str) – The string you want to split.
Outputs:result (List[str]) – The list of splits.
lb.blocks.misc.write_line()[source]

Writes a string to a file.

Parameters:filename (str) – The file you want to write to.
Inputs:data (str) – The value to write in the file.
lb.blocks.misc.write_lines()[source]

Writes strings to a file, one per line.

Parameters:filename (str) – The file you want to write to.
Inputs:data (List[str]) – The values to write in the file.
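
A minimal plain-Python sketch of the same behavior (the newline and encoding handling of the actual block are assumptions):

```python
import os
import tempfile

def write_lines(filename, data):
    # Write each string to the file on its own line.
    with open(filename, "w", encoding="utf-8") as f:
        for line in data:
            f.write(line + "\n")

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_lines(path, ["first", "second"])
```
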

spark

Wrappers around Apache Spark API. To use this module, you need to install Spark and pyspark.

lb.blocks.spark.get_spark_context()[source]

Creates a Spark context. Keeping the creation inside a function prevents the context from being created at module import time, even when it is not used.

Not a block; not for use in a graph.

lb.blocks.spark.spark_add()[source]

Spark’s reduceByKey with addition as the merge function.

Inputs:data (RDD) – The RDD of key/value pairs to reduce.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_aggregateByKey()[source]

Spark’s aggregateByKey

Parameters:
  • zeroValue (Any) – The initial value for each key’s aggregation.
  • seqFunc (Callable) – Merges a value into the aggregated result within a partition.
  • combFunc (Callable) – Merges aggregated results across partitions.
  • numTasks – The number of tasks (partitions) to use.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_cartesian()[source]

Spark’s cartesian

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_coalesce()[source]

Spark’s coalesce

Parameters:numPartitions (int) – The target number of partitions.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_cogroup()[source]

Spark’s cogroup

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_collect()[source]

Spark’s collect

Inputs:data (RDD) – The RDD to collect.
Outputs:result (list) – The collected list.
lb.blocks.spark.spark_count()[source]

Spark’s count

Inputs:data (RDD) – The RDD to count.
Outputs:result (int) – The number of items in the RDD.
lb.blocks.spark.spark_countByKey()[source]

Spark’s countByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Mapping[Any, int]) – The mapping of each key to its number of occurrences.
lb.blocks.spark.spark_distinct()[source]

Spark’s distinct

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_filter()[source]

Spark’s filter

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_first()[source]

Spark’s first

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The first item of the RDD.
lb.blocks.spark.spark_flatMap()[source]

Spark’s flatMap

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_foreach()[source]

Spark’s foreach

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The result.
lb.blocks.spark.spark_groupByKey()[source]

Spark’s groupByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_intersection()[source]

Spark’s intersection

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_join()[source]

Spark’s join

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_map()[source]

Spark’s map

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_mapPartitions()[source]

Spark’s mapPartitions

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_pipe()[source]

Spark’s pipe

Parameters:command (str) – The command to pipe to.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_readfile()[source]

Reads a file and returns an RDD ready to act on it.

Parameters:
  • master (str) – Spark’s master.
  • appname (str) – Spark’s application name.
  • filename (str) – The file to be read.
Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_reduce()[source]

Spark’s reduce

Parameters:func (Callable) – The function to apply.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_reduceByKey()[source]

Spark’s reduceByKey

Parameters:
  • func (Callable) – The function to apply.
  • numTasks – Number of tasks.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.
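
For reference, reduceByKey merges all values of each key with the given function. A plain-Python sketch of the same semantics (the real block operates on a distributed RDD):

```python
from operator import add
from typing import Any, Callable, List, Tuple

def reduce_by_key(func: Callable, pairs: List[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    # Merge the values for each key using func, like Spark's reduceByKey.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

print(reduce_by_key(add, [("a", 1), ("b", 2), ("a", 3)]))  # → [('a', 4), ('b', 2)]
```
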

lb.blocks.spark.spark_repartition()[source]

Spark’s repartition

Parameters:numPartitions (int) – The number of partitions for the new RDD.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_sample()[source]

Spark’s sample

Parameters:
  • withReplacement (bool) – Whether to sample with replacement. Defaults to False.
  • fraction (float) – The expected fraction of the RDD to sample.
  • seed (int) – The random seed.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (RDD) – The resulting RDD.

lb.blocks.spark.spark_saveAsTextFile()[source]

Spark’s saveAsTextFile

Parameters:path (str) – The file path.
Inputs:data (RDD) – The RDD to save.
lb.blocks.spark.spark_sortByKey()[source]

Spark’s sortByKey

Inputs:data (RDD) – The RDD to convert.
Outputs:result (RDD) – The resulting RDD.
lb.blocks.spark.spark_swap()[source]

Swaps pairs.

[(a,b),(c,d)] -> [(b,a),(d,c)]

Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The result.
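
In plain Python, the swap amounts to:

```python
def swap(pairs):
    # [(a, b), (c, d)] -> [(b, a), (d, c)]
    return [(b, a) for a, b in pairs]

print(swap([("a", 1), ("b", 2)]))  # → [(1, 'a'), (2, 'b')]
```
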
lb.blocks.spark.spark_take()[source]

Spark’s take

Parameters:n (int) – The number of items to take.
Inputs:data (RDD) – The RDD to convert.
Outputs:result (Any) – The first n items of the RDD.
lb.blocks.spark.spark_takeOrdered()[source]

Spark’s takeOrdered

Parameters:
  • num (int) – The number of items to take.
  • key (Callable) – A function to extract the comparison key.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (list) – The resulting list.

lb.blocks.spark.spark_takeSample()[source]

Spark’s takeSample

Parameters:
  • withReplacement (bool) – Whether to sample with replacement.
  • num (int) – The number of items to sample.
  • seed (int) – The random seed.
Inputs:

data (RDD) – The RDD to convert.

Outputs:

result (list) – The resulting list.

lb.blocks.spark.spark_text_to_words()[source]

Converts a line of text into a list of words.

Parameters:lowercase (bool) – Whether the text should also be converted to lowercase.
Inputs:line (RDD) – The line to convert.
Outputs:result (RDD) – The resulting RDD.
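
A plain-Python sketch of the conversion; the exact tokenization rule (here, whitespace splitting) is an assumption:

```python
def text_to_words(line, lowercase=True):
    # Optionally lowercase the line, then split it on whitespace.
    if lowercase:
        line = line.lower()
    return line.split()

print(text_to_words("Hello Spark World"))  # → ['hello', 'spark', 'world']
```
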
lb.blocks.spark.spark_union()[source]

Spark’s union

Inputs:
  • data1 (RDD) – The first RDD.
  • data2 (RDD) – The second RDD.
Outputs:

result (RDD) – The resulting RDD.

twitter

unixlike

This module contains various blocks that behave like their UNIX namesakes.

lb.blocks.unixlike.cat()[source]

Reads a file.

Parameters:filename (str) – The file to read.
Outputs:result (List[str]) – The lines of the file.
lb.blocks.unixlike.cut()[source]

Cuts content.

Parameters:
  • sep (str) – The separator.
  • fields (List[int]) – The fields to extract.
Inputs:

data (List[str]) – The list of strings to cut.

Outputs:

result (List[str]) – The cut list of strings.
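
A plain-Python sketch of the cut; the field indices here are 0-based, and whether the block follows UNIX cut’s 1-based numbering is an assumption:

```python
from typing import List

def cut(sep: str, fields: List[int], data: List[str]) -> List[str]:
    # Split each line on sep and keep only the requested fields.
    out = []
    for line in data:
        parts = line.split(sep)
        out.append(sep.join(parts[i] for i in fields))
    return out

print(cut(":", [0, 2], ["root:x:0:0"]))  # → ['root:0']
```
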

lb.blocks.unixlike.grep()[source]

Greps content.

Parameters:pattern (str) – The pattern to grep for.
Inputs:data (List[str]) – The data to grep.
Outputs:result (List[str]) – The grepped list of strings.
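
A plain-Python sketch; whether the block matches with regular expressions (as below) or plain substrings is an assumption:

```python
import re

def grep(pattern, data):
    # Keep only the lines matching the pattern, like UNIX grep.
    regex = re.compile(pattern)
    return [line for line in data if regex.search(line)]

print(grep(r"err", ["ok", "error: disk", "warn"]))  # → ['error: disk']
```
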
lb.blocks.unixlike.head()[source]

Keeps the beginning.

Inputs:
  • n (int) – How many items to keep.
  • data (List[Any]) – Input data.
Outputs:

result (List[Any]) – Shortened output.

lb.blocks.unixlike.tail()[source]

Keeps the end.

Inputs:
  • n (int) – How many items to keep.
  • data (List[Any]) – Input data.
Outputs:

result (List[Any]) – Shortened output.
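
Both blocks amount to simple list slicing; a plain-Python sketch:

```python
def head(n, data):
    # Keep the first n items.
    return data[:n]

def tail(n, data):
    # Keep the last n items (n == 0 yields an empty list, which data[-0:] would not).
    return data[-n:] if n > 0 else []

print(head(2, [1, 2, 3, 4]))  # → [1, 2]
print(tail(2, [1, 2, 3, 4]))  # → [3, 4]
```
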