Available blocks
http
This module contains blocks for performing HTTP operations.
matplotlib
This module is a wrapper around Matplotlib. You need matplotlib to use it: pip install matplotlib.
misc
Contains miscellaneous blocks.
-
lb.blocks.misc.concatenate()
Joins a list of strings.
Parameters: sep (str) – What will separate the strings in the result. Inputs: data (List[Any]) – The list of items you want to join. Outputs: result (Any) – The joined result.
-
lb.blocks.misc.flatMap()
Applies a map function to every item, and then flattens the result.
[a,b,c] -> [[x,y],[z],[]] -> [x,y,z]
Parameters: func (Callable) – The function to apply.
Inputs: data (List[Any]) – The input list.
Outputs: result (List[Any]) – The mapped and flattened list.
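The map-then-flatten behaviour described above can be sketched in plain Python. This only illustrates the semantics and is not the block's implementation; `flat_map` is a hypothetical helper name, and it assumes `func` returns an iterable for every item.

```python
from itertools import chain
from typing import Any, Callable, Iterable, List


def flat_map(func: Callable[[Any], Iterable[Any]], data: List[Any]) -> List[Any]:
    """Apply func to every item, then flatten the results by one level."""
    return list(chain.from_iterable(func(item) for item in data))


# Each string maps to a list of its characters; the per-item lists are merged.
# "ab" -> ["a", "b"], "c" -> ["c"], "" -> []
letters = flat_map(list, ["ab", "c", ""])
print(letters)  # ['a', 'b', 'c']
```

An item that maps to an empty list simply contributes nothing to the result, as in the `[]` case of the example above.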
-
lb.blocks.misc.flatten_list()
Flattens a list of lists.
[[a,b],[c,d]] -> [a,b,c,d]
Inputs: data (List[List[Any]]) – The list to flatten.
Outputs: result (List[Any]) – The flattened list.
-
lb.blocks.misc.group_by_count()
Groups items in a manner similar to SQL’s COUNT() … GROUP BY.
Inputs: data (List[Any]) – The list to group and count.
Outputs: result (List[Tuple[Any,int]]) – A list of tuples (item, count).
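As a rough illustration of the grouping described above, the same (item, count) pairs can be produced with the standard library. `count_groups` is a hypothetical name, the items are assumed to be hashable, and the ordering of the real block's output is not documented here, so the first-occurrence order below is an assumption.

```python
from collections import Counter
from typing import Any, List, Tuple


def count_groups(data: List[Any]) -> List[Tuple[Any, int]]:
    """Return one (item, count) tuple per distinct item."""
    # Counter keeps keys in order of first occurrence (Python 3.7+).
    return list(Counter(data).items())


counts = count_groups(["a", "b", "a", "c", "a"])
print(counts)  # [('a', 3), ('b', 1), ('c', 1)]
```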
-
lb.blocks.misc.map_list()
Applies a function to every element of a list and returns the resulting list.
Parameters: func (Callable) – The function to apply. Inputs: data (List[Any]) – The input list. Outputs: result (List[Any]) – The mapped list.
-
lb.blocks.misc.show_console()
Pretty prints in green on the console.
Inputs: data (Any) – The field you want to display.
-
lb.blocks.misc.sort()
Sorts a list.
Parameters: - key (Callable) – A function to select the element to sort on, similar to Python’s key argument to list.sort.
- reverse (bool) – Set to true to sort in reverse order.
Inputs: data (List[Any]) – The list to sort.
Outputs: result (List[Any]) – The sorted list.
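The `key` and `reverse` parameters are described as mirroring Python's own sorting arguments, so a plain-Python sketch gives a feel for them. `sort_items` is a hypothetical stand-in, not the block itself.

```python
from typing import Any, Callable, List, Optional


def sort_items(
    data: List[Any],
    key: Optional[Callable[[Any], Any]] = None,
    reverse: bool = False,
) -> List[Any]:
    """Return a new sorted list, leaving the input untouched."""
    return sorted(data, key=key, reverse=reverse)


pairs = [("b", 2), ("a", 3), ("c", 1)]
# Sort on the second element of each tuple, largest first.
by_count_desc = sort_items(pairs, key=lambda pair: pair[1], reverse=True)
print(by_count_desc)  # [('a', 3), ('b', 2), ('c', 1)]
```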
-
lb.blocks.misc.split()
Splits a string.
Parameters: sep (str) – The separator on which to split. Inputs: data (str) – The string you want to split. Outputs: result (List[str]) – The list of splits.
spark
Wrappers around the Apache Spark API. To use this module, you need to install Spark and pyspark.
-
lb.blocks.spark.get_spark_context()
Creates a Spark context. Wrapping the creation in a function avoids building the context at module import time, when it might never be used.
Not a block; not for use in a graph.
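The import-time concern mentioned above can be illustrated without Spark. In this sketch `FakeContext` is a hypothetical stand-in for an expensive object such as a Spark context, and the caching with `lru_cache` is an added assumption: the documentation does not say whether the real function reuses a context between calls.

```python
from functools import lru_cache
from typing import List

created: List[object] = []  # records every time the expensive object is built


class FakeContext:
    """Hypothetical stand-in for an expensive resource."""

    def __init__(self) -> None:
        created.append(self)


@lru_cache(maxsize=None)
def get_context() -> FakeContext:
    """Build the context on the first call only; later calls reuse it."""
    return FakeContext()


# Defining the function has no side effect: nothing is built yet.
print(len(created))  # 0
first = get_context()
second = get_context()
print(len(created))  # 1
```

Had `FakeContext()` been assigned to a module-level variable instead, it would have been built as soon as the module was imported.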
-
lb.blocks.spark.spark_add()
Applies Spark’s reduceByKey with the addition function.
Inputs: data (RDD) – The RDD to convert. Outputs: result (Any) – The result.
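What "reduceByKey with addition" computes can be sketched on an ordinary list of pairs, with no Spark installation. `reduce_by_key` is a hypothetical local emulation; on a real RDD the corresponding pyspark call would be `rdd.reduceByKey(operator.add)`.

```python
from operator import add
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


def reduce_by_key(
    func: Callable[[Any, Any], Any],
    pairs: Iterable[Tuple[Hashable, Any]],
) -> Dict[Hashable, Any]:
    """Combine all values that share a key using func."""
    result: Dict[Hashable, Any] = {}
    for key, value in pairs:
        # The first value seen for a key is kept as-is; later ones are merged in.
        result[key] = func(result[key], value) if key in result else value
    return result


word_counts = reduce_by_key(add, [("a", 1), ("b", 1), ("a", 1)])
print(word_counts)  # {'a': 2, 'b': 1}
```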
-
lb.blocks.spark.spark_aggregateByKey()
Spark’s aggregateByKey
Parameters: - zeroValue (Any) –
- seqFunc (Callable) –
- combFunc (Callable) –
- numTasks –
Inputs: data (RDD) – The RDD to convert.
Outputs: result (RDD) – The resulting RDD.
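The parameters above carry no description here, so the following is only a sketch of how Spark's own aggregateByKey is understood to use them: `seqFunc` folds each value into a per-key accumulator that starts at `zeroValue` within a partition, and `combFunc` merges the accumulators of different partitions. The function `aggregate_by_key` and the list-of-lists "partitions" are hypothetical, local stand-ins for an RDD.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Any]


def aggregate_by_key(
    partitions: List[List[Pair]],
    zero_value: Any,
    seq_func: Callable[[Any, Any], Any],
    comb_func: Callable[[Any, Any], Any],
) -> Dict[Hashable, Any]:
    """Emulate aggregateByKey over in-memory partitions."""
    merged: Dict[Hashable, Any] = {}
    for partition in partitions:
        # Step 1: fold values into one accumulator per key, per partition.
        local: Dict[Hashable, Any] = {}
        for key, value in partition:
            local[key] = seq_func(local.get(key, zero_value), value)
        # Step 2: merge this partition's accumulators into the global result.
        for key, acc in local.items():
            merged[key] = comb_func(merged[key], acc) if key in merged else acc
    return merged


# Compute (sum, count) per key; zero_value is an immutable tuple on purpose.
partitions = [[("a", 1), ("b", 4)], [("a", 3)]]
sums_counts = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
print(sums_counts)  # {'a': (4, 2), 'b': (4, 1)}
```

Keeping two functions lets the accumulator type (here a tuple) differ from the value type (here an int), which a plain reduceByKey does not allow.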
-
lb.blocks.spark.spark_cartesian()
Spark’s cartesian
Inputs: - data1 (RDD) – The first RDD.
- data2 (RDD) – The second RDD.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_coalesce()
Spark’s coalesce
Parameters: numPartitions (int) –
Inputs: data (RDD) – The RDD to convert.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_cogroup()
Spark’s cogroup
Inputs: - data1 (RDD) – The first RDD.
- data2 (RDD) – The second RDD.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_collect()
Spark’s collect
Inputs: data (RDD) – The RDD to collect. Outputs: result (list) – The collected list.
-
lb.blocks.spark.spark_count()
Spark’s count
Inputs: data (RDD) – The RDD to count. Outputs: result (int) – The number of items in the RDD.
-
lb.blocks.spark.spark_countByKey()
Spark’s countByKey
Inputs: data (RDD) – The RDD to convert.
Outputs: result (Mapping(Any, int)) – The mapping of the keys to their number of occurrences.
-
lb.blocks.spark.spark_distinct()
Spark’s distinct
Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_filter()
Spark’s filter
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_first()
Spark’s first
Inputs: data (RDD) – The RDD to convert. Outputs: result (Any) – The first item of the RDD.
-
lb.blocks.spark.spark_flatMap()
Spark’s flatMap
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_foreach()
Spark’s foreach
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (Any) – The result.
-
lb.blocks.spark.spark_groupByKey()
Spark’s groupByKey
Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_intersection()
Spark’s intersection
Inputs: - data1 (RDD) – The first RDD.
- data2 (RDD) – The second RDD.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_join()
Spark’s join
Inputs: - data1 (RDD) – The first RDD.
- data2 (RDD) – The second RDD.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_map()
Spark’s map
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_mapPartitions()
Spark’s mapPartitions
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_pipe()
Spark’s pipe
Parameters: command (str) – The command to pipe to. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_readfile()
Reads a file and returns an RDD of its contents, ready for further processing.
Parameters: - master (str) – Spark’s master.
- appname (str) – Spark’s application name.
- filename (str) – The file to be read.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_reduce()
Spark’s reduce
Parameters: func (Callable) – The function to apply. Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_reduceByKey()
Spark’s reduceByKey
Parameters: - func (Callable) – The function to apply.
- numTasks – Number of tasks.
Inputs: data (RDD) – The RDD to convert.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_repartition()
Spark’s repartition
Parameters: numPartitions (int) –
Inputs: data (RDD) – The RDD to convert.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_sample()
Spark’s sample
Parameters: - withReplacement (bool) – Defaults to false.
- fraction (float) – The quantity to sample.
- seed (int) – Seed.
Inputs: data (RDD) – The RDD to convert.
Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_saveAsTextFile()
Spark’s saveAsTextFile
Parameters: path (str) – The file path. Inputs: data (RDD) – The RDD to save.
-
lb.blocks.spark.spark_sortByKey()
Spark’s sortByKey
Inputs: data (RDD) – The RDD to convert. Outputs: result (RDD) – The resulting RDD.
-
lb.blocks.spark.spark_swap()
Swaps pairs.
`[(a,b),(c,d)] -> [(b,a),(d,c)]`
Inputs: data (RDD) – The RDD to convert.
Outputs: result (Any) – The result.
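One common reason to swap pairs is to sort by what was originally the value, since key-based sorting then applies to it. The sketch below shows the idea on a plain list. `swap_pairs` is a hypothetical helper, and on a real RDD the swap could presumably be written as `rdd.map(lambda kv: (kv[1], kv[0]))`.

```python
from typing import Any, List, Tuple


def swap_pairs(pairs: List[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    """Turn every (a, b) into (b, a)."""
    return [(second, first) for first, second in pairs]


word_counts = [("spark", 3), ("block", 7), ("graph", 1)]
# After swapping, the count is the first element, so a key-based sort orders by count.
by_count = sorted(swap_pairs(word_counts))
print(by_count)  # [(1, 'graph'), (3, 'spark'), (7, 'block')]
```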
-
lb.blocks.spark.spark_take()
Spark’s take
Parameters: n (int) – The number of items to take.
Inputs: data (RDD) – The RDD to convert.
Outputs: result (Any) – The first n items of the RDD.
-
lb.blocks.spark.spark_takeOrdered()
Spark’s takeOrdered
Parameters: - num (int) –
- key (int) –
Inputs: data (RDD) – The RDD to convert.
Outputs: result (list) – The resulting list.
-
lb.blocks.spark.spark_takeSample()
Spark’s takeSample
Parameters: - withReplacement (bool) –
- num (int) –
- seed (int) –
Inputs: data (RDD) – The RDD to convert.
Outputs: result (list) – The resulting list.
twitter
unixlike
This module contains various blocks modeled on their UNIX counterparts.
-
lb.blocks.unixlike.cat()
Reads a file.
Parameters: filename (str) – The file to read. Outputs: result (List[str]) – The lines of the file.
-
lb.blocks.unixlike.cut()
Cuts content.
Parameters: - sep (str) – The separator.
- fields (List[int]) – The fields to extract.
Inputs: data (List[str]) – The list of strings to cut.
Outputs: result (List[str]) – The cut list of strings.
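A rough idea of field extraction in the style of UNIX `cut` is sketched below. Two details are assumptions, since the description above does not settle them: field numbers are taken to be 1-based as in `cut -f`, and the selected fields are rejoined with the same separator. `cut_fields` is a hypothetical name.

```python
from typing import List


def cut_fields(lines: List[str], sep: str, fields: List[int]) -> List[str]:
    """Keep only the requested fields of each line (1-based, an assumption)."""
    result = []
    for line in lines:
        parts = line.split(sep)
        # Silently skip field numbers that fall outside this line.
        picked = [parts[i - 1] for i in fields if 1 <= i <= len(parts)]
        result.append(sep.join(picked))
    return result


rows = ["alice:x:1000", "bob:x:1001"]
print(cut_fields(rows, ":", [1, 3]))  # ['alice:1000', 'bob:1001']
```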
-
lb.blocks.unixlike.grep()
Greps content.
Parameters: pattern (str) – The pattern to grep for.
Inputs: data (List[str]) – The data to grep.
Outputs: result (List[str]) – The grepped list of strings.
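Filtering lines by a pattern, as described above, might look like the following in plain Python. Treating `pattern` as a regular expression searched anywhere in the line is an assumption (the block could equally use substring matching), and `grep_lines` is a hypothetical name.

```python
import re
from typing import List


def grep_lines(lines: List[str], pattern: str) -> List[str]:
    """Keep the lines in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


log = ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]
errors = grep_lines(log, r"^ERROR")
print(errors)  # ['ERROR disk full', 'ERROR timeout']
```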