Available blocks
http
This module contains blocks for performing HTTP operations.
matplotlib
This module is a wrapper around Matplotlib. You need matplotlib to use it: pip install matplotlib.
misc
Contains miscellaneous blocks.
- lb.blocks.misc.concatenate()
  Joins a list of strings.
  Parameters: sep (str) – What will separate the strings in the result.
  Inputs: data (List[Any]) – The list of items you want to join.
  Outputs: result (Any) – The joined result.
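The joining behaviour described for concatenate can be approximated in plain Python. This is a minimal sketch, not the library's implementation: the name concatenate_sketch is hypothetical, and converting every item to str before joining is an assumption made because data is typed List[Any].

```python
from typing import Any, List


def concatenate_sketch(data: List[Any], sep: str = "") -> str:
    """Join the items of `data`, separated by `sep`."""
    # Assumption: non-string items are converted with str() before joining.
    return sep.join(str(item) for item in data)


joined = concatenate_sketch(["a", "b", "c"], sep=", ")
print(joined)
```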
- lb.blocks.misc.flatMap()
  Applies a map function to every item, and then flattens the result.
  [a,b,c] -> [[x,y],[z],[]] -> [x,y,z]
  Parameters: func (Callable) – The function to apply.
  Inputs: data (List[Any]) – The input list.
  Outputs: result (List[Any]) – The mapped list.
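The map-then-flatten step shown above can be sketched in plain Python with itertools.chain. The name flat_map_sketch is hypothetical and only illustrates the semantics; how the block itself is wired into a graph is not shown here.

```python
from itertools import chain
from typing import Any, Callable, Iterable, List


def flat_map_sketch(func: Callable[[Any], Iterable[Any]], data: List[Any]) -> List[Any]:
    """Apply `func` to every item, then flatten one level of nesting."""
    return list(chain.from_iterable(func(item) for item in data))


# Each input string maps to a list of words; the empty string maps to [].
words = flat_map_sketch(str.split, ["a b", "c", ""])
print(words)
```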
- lb.blocks.misc.flatten_list()
  Flattens a list of lists.
  [[a,b],[c,d]] -> [a,b,c,d]
  Inputs: data (List[List[Any]]) – The list to flatten.
  Outputs: result (List[Any]) – The flattened list.
- lb.blocks.misc.group_by_count()
  Groups items, similar to SQL’s COUNT()…GROUP BY().
  Inputs: data (List[Any]) – The list to group and count.
  Outputs: result (List[Tuple[Any,int]]) – A list of tuples (item, count).
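A rough plain-Python equivalent of this grouping uses collections.Counter. The name group_by_count_sketch is hypothetical, and the order of the returned tuples (first occurrence first) is an assumption; the block may order its output differently.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def group_by_count_sketch(data: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """Return (item, count) tuples, one per distinct item."""
    # Counter keeps keys in first-seen order on Python 3.7+.
    return list(Counter(data).items())


counts = group_by_count_sketch(["a", "b", "a", "c", "a"])
print(counts)
```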
- lb.blocks.misc.map_list()
  Applies a function to every element of a list and returns the resulting list.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (List[Any]) – The input list.
  Outputs: result (List[Any]) – The mapped list.
- lb.blocks.misc.show_console()
  Pretty prints in green on the console.
  Inputs: data (Any) – The field you want to display.
- lb.blocks.misc.sort()
  Sorts a list.
  Parameters:
    - key (Callable) – A function to select the element to sort on, similar to Python’s key argument to list.sort.
    - reverse (bool) – Set to true to reverse the sort order.
  Inputs: data (List[Any]) – The list to sort.
  Outputs: result (List[Any]) – The sorted list.
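Since key and reverse mirror the arguments of Python's own sorting, their effect can be illustrated with the built-in sorted. The name sort_sketch is hypothetical; this shows the parameter semantics only, not how the block is invoked.

```python
from typing import Any, Callable, List, Optional


def sort_sketch(data: List[Any],
                key: Optional[Callable[[Any], Any]] = None,
                reverse: bool = False) -> List[Any]:
    """Return a new sorted list, leaving `data` untouched."""
    return sorted(data, key=key, reverse=reverse)


pairs = [("a", 2), ("b", 3), ("c", 1)]
# Sort on the second element of each tuple, largest first.
by_count_desc = sort_sketch(pairs, key=lambda pair: pair[1], reverse=True)
print(by_count_desc)
```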
- lb.blocks.misc.split()
  Splits a string.
  Parameters: sep (str) – The separator on which to split.
  Inputs: data (str) – The string you want to split.
  Outputs: result (List[str]) – The list of splits.
spark
Wrappers around the Apache Spark API. To use this module, you need to install Spark and pyspark.
- lb.blocks.spark.get_spark_context()
  Creates a Spark context. Wrapping the creation in a function avoids creating the context at module import time, when it might never be used.
  This is not a block and is not meant for use in a graph.
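The reason for wrapping the creation in a function can be shown without Spark. In this sketch, FakeContext is a hypothetical stand-in for an expensive resource such as a Spark context, and get_context is an illustrative name, not the library's function.

```python
created = []  # records every context that gets constructed


class FakeContext:
    """Hypothetical stand-in for an expensive resource."""

    def __init__(self) -> None:
        created.append(self)


def get_context() -> FakeContext:
    # The context is built only when this function is called,
    # not when the enclosing module is imported.
    return FakeContext()


contexts_after_definition = len(created)  # nothing has been built yet
context = get_context()
contexts_after_call = len(created)
print(contexts_after_definition, contexts_after_call)
```

Had the context been assigned at module level instead, importing the module would have constructed it immediately.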
- lb.blocks.spark.spark_add()
  Applies Spark’s reduceByKey with the addition function.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Any) – The result.
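What reducing by key with addition computes can be emulated on a plain list of pairs, with no Spark installation. The name reduce_by_key_sketch is hypothetical and the emulation is sequential; on a real RDD the work is distributed across partitions.

```python
from operator import add
from typing import Any, Callable, Hashable, Iterable, List, Tuple


def reduce_by_key_sketch(pairs: Iterable[Tuple[Hashable, Any]],
                         func: Callable[[Any, Any], Any]) -> List[Tuple[Hashable, Any]]:
    """Combine all values that share a key using `func`."""
    result = {}
    for key, value in pairs:
        # The first value for a key is stored as-is; later ones are combined.
        result[key] = func(result[key], value) if key in result else value
    return list(result.items())


totals = reduce_by_key_sketch([("a", 1), ("b", 2), ("a", 3)], add)
print(totals)
```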
- lb.blocks.spark.spark_aggregateByKey()
  Spark’s aggregateByKey.
  Parameters:
    - zeroValue (Any) – The initial accumulator value for each key.
    - seqFunc (Callable) – Merges a value into an accumulator.
    - combFunc (Callable) – Merges two accumulators.
    - numTasks – Number of tasks.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
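The roles of zeroValue, seqFunc and combFunc follow Spark's aggregateByKey and are easier to see in a small emulation. In this sketch the "partitions" are plain Python lists and aggregate_by_key_sketch is a hypothetical name; it mimics the two-phase aggregation, not the block's actual code.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


def aggregate_by_key_sketch(partitions: List[List[Tuple[Hashable, Any]]],
                            zero_value: Any,
                            seq_func: Callable[[Any, Any], Any],
                            comb_func: Callable[[Any, Any], Any]) -> Dict[Hashable, Any]:
    """Aggregate values per key within each partition, then across partitions."""
    merged: Dict[Hashable, Any] = {}
    for partition in partitions:
        local: Dict[Hashable, Any] = {}
        for key, value in partition:
            # seq_func folds one value into the per-partition accumulator.
            local[key] = seq_func(local.get(key, zero_value), value)
        for key, acc in local.items():
            # comb_func merges accumulators coming from different partitions.
            merged[key] = comb_func(merged[key], acc) if key in merged else acc
    return merged


# Compute (sum, count) per key; zero_value is an immutable tuple.
stats = aggregate_by_key_sketch(
    [[("a", 1), ("b", 2)], [("a", 3)]],
    (0, 0),
    lambda acc, value: (acc[0] + value, acc[1] + 1),
    lambda left, right: (left[0] + right[0], left[1] + right[1]),
)
print(stats)
```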
- lb.blocks.spark.spark_cartesian()
  Spark’s cartesian.
  Inputs:
    - data1 (RDD) – The first RDD.
    - data2 (RDD) – The second RDD.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_coalesce()
  Spark’s coalesce.
  Parameters: numPartitions (int) – The target number of partitions.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_cogroup()
  Spark’s cogroup.
  Inputs:
    - data1 (RDD) – The first RDD.
    - data2 (RDD) – The second RDD.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_collect()
  Spark’s collect.
  Inputs: data (RDD) – The RDD to collect.
  Outputs: result (list) – The collected list.
- lb.blocks.spark.spark_count()
  Spark’s count.
  Inputs: data (RDD) – The RDD to count.
  Outputs: result (int) – The number of items in the RDD.
- lb.blocks.spark.spark_countByKey()
  Spark’s countByKey.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Mapping(Any, int)) – A mapping from each key to its number of occurrences.
- lb.blocks.spark.spark_distinct()
  Spark’s distinct.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_filter()
  Spark’s filter.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_first()
  Spark’s first.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Any) – The first item of the RDD.
- lb.blocks.spark.spark_flatMap()
  Spark’s flatMap.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_foreach()
  Spark’s foreach.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Any) – The result.
- lb.blocks.spark.spark_groupByKey()
  Spark’s groupByKey.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_intersection()
  Spark’s intersection.
  Inputs:
    - data1 (RDD) – The first RDD.
    - data2 (RDD) – The second RDD.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_join()
  Spark’s join.
  Inputs:
    - data1 (RDD) – The first RDD.
    - data2 (RDD) – The second RDD.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_map()
  Spark’s map.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_mapPartitions()
  Spark’s mapPartitions.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_pipe()
  Spark’s pipe.
  Parameters: command (str) – The command to pipe to.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_readfile()
  Reads a file and returns an RDD ready to act on it.
  Parameters:
    - master (str) – Spark’s master.
    - appname (str) – Spark’s application name.
    - filename (str) – The file to be read.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_reduce()
  Spark’s reduce.
  Parameters: func (Callable) – The function to apply.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_reduceByKey()
  Spark’s reduceByKey.
  Parameters:
    - func (Callable) – The function to apply.
    - numTasks – Number of tasks.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_repartition()
  Spark’s repartition.
  Parameters: numPartitions (int) – The target number of partitions.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_sample()
  Spark’s sample.
  Parameters:
    - withReplacement (bool) – Whether to sample with replacement. Defaults to false.
    - fraction (float) – The fraction of the data to sample.
    - seed (int) – The random seed.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_saveAsTextFile()
  Spark’s saveAsTextFile.
  Parameters: path (str) – The file path.
  Inputs: data (RDD) – The RDD to save.
- lb.blocks.spark.spark_sortByKey()
  Spark’s sortByKey.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (RDD) – The resulting RDD.
- lb.blocks.spark.spark_swap()
  Swaps pairs.
  [(a,b),(c,d)] -> [(b,a),(d,c)]
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Any) – The result.
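One plausible use for swapping pairs is to move a count into the key position so that key-based operations apply to it. The sketch below shows this on a plain list; swap_sketch is a hypothetical name, and the sorting step stands in for a key-based sort rather than reproducing any block.

```python
from typing import Any, Iterable, List, Tuple


def swap_sketch(pairs: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    """Turn every (a, b) pair into (b, a)."""
    return [(second, first) for first, second in pairs]


counts = [("spark", 3), ("block", 5), ("graph", 1)]
# After swapping, the count is the first element, so sorting orders by count.
by_count = sorted(swap_sketch(counts))
print(by_count)
```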
- lb.blocks.spark.spark_take()
  Spark’s take.
  Parameters: n (int) – The number of items to take.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (Any) – The first n items of the RDD.
- lb.blocks.spark.spark_takeOrdered()
  Spark’s takeOrdered.
  Parameters:
    - num (int) – The number of items to take.
    - key (int) –
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (list) – The resulting list.
- lb.blocks.spark.spark_takeSample()
  Spark’s takeSample.
  Parameters:
    - withReplacement (bool) – Whether to sample with replacement.
    - num (int) – The number of items to sample.
    - seed (int) – The random seed.
  Inputs: data (RDD) – The RDD to convert.
  Outputs: result (list) – The resulting list.
twitter
unixlike
This module contains blocks modeled on their UNIX counterparts.
- lb.blocks.unixlike.cat()
  Reads a file.
  Parameters: filename (str) – The file to read.
  Outputs: result (List[str]) – The lines of the file.
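Reading a file into a list of lines can be sketched as follows. The name cat_sketch is hypothetical, and two details are assumptions: the file is read as UTF-8, and trailing newlines are stripped from each line. The block may behave differently on both points.

```python
import os
import tempfile
from typing import List


def cat_sketch(filename: str) -> List[str]:
    """Return the lines of `filename` without their line terminators."""
    with open(filename, encoding="utf-8") as handle:
        return handle.read().splitlines()


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

lines = cat_sketch(path)
os.remove(path)
print(lines)
```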
- lb.blocks.unixlike.cut()
  Cuts content.
  Parameters:
    - sep (str) – The separator.
    - fields (List[int]) – The fields to extract.
  Inputs: data (List[str]) – The list of strings to cut.
  Outputs: result (List[str]) – The cut list of strings.
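Field extraction in the style of cut can be approximated like this. The name cut_sketch is hypothetical. Two assumptions are made that the description above does not settle: fields are 0-based (UNIX cut counts from 1), and the selected fields are rejoined with the same separator.

```python
from typing import List


def cut_sketch(data: List[str], sep: str, fields: List[int]) -> List[str]:
    """Keep only the requested fields of each line."""
    result = []
    for line in data:
        parts = line.split(sep)
        # Assumption: 0-based indices; indices past the end are skipped.
        picked = [parts[index] for index in fields if index < len(parts)]
        result.append(sep.join(picked))
    return result


columns = cut_sketch(["a:b:c", "d:e:f"], sep=":", fields=[0, 2])
print(columns)
```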
- lb.blocks.unixlike.grep()
  Greps content.
  Parameters: pattern (str) – The pattern to grep for.
  Inputs: data (List[str]) – The data to grep.
  Outputs: result (List[str]) – The grepped list of strings.
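Filtering lines by a pattern can be sketched with the standard re module. The name grep_sketch is hypothetical, and treating pattern as a regular expression (rather than a plain substring) is an assumption.

```python
import re
from typing import List


def grep_sketch(data: List[str], pattern: str) -> List[str]:
    """Keep the lines in which `pattern` matches somewhere."""
    regex = re.compile(pattern)
    return [line for line in data if regex.search(line)]


matches = grep_sketch(["error: disk full", "ok", "error: timeout"], r"^error")
print(matches)
```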