Tutorial
========
In this tutorial, we go through the steps of installing λ-blocks,
writing a computation graph, and proceed to execute it.
Installation
------------
Dependencies
^^^^^^^^^^^^
If you're using Debian, Ubuntu, or a system of this family, the
required dependencies should all be available in your package manager::
sudo apt install python3 python3-venv libyaml-dev
If you're not using Debian or a Debian-based system, be sure to
install Python 3 and the development headers of ``libyaml``, this is
necessary for ``pip`` to compile PyYAML.
Finally, if you want to use the Spark blocks, you will need Spark and
``pyspark`` to be installed on your system (but this is not required
for this tutorial).
λ-blocks
^^^^^^^^
While λ-blocks is still in its early days of development, it is not
available through ``pip``, nor in any distribution package
manager. Therefore, the best is to install it in a virtual environment
this way::
git clone https://github.com/lambdablocks/lambdablocks.git
cd lambdablocks
pyvenv VENV
source VENV/bin/activate
python3 setup.py install
Also not required for this tutorial, these dependencies are needed for
some blocks in the included library::
pip install matplotlib requests-oauthlib
Verification
^^^^^^^^^^^^
Don't leave your activated virtual environment, and try executing::
blocks.py --help
If you get the help page of this executable, all is set!
Writing a computation graph
---------------------------
Now that everything is installed, let's dive into writing a first
λ-blocks program.
Such a program, also called a computation graph, is written in `YAML
`_, a simple data representation format. Create a
new file and name it ``wordcount.yml``: it will contain the description
of a computation graph to perform a Wordcount. Add this content::
---
name: wordcount
description: Counts words
modules: [lb.blocks.unixlike]
---
- block: cat
name: cat
args:
filename: examples/wordlist
This YAML file contains two parts: the first one is a key/value list
giving information on the computation graph (such as its name,
description, and used modules). The second part is more interesting:
it contains the list of the code blocks that are the vertices of our
graph. For now, there is only one vertice: it uses the block
:py:func:`lb.blocks.unixlike.cat`. It has a unique name ``cat`` (since
we use only once the block ``cat`` in this program, the vertice name
can be the same as the block name), and one argument, a path to a
file. As you may have guessed, this block acts like the Unix ``cat``
utility: it reads a file.
This program won't do much, except for reading a file. You can try to
execute it this way::
blocks.py -f wordcount.yml
If nothing happens, it is normal: the file has been read by λ-blocks,
but it isn't supposed to be displayed on the console. If you get an
error, the path you provided may be incorrect: be sure to execute the
command within in the ``lambdablocks`` folder, or to change the
``filename`` argument.
Let's add a few vertices in our graph, and link them together to
compute a Wordcount implementation::
---
name: wordcount
description: counts words
modules: [lb.blocks.unixlike]
---
- block: cat
name: cat
args:
filename: examples/wordlist
- block: group_by_count
name: group
inputs:
data: cat.result
- block: sort
name: sort
args:
key: "lambda x: x[1]"
reverse: true
inputs:
data: group.result
- block: show_console
name: show
inputs:
data: sort.result
We now have 4 blocks (or vertices):
* ``cat`` reads a file and outputs a list of lines found in this file;
* ``group_by_count`` reads a list, and outputs a list of unique items,
along with the number of times they appear in the list;
* ``sort`` reads a list, and outputs a sorted list, sorted by the second
item of each element;
* ``show_console`` displays its inputs on the user console.
A block has named inputs and named outputs. To link two blocks
together, we specify the inputs of a block in the ``inputs`` key. For
example, the block ``group_by_count`` takes one input, ``data``, that is
the output ``result`` of the block ``cat``.
Let's try to execute this graph::
blocks.py -f wordcount.yml
That's it! You should get a list of fruits, along with their number of
occurences.
Using plugins
-------------
λ-blocks, while processing a computation graph, can execute plugins,
which are pieces of Python code able to act on the graph. For example,
let's try the included :py:mod:`lb.plugins.debug` plugin::
blocks.py -f wordcount.yml -p lb.plugins.debug
This plugin will display an excerpt of the results produced by each
block, which allows you to effectively see what every block is
doing. This is useful to follow the data as it is transformed from the
entry of the graph to all the following vertices.
You can also try to execute the :py:mod:`lb.plugins.instrumentation`
plugin the same way, which will measure the time taken by every block
to compute, useful to detect bottlenecks::
blocks.py -f wordcount.yml -p lb.plugins.debug lb.plugins.instrumentation
Unsurprisingly, the ``cat`` block should be the slowest, because it
requires to read a file on disk.
Next steps
----------
Now that we've seen some possibilities of λ-blocks and how it works,
you can look at some :doc:`examples `, :doc:`check the list
of available blocks `, :doc:`the list of available plugins
`, :doc:`write your own blocks ` or :doc:`write
your own plugins `.