# Pipeline steps

## `input`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| limit | `Long` | 0..1 | Optionally limit the number of tuples that are yielded from the input data.  Similar to an SQL limit. |
| name | `String` | 0..1 | Nest each tuple from the relation in a parent tuple.  More simply put, if given, this will be the name of the attribute that wraps each tuple from the relation.  For example, if your source has attributes named `foo` and `bar`, setting name to `baz` will give results named `baz.foo` and `baz.bar`.  This can be useful for distinguishing the source data from other computed data in your pipeline. |
| offset | `Long` | 0..1 | Optionally skip a number of tuples that are yielded from the input data.  Similar to the offset part of an SQL `LIMIT` clause. |
| relation | `Expression` | 0..1 | The source data to yield.  Normally this is the ID of a bookmark from your project, but can also be a filename/URL. |
| value | `Expression` | 0..1 | An expression that provides a single input item. Used instead of `relation`. For example, `input(value: {make: 'Toyota', model: 'Corolla'})` creates an input with a single tuple. Note that `limit` and `offset` are ignored when `value` is used. |

Makes input available to the pipeline. Input is most often from a relation (such as a Shapefile or CSV) but may also come from the `value` expression.
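
The combined effect of `name`, `offset` and `limit` can be modelled in a few lines of Python. This is only a sketch of the semantics described in the table, not RiskScape code: `read_input` is a hypothetical helper, and applying `offset` before `limit` is an assumption borrowed from SQL.

```python
from itertools import islice

def read_input(relation, name=None, offset=0, limit=None):
    """Model of the input step: skip `offset` tuples, then yield at most `limit`."""
    stop = None if limit is None else offset + limit
    for row in islice(relation, offset, stop):
        # When `name` is given, each tuple is nested under a parent attribute.
        yield {name: row} if name else row

source = [{"foo": i, "bar": i * 10} for i in range(5)]
rows = list(read_input(source, name="baz", offset=1, limit=2))
```

Here `rows` holds two tuples, each wrapped in a `baz` attribute, which corresponds to the `baz.foo` and `baz.bar` naming described for `name`.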

## `filter`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| filter | `Expression` | 1 | A boolean-yielding expression. If the expression evaluates to FALSE, then the item is removed from the results. |

Applies a filter to results, removing those that fail the filter test.

## `join`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| initial-index-size | `Integer` | 0..1 | Sets the initial capacity of the index. This can improve performance when the initial size is large enough that the index does not need to grow as items are added. Setting to a large value will use more memory. |
| join-type | `JoinType` | 1 | Specifies what to do with left hand side (LHS) rows when no right hand side (RHS) row matches. One of `INNER`, where any non-matching rows are dropped (the default), or `LEFT_OUTER`, where any non-matching rows from the LHS are included. |
| on | `Expression` | 1 | Condition that evaluates whether a row of data from each relation 'matches' and should be joined together. |

Joins two relations together. Each relation can be any chain of pipeline steps. When using `->` to chain pipeline steps to a join step, the inputs should be named. Use `lhs` for the left hand side (LHS) relation, and `rhs` for the right hand side (RHS) relation, e.g. `prev_step -> join.lhs`.
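
The difference between the two join types can be illustrated with a small Python model. This is a sketch of the behaviour described for `join-type`, not RiskScape's implementation: the `join` helper and the sample data are hypothetical, and matched rows are returned as `(lhs, rhs)` pairs so that the sketch makes no claim about the shape of the joined tuple.

```python
def join(lhs, rhs, on, join_type="INNER"):
    """Yield (left, right) pairs for every pair of rows accepted by `on`."""
    rhs = list(rhs)  # the RHS is scanned once per LHS row
    for left in lhs:
        matches = [right for right in rhs if on(left, right)]
        if matches:
            for right in matches:
                yield left, right
        elif join_type == "LEFT_OUTER":
            # Keep the unmatched LHS row; there is no RHS row to pair it with.
            yield left, None

buildings = [{"id": 1, "region": "A"}, {"id": 2, "region": "Z"}]
regions = [{"code": "A", "name": "Alpha"}]
same_region = lambda b, r: b["region"] == r["code"]

inner = list(join(buildings, regions, same_region))
left_outer = list(join(buildings, regions, same_region, "LEFT_OUTER"))
```

With `INNER`, building 2 is dropped because no region matches it. With `LEFT_OUTER`, it is kept and paired with `None`.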

## `unnest`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| emit-empty | `Boolean` | 1 | If true, a result is returned for a tuple where the list(s) to be unnested are empty or null, with the lists to be unnested replaced with a null value. |
| index-key | `String` | 0..1 | If specified, an attribute of this name is added to the output, containing each item's index in the list. |
| to-unnest | `Expression` | 1 | The list attributes to expand, e.g. `unnest(list)` or `unnest([list1, list2])` to unnest more than one list at once. |

Takes a tuple that contains a list and expands it, so that the list is replaced by the list item(s). A new tuple is produced for each item in the list(s) specified. When multiple lists are specified, the nth tuple produced will contain the nth item from each list (similar to `zip()` in Python).
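
Since the description compares the step to Python's `zip()`, the behaviour can be sketched directly in Python. The `unnest` helper below is a hypothetical model, not RiskScape code; it assumes the lists being unnested have equal lengths, and it leaves out `index-key`.

```python
def unnest(row, to_unnest, emit_empty=False):
    """Expand the list attributes named in `to_unnest` into one tuple per item."""
    lists = [row[key] or [] for key in to_unnest]  # treat null as an empty list
    items = list(zip(*lists))
    if not items and emit_empty:
        # Nothing to expand: emit one tuple with the lists replaced by null.
        yield {**row, **{key: None for key in to_unnest}}
    for values in items:
        # The nth output tuple takes the nth item from each list.
        yield {**row, **dict(zip(to_unnest, values))}

row = {"id": 7, "depth": [0.5, 1.2], "year": [2020, 2021]}
expanded = list(unnest(row, ["depth", "year"]))

empty = {"id": 8, "depth": None, "year": []}
kept = list(unnest(empty, ["depth", "year"], emit_empty=True))
dropped = list(unnest(empty, ["depth", "year"]))
```

In this model `expanded` contains two tuples, `kept` contains a single tuple with null `depth` and `year`, and `dropped` is empty.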

## `select`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| select | `Expression` | 1 | An expression that maps the input to output.  Output is always a struct, even if only a non-struct value is returned, e.g. `select('')` will give a struct with a single text member. |

Produces a new output tuple by applying the given select expression to the input tuple, e.g. `select({*, '$' + str(cost) as "Replacement Cost"})`.  Remember to surround your expression with braces (`{` and `}`) if you want to return more than one attribute.

## `sort`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| by | `Expression` | 1 | The attribute(s) to sort by.  To sort on multiple attributes, use a list of attributes, e.g. `[consequence.loss, exposure.total_value]` |
| delta | `Lambda` | 0..1 | A lambda expression that calculates the delta between the two tuples that have been sorted. The delta is added into the tuples. For example if the tuples contained a `value` attribute and you wanted to know the difference between tuples you could use `(prev, current) -> current.value - prev.value` |
| delta-attribute | `String` | 0..1 | Name the attribute that the delta will be stored in. By default this will be `delta`. |
| delta-type | `String` | 0..1 | The type of the delta-attribute. If provided, it will be used to pass the delta from the previous step into the lambda expression (as `prev.delta`). This provides basic support for cascading hazards. Only required when the delta lambda expression uses the delta-attribute. |
| direction | `Direction` | 0+ | May be used to set the sort direction for attributes. `asc` to sort in ascending order, or `desc` to sort in descending order. Any attributes that do not have a direction set in this list will default to ascending. E.g. `['desc', 'asc']` |

Produces output that is sorted based on one or more sort-by expressions. The sort step should only be used immediately before a save step. This prevents subsequent steps from changing the order, which can happen because they are processed in parallel. Note that output is collected in memory, so sorting large volumes of output may fail due to insufficient memory.
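
The `delta` lambda can be easier to follow with a worked model. The Python sketch below mirrors the `(prev, current) -> current.value - prev.value` example from the table. It is not RiskScape code: `sort_with_delta` is a hypothetical helper, `delta-type` is not modelled, and giving the first tuple a null delta is an assumption, since the first tuple has no predecessor.

```python
def sort_with_delta(rows, by, descending=False, delta=None, delta_attribute="delta"):
    """Sort rows by one attribute and optionally add a delta between neighbours."""
    ordered = sorted(rows, key=lambda r: r[by], reverse=descending)
    if delta is None:
        return ordered
    out, prev = [], None
    for current in ordered:
        # Assumption: the first tuple has no previous tuple, so its delta is None.
        value = None if prev is None else delta(prev, current)
        out.append({**current, delta_attribute: value})
        prev = current
    return out

rows = [{"value": 30}, {"value": 10}, {"value": 25}]
result = sort_with_delta(
    rows, "value", delta=lambda prev, current: current["value"] - prev["value"]
)
```

After sorting in ascending order the values are 10, 25 and 30, so the deltas stored on the second and third tuples are 15 and 5.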

## `group`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| by | `Expression` | 0..1 | An expression that groups the input so that each group is aggregated individually.  For example `group({category, sum(value) as total}, by: category)` would group all inputs by their category and calculate a total value for each category. |
| select | `Expression` | 1 | The aggregation expression to apply to input, e.g. `{sum(value) as total}`.  Members of the group by expression can be referenced here, e.g. `group({category, sum(value) as total}, by: category)` is valid. |

Applies an aggregate expression across optionally grouped input values, e.g. `group(count(exposures), by: damage_state)`. Similar to `GROUP BY` in SQL: use RiskScape's aggregate functions, such as `sum` and `count`, to compute a single value for each group of tuples.
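
As an illustration, the result of `group({category, sum(value) as total}, by: category)` can be modelled in Python. The `group_sum` helper and the sample rows are hypothetical, and the order in which groups appear here (first seen first) is a property of this sketch, not something the step is documented to guarantee.

```python
def group_sum(rows):
    """Model of group({category, sum(value) as total}, by: category)."""
    totals = {}
    for row in rows:
        # Accumulate one running total per distinct category.
        totals[row["category"]] = totals.get(row["category"], 0) + row["value"]
    return [{"category": c, "total": t} for c, t in totals.items()]

rows = [
    {"category": "a", "value": 1},
    {"category": "b", "value": 5},
    {"category": "a", "value": 2},
]
grouped = group_sum(rows)
```

Three input tuples collapse to two output tuples, one per category, each carrying the `total` for its group.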

## `save`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| format | `Format` | 0..1 | The file format to use when saving data. See `riskscape format list` for the available formats. Defaults to `csv` if no geometry is present in the output, `shapefile` if there is. |
| name | `String` | 0..1 | A name to give to the output.  May differ from the name ultimately given to any created files, depending on the format or the storage location (e.g. files saved to a directory with existing files may be renamed to avoid overwriting them). |

Saves results out to files or other supported storage systems.

## `enlarge`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| distance | `Expression` | 1 | The distance (in metres) to enlarge the geometries. |
| geom-expression | `PropertyAccess` | 0..1 | Expression to the geometry to enlarge (or a struct that contains a geometry member). If not specified then the first geometry found will be the one to be enlarged. |
| mode | `EnlargeMode` | 0..1 | Controls how geometries are enlarged. Refer to the `buffer` function for a description of how mode affects the enlarged geometry. |
| remove-overlaps | `Boolean` | 0..1 | When true, overlaps that exist after enlarging the geometries will be removed.  Overlaps are removed for each geometry by 1) finding all other geometries that overlap it, then 2) removing the overlap from either the geometry being checked or the other geometry, in an alternating manner (this prevents all overlaps being removed from the first geometry). If removing an overlap would render any of the resulting geometries empty, then the overlap is not removed. |

Enlarges a geometry by a specified amount. Primarily intended for use with line geometries.

## `union`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |

Combines two pipeline chains into one. For example, this can combine multiple input relations into one. The resulting tuple is a combination of the attributes produced by each of the input steps. For best results, the input steps should produce the same attribute names and types. If an attribute is present in one branch but not another, it will become Nullable. If the attribute is present, but has a different type, then its type may become 'Anything'.
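
How attributes are combined can be sketched in Python. This hypothetical `union` helper only models the Nullable behaviour described above (a missing attribute becomes null); it does not model type widening to 'Anything', and the output order is a property of the sketch.

```python
def union(*branches):
    """Combine branches so every tuple carries every attribute seen in any branch."""
    branches = [list(branch) for branch in branches]
    names = []
    for branch in branches:
        for row in branch:
            for key in row:
                if key not in names:
                    names.append(key)
    for branch in branches:
        for row in branch:
            # An attribute missing from this branch is filled with None (null).
            yield {name: row.get(name) for name in names}

surveyed = [{"id": 1, "height": 3.0}]
estimated = [{"id": 2}]
combined = list(union(surveyed, estimated))
```

The tuple from `estimated` gains a null `height`, which is why the attribute has to be treated as Nullable downstream.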

## `python`

| Name | Type Name | Arity | Description |
| --- | --- | --- | --- |
| result-type | `Struct` | 0..1 | A type definition of the rows your function will yield, e.g. `struct(count: integer, max: floating)`. |
| script | `URI` | 1 | The location of the Python script to execute, e.g. `functions/count_rows.py`. |

Passes execution of the pipeline to and from a special dataset-processing CPython function.  Whereas a 'normal' CPython function is called once per tuple, using the `python` step allows all the tuples to be collected in the function and then any number of tuples returned.  See the documentation online for more detailed examples.
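
A script for this step might look something like the sketch below, which yields one row matching the `struct(count: integer, max: floating)` example. The entry-point name `function`, the single `rows` argument, dict-style rows and the `value` attribute are all assumptions made for illustration; check the online documentation for the exact contract.

```python
def function(rows):
    """Consume every input row, then yield a single summary row."""
    count = 0
    largest = None
    for row in rows:
        count += 1
        value = row["value"]  # hypothetical attribute on the input tuples
        if largest is None or value > largest:
            largest = value
    # Any number of rows could be yielded here; this sketch yields exactly one.
    yield {"count": count, "max": largest}

# Local check with plain dicts standing in for pipeline tuples.
summary = list(function(iter([{"value": 1.5}, {"value": 4.0}])))
```

Because the function receives the whole stream of rows, it can compute values such as a row count that a per-tuple function cannot.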