Multi-file hazard data

There are a few different cases where you might have multiple hazard input files that you want to run through your model:

  • Probabilistic modelling, where each hazard file represents a specific ARI (Average Recurrence Interval).

  • Time-series modelling, where each hazard file represents a particular snapshot in a time sequence. For example, a volcanic eruption sequence may be represented by a series of hazard layers that span several months.

  • Cases where you are ‘stitching together’ hazard data that spans multiple files. For example, a national-scale flood model may use GeoTIFFs produced for one river catchment at a time.

  • Letting the user pick the hazard scenario at run-time, such as selecting a particular sea level rise increment for a climate-change model.

Specifying each hazard-layer as a separate model parameter quickly becomes impractical: you might have a dozen or more hazard-layers, or the number of hazard-layers may vary depending on the situation. Fortunately, the RiskScape Engine’s data processing abilities are flexible enough to support all of these models. The approach we recommend is to load the hazard-layers via a CSV file.

Note

This page covers using multiple hazard-layers at the same time. If you have multiple hazard-layers that you want to run through your model one at a time, then refer to Running the same model repeatedly.

Tip

This page is aimed at advanced users who are comfortable writing pipeline code. If pipeline code is new to you, try looking at How to write basic pipelines and How to write advanced pipelines first.

Loading data via CSV

To start off with, you will probably have a directory full of hazard files. The directory listing may look like this, for example:

Hazards/FloodDepth_upolu_5_0.tif
Hazards/FloodDepth_upolu_10_0.tif
Hazards/FloodDepth_upolu_25_0.tif
Hazards/FloodDepth_upolu_50_0.tif
Hazards/FloodDepth_upolu_100_0.tif
Hazards/FloodDepth_upolu_250_0.tif

The first step is to turn this directory listing into a CSV file. In addition to the filepath, you should also include any metadata you will need in your model as separate CSV columns. For a probabilistic model, this will be the return period or ARI. For a time-series model, this will be the time step or event-ID.

From the example directory listing earlier, we could create a hazards.csv file that looks like this:

filepath,island,return_period,SLR
Hazards/FloodDepth_upolu_5_0.tif,upolu,5,0
Hazards/FloodDepth_upolu_10_0.tif,upolu,10,0
Hazards/FloodDepth_upolu_25_0.tif,upolu,25,0
Hazards/FloodDepth_upolu_50_0.tif,upolu,50,0
Hazards/FloodDepth_upolu_100_0.tif,upolu,100,0
Hazards/FloodDepth_upolu_250_0.tif,upolu,250,0

RiskScape can now load this CSV as an input to your model. Each row of data in the CSV file will be processed one at a time in your model. However, the filepath attribute is simply loaded into the model as a text string; in order to spatially query the hazard data, we need to turn the filepath into coverage data.
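
For example, a minimal sketch of a pipeline that simply loads the CSV and saves the rows back out (the 'hazard-events' output name here is arbitrary) would produce one row per hazard file, with filepath, island, return_period and SLR attributes:

input('hazards.csv')
 ->
save('hazard-events')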

Note

The filepath in the CSV file should always be a path relative to the directory containing your project.ini file. This is how the RiskScape Engine will try to resolve the files. Alternatively, you can use full filepaths, but this makes it more difficult to share your model with others or upload the model into the RiskScape Platform.
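
For example, with the hazards.csv file above, the project directory might be laid out like this, so that each relative filepath resolves from the directory containing project.ini:

project.ini
hazards.csv
Hazards/FloodDepth_upolu_5_0.tif
Hazards/FloodDepth_upolu_10_0.tif
Hazards/FloodDepth_upolu_25_0.tif
Hazards/FloodDepth_upolu_50_0.tif
Hazards/FloodDepth_upolu_100_0.tif
Hazards/FloodDepth_upolu_250_0.tif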

Building a coverage dynamically

The bookmark() function will turn a GeoTIFF filepath from a text-string into a coverage that can be spatially sampled. For example, bookmark('Hazards/FloodDepth_upolu_10_0.tif') would return a coverage that could then be passed to sample() functions. In this case, we simply need to call bookmark(filepath) for each row of data in our input hazards.csv file.

However, the dynamic nature of the hazards.csv file presents a problem. RiskScape pipeline code is a typed language, which means that the RiskScape Engine needs to figure out the shape of your input data (e.g. raster or vector data) before the model is run. The problem is that RiskScape does not know what sort of file filepath refers to until the model is actually run.

We can solve this by defining a placeholder bookmark in our project.ini file. RiskScape can use this placeholder bookmark to work out the shape of the input data before the model is run. For our example data, the placeholder bookmark might look like this:

[bookmark floodmap]
location = Hazards/FloodDepth_upolu_10_0.tif

In the model pipeline, we can then use the placeholder bookmark ID in our bookmark() expression, e.g. bookmark('floodmap'). Here, the bookmark ID 'floodmap' is a constant expression, which means it always has the same value for every row of data processed by the model. In this case, it means that the bookmark() expression will always return raster data (i.e. a coverage), because that matches the type of the floodmap bookmark.

The second part is swapping the bookmark’s location on the fly for the filepath from the CSV file. As long as the type doesn’t change, the bookmark() expression can return a different GeoTIFF coverage for each row of data loaded from the CSV. In our example, the pipeline might look like this:

input('hazards.csv')
 ->
select({ *, bookmark('floodmap', { location: filepath }) as coverage })

Tip

Make sure the placeholder bookmark you use is valid. You can check it using the command riskscape bookmark info BOOKMARK_ID. You will get an error if the bookmark ID does not exist, or the location filepath in the placeholder bookmark is not valid.
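
For example, to check the 'floodmap' placeholder bookmark defined earlier, you would run:

riskscape bookmark info floodmap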

Vector data

This example uses raster data (i.e. GeoTIFF) for simplicity, but you can use the same approach if your hazard-layer is vector data (i.e. shapefile, GeoPackage, etc). When loading vector data dynamically via CSV, the input data in the files must match exactly. This means that:

  • all files in the CSV must have the exact same attributes.

  • all files in the CSV must be in the exact same CRS.

Typically, when using this approach, the hazard data will all have been generated in the same manner, and so should have the same attributes. However, if some vector files contain spurious extra attributes, it can be helpful to define a type in your bookmark to exclude the attributes that you don’t need. For example, the following placeholder bookmark would only include the attributes the_geom and Depth.

[bookmark floodmap]
location = Hazards/FloodDepth_upolu_10_0.shp
type = struct(the_geom: geometry, Depth: floating)

Tip

If the vector data files span several different CRS, then you could try grouping the data by CRS, e.g. if you are dealing with two different CRS then split the data into two different CSV files. Alternatively, try working in a common CRS, such as WGS84.
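
For example, a sketch of the split approach, assuming the vector files fall into two different CRS: each group of files would get its own CSV file and its own placeholder bookmark, and would be loaded through its own input branch (or its own model run). The bookmark IDs and file names here are hypothetical:

[bookmark floodmap-nztm]
location = Hazards/FloodDepth_example_nztm.shp
type = struct(the_geom: geometry, Depth: floating)

[bookmark floodmap-wgs84]
location = Hazards/FloodDepth_example_wgs84.shp
type = struct(the_geom: geometry, Depth: floating)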

Alternative raster approach

Instead of using a placeholder bookmark, a slightly more complicated alternative is to specify the type of the data being loaded as a third type argument to the bookmark() function. This approach works best for raster input files. Using this approach, the example pipeline would look like this:

input('hazards.csv')
 ->
select({ *, bookmark('N/A', { location: filepath }, type: 'coverage(floating)') as coverage })

Note

The type argument is currently not supported for geospatial vector data. However, it could be used to load non-geospatial relations, such as CSV input data or NetCDF files.

The rest of the model

So far we have only looked at loading the hazard-layers into the model dynamically. The rest of the model will still need to:

  • Join the exposure-layer to the events

  • Calculate the consequence (i.e. loss) for each event

  • Aggregate the results by event

Calculating the consequence for each event works essentially the same as in a single-event model, but we will look at joining and aggregating the data in a bit more detail.

Joining the exposure-layer

Each exposure-layer feature is potentially exposed to every event. To model this in pipeline code, we can simply use the join(on: true) step to join the exposure data to the hazard data. Essentially, every row of exposure-data will be duplicated for each event (i.e. each row of data in the hazards.csv file).

For example, say we only had two buildings (A and B) and two events (1 and 2), then after joining the datasets and calculating the loss, the pipeline data might look like this:

exposure      hazard     loss
Building A    event 1    $10,000
Building A    event 2    $25,000
Building B    event 1    $0
Building B    event 2    $5,000

The pipeline code to join together the exposure and hazard input data might look something like this:

input($exposure_layer, name: 'exposure')
 ->
join(on: true) as join_exposures_and_events

input('hazards.csv', name: 'event')
 ->
select({ *, bookmark('floodmap', { location: event.filepath }) as coverage })
 ->
join_exposures_and_events.rhs

Tip

Performance is better when the smaller dataset is on the right-hand side of the join (i.e. .rhs). Typically the exposure dataset will be larger than the hazard dataset, in terms of rows of data, and so the hazard dataset should connect to the .rhs of the join step.

Aggregating the results by event

When working with multiple hazard layers, you may often want to view the total loss by event. Going back to the two-building example from the previous section, the aggregated loss would look like this:

hazard     loss
event 1    $10,000
event 2    $30,000

Tip

For probabilistic models, you could pass the total loss to aal_trapz() to calculate the Average Annual Loss.

The pipeline code for the whole model, including aggregation, might look something like this:

input($exposure_layer, name: 'exposure')
 ->
join(on: true) as join_exposures_and_events
 ->
select({ *, sample_centroid(exposure, coverage) as hazard_intensity })
 ->
select({ *, loss_function(exposure, hazard_intensity) as loss })
 ->
group(by: event,
      select: {
                *,
                sum(loss) as Total_Loss,
                mean(hazard_intensity) as Mean_Hazard
      }) as event_loss_table
 ->
save('event-loss')

input('hazards.csv', name: 'event')
 ->
filter(true)
 ->
select({ *, bookmark('floodmap', { location: event.filepath }) as coverage })
 ->
join_exposures_and_events.rhs

Note that:

  • sample_centroid() is used here for simplicity, but you may want to use different Spatial sampling depending on your model.

  • loss_function() is a placeholder for where your own loss or consequence function would go.

  • If you were letting the user pick out the hazard scenario at run-time, then you would simply change the filter(true) step to be more appropriate, e.g. filter(event.SLR = $SLR)

  • How you aggregate by event may vary slightly depending on your model. For example, if you were ‘stitching together’ hazard data that spans multiple files, you would need to be careful to aggregate by event.id or event.return_period, rather than by the whole event (which only represents one GeoTIFF file out of many for the event), as shown in the sketch after this list.
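
For example, a sketch of the ‘stitching’ case: this uses the same group step as the example pipeline above, but groups by the return_period column from hazards.csv, so that rows from all the GeoTIFF files belonging to the same event are aggregated together:

group(by: event.return_period,
      select: {
                *,
                sum(loss) as Total_Loss
      }) as event_loss_table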

Event dependencies

The example pipeline so far has been processing each GeoTIFF as a separate, independent event. In some cases you may want to keep the event data together. For example:

  • For a time-series model, such as cascading hazards, where the damage state of the building needs to be factored into the next event.

  • For memory efficiency, when calculating a per-property AAL for a probabilistic model with a large exposure dataset.

In these cases, we can aggregate the coverages into a single list. That allows all of the events to be processed for each exposure in one go. The pipeline code to do this would look more like this:

input($exposure_layer, name: 'exposure')
 ->
join(on: true) as join_exposures_and_events
 ->
select({ *, map(events, event -> sample_centroid(exposure, event.coverage)) as hazard_intensities })
 ->
select({ *, cascading_loss_function(exposure, hazard_intensities) as loss })

input('hazards.csv', name: 'event')
 ->
filter(true)
 ->
select({ *, bookmark('floodmap', { location: event.filepath }) as coverage })
 ->
group({ to_list({ coverage, event }) as events })
 ->
join_exposures_and_events.rhs

Note

For cascading hazards, you would also need to sort the events list. When aggregating data into a list, RiskScape does not guarantee the order of the list items in any way.
