CSV

CSV (Comma Separated Values) is a common text file format for tabular data. RiskScape can read CSV files directly and use them in your models.

CSV files are useful for storing relational data, such as asset inventories, vulnerability tables, or lookup data.

Default behaviour

When RiskScape loads data from a CSV file, it assigns the text type to each column by default. For example, using the following cities.csv file as a model input would result in four attributes (name, latitude, longitude, and population), all of which are text strings.

name,latitude,longitude,population
Wellington,-41.2924,174.7787,1816000
Auckland,-36.8509,174.7645,520971

This data could not be used for any spatial operations because the coordinates are text type, rather than geometry type. Similarly, the population attribute could not be multiplied by a scale factor because it is text, rather than floating or integer type.
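The same limitation can be seen outside RiskScape. The following is a minimal Python sketch, not RiskScape code, that reads the cities.csv content shown above with the standard csv module, which also treats every value as text. The scale factor of 1.5 is an arbitrary value chosen for illustration.

```python
import csv
import io

# The cities.csv content from the example above, inlined so the sketch is self-contained.
CITIES_CSV = """name,latitude,longitude,population
Wellington,-41.2924,174.7787,1816000
Auckland,-36.8509,174.7645,520971
"""

rows = list(csv.DictReader(io.StringIO(CITIES_CSV)))
first = rows[0]

# Every value is read as a string, including the coordinates and the population.
value_types = {column: type(value).__name__ for column, value in first.items()}

# Multiplying the text value by a floating-point scale factor is a type error...
try:
    first["population"] * 1.5
    scaling_failed = False
except TypeError:
    scaling_failed = True

# ...so the text has to be converted to a number explicitly first.
scaled_population = float(first["population"]) * 1.5
```

The explicit float() call here plays the same role as the float(population) expression in a bookmark's set-attribute parameter.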

In order to load the data into a more usable format, you would previously have needed to define a bookmark, like this:

[bookmark cities]
location = cities.csv
set-attribute.geometry = create_point(latitude, longitude)
set-attribute.population = float(population)
crs-name = EPSG:4326

Type detection

RiskScape can automatically detect and apply more appropriate data types when loading CSV files. This makes it easier to work with numeric data and geometry without having to manually convert text values via set-attribute and map-attribute bookmark parameters.

To enable type detection, simply add type-detection = true to your bookmark, e.g.

[bookmark my-csv-data]
location = PATH/TO/data.csv
format = csv
type-detection = true

Note

Type detection scans all of the data in the CSV file. This is convenient for smaller CSV files, but can take a long time for very large files (e.g. more than 100MB).

Type detection is applied after any other attribute mapping operations (i.e. set-attribute or map-attribute).

Numeric columns

Type detection scans all text columns looking for numeric values:

  • If a column contains only numeric values, the type will be changed to floating.

  • If a column contains a mix of numeric and empty values, the type will be changed to nullable('floating'). This allows RiskScape to handle missing data appropriately.

Note

If any rows contain text that cannot be converted to a number, RiskScape will assume it is not numeric, even if all the other rows are.
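The rules above can be sketched in a few lines of Python. This is an illustration of the behaviour described here, not RiskScape's actual implementation: the function names and the returned labels are hypothetical, and Python's float() may accept some values (such as "nan" or "1e5") that RiskScape treats differently. How a completely empty column is handled is also an assumption.

```python
def is_number(text):
    """Return True if the text can be parsed as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def detect_column_type(values):
    """Classify a column of raw CSV text values.

    Returns one of the labels "floating", "nullable(floating)", or "text".
    """
    non_empty = [v for v in values if v.strip() != ""]
    # A single non-numeric value keeps the whole column as text.
    # Assumption: a column with no values at all also stays as text.
    if not non_empty or not all(is_number(v) for v in non_empty):
        return "text"
    # Numeric values mixed with empty cells become a nullable numeric column.
    if len(non_empty) < len(values):
        return "nullable(floating)"
    return "floating"
```

For example, detect_column_type(["1", "", "3"]) gives the nullable label, whereas detect_column_type(["1", "n/a", "3"]) stays as text because "n/a" cannot be converted to a number.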

Geometry

When type detection finds numeric latitude and longitude columns in your CSV file, it can automatically create a geometry attribute. This makes it easy to use CSV files containing point locations (such as building centroids) in your models.

RiskScape will add a geometry attribute when:

  • numeric columns are found for latitude and longitude coordinates

  • there is no other geometry attribute already present

The geometry value added will be a point in the EPSG:4326 CRS (i.e. WGS84).

The columns to use for the latitude and longitude coordinates are determined by:

  • Looking for an exact match in column name. RiskScape looks for columns called longitude, long, or lon for longitude, in that order, and latitude or lat for latitude.

  • Searching for column names that contain latitude and longitude as sub-strings, e.g. Building_latitude or Longitude_best_guess.

In both cases the search:

  • Is case-insensitive, so Lat and LAT would both be acceptable latitude attribute names.

  • Uses the first match, so if lat and custom_latitude are both present then lat would be used.
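The column search described above can be sketched in Python as follows. This is an illustrative approximation rather than RiskScape's own code: find_coordinate_column and EXACT_NAMES are hypothetical names, and the sketch assumes that "first match" in the sub-string search means the first matching column in file order.

```python
# Exact column names to try, in priority order, for each coordinate.
EXACT_NAMES = {
    "longitude": ["longitude", "long", "lon"],
    "latitude": ["latitude", "lat"],
}


def find_coordinate_column(columns, kind):
    """Return the column to use for kind ("latitude" or "longitude"), or None."""
    lowered = [(name, name.lower()) for name in columns]

    # Pass 1: case-insensitive exact match, trying each candidate name in order.
    for candidate in EXACT_NAMES[kind]:
        for original, lower in lowered:
            if lower == candidate:
                return original

    # Pass 2: first column whose name contains "latitude"/"longitude" as a sub-string.
    for original, lower in lowered:
        if kind in lower:
            return original

    return None
```

With the columns name, custom_latitude, lat, and Longitude_best_guess, this sketch picks lat for latitude (the exact match wins over the sub-string match) and Longitude_best_guess for longitude.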

Note

RiskScape does not currently automatically detect columns containing Well-Known Text (WKT), e.g. POINT(174.7787 -41.2924).
