CSV
CSV (Comma Separated Values) is a common text file format for tabular data. RiskScape can read CSV files directly and use them in your models. CSV files are useful for storing relational data, such as asset inventories, vulnerability tables, or lookup data.
Type detection
When loading data from CSV files, by default RiskScape will automatically detect and apply more appropriate data types,
such as turning lat, long coordinates into geometry.
This makes it easier to work with numeric data and geometry in CSV format without having to manually create RiskScape bookmarks.
Note
Type detection scans all of the data in the CSV file. This is convenient for smaller CSV files, but can take a long time for very large files (e.g. more than 100MB).
Type detection is applied after any other attribute mapping operations
(i.e. set-attribute or map-attribute).
Numeric columns
Type detection scans all text columns looking for numeric values:
If a column contains only numeric values, the type will be changed to floating.
If a column contains a mix of numeric and empty values, the type will be changed to nullable('floating'). This allows RiskScape to handle missing data appropriately.
Note
If any rows contain text that cannot be converted to a number (e.g. ‘NaN’ or ‘NULL’), RiskScape will assume it is not numeric, even if all the other rows are.
Type-detection currently only supports the floating numeric type.
In order to end up with the integer type, you would need to apply a type cast
in your bookmark, like this:
[bookmark buildings]
location = house_prices.csv
set-attribute.year = int(year)
set-attribute.cost = int(cost)
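As a rough illustration of the numeric detection rules in this section, the following Python sketch classifies a column of raw CSV text values. It is not RiskScape's implementation: the function name detect_column_type, the regular expression used to decide what counts as a number, and the handling of a column with no numeric values at all are assumptions made for the example.

```python
import re

# Assumption: a "numeric" cell is a plain decimal number, optionally signed,
# with an optional exponent. RiskScape's real parsing rules may differ.
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def detect_column_type(values):
    """Return the type name the rules above would give a column of text cells."""
    saw_empty = False
    saw_number = False
    for value in values:
        cell = value.strip()
        if cell == "":
            saw_empty = True
        elif NUMERIC.fullmatch(cell):
            saw_number = True
        else:
            # Any non-numeric text (e.g. 'NaN' or 'NULL') keeps the column as text.
            return "text"
    if not saw_number:
        # Assumption: a column with no numeric values at all stays as text.
        return "text"
    return "nullable('floating')" if saw_empty else "floating"
```

For instance, a column holding "1", "" and "3" would come out as nullable('floating') under these rules, while a single 'NULL' cell would leave the whole column as text.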
Geometry
When type detection finds numeric latitude and longitude columns in your CSV file,
it can automatically create a geometry attribute. This makes it easy to use CSV files
containing point locations (such as building centroids) in your models.
RiskScape will add a geometry attribute when:
numeric columns are found for latitude and longitude coordinates
there is no other geometry attribute already present
The geometry value added will be a point in the EPSG:4326 CRS (i.e. WGS84).
The columns to use for the latitude and longitude coordinates are determined by:
Looking for an exact match in column name. RiskScape looks for columns called either longitude, long, or lon for longitude, in that order, and latitude or lat for latitude.
Searching for column names that contain latitude and longitude as sub-strings, e.g. Building_latitude or Longitude_best_guess.
In both cases the search:
Is case insensitive, so Lat and LAT would both be acceptable latitude attribute names.
Uses the first match, so if lat and custom_latitude are both present then lat would be used.
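The column search described above can be sketched in Python as follows. This only illustrates the documented rules and is not RiskScape's own code: the function names find_column and find_lat_lon are hypothetical, and the sketch assumes that "first match" among sub-string matches means the first matching column in file order.

```python
# Keywords taken from the rules above; exact matches are tried in this order.
LON_EXACT = ["longitude", "long", "lon"]
LAT_EXACT = ["latitude", "lat"]


def find_column(columns, exact_names, substring):
    """Pick a coordinate column: exact name first, then sub-string match."""
    lowered = [(name, name.lower()) for name in columns]
    # 1. Exact, case-insensitive match, trying each keyword in order.
    for keyword in exact_names:
        for name, lower in lowered:
            if lower == keyword:
                return name
    # 2. Otherwise, the first column whose name contains the sub-string
    #    (assumption: "first" means first in column order).
    for name, lower in lowered:
        if substring in lower:
            return name
    return None


def find_lat_lon(columns):
    """Return (latitude_column, longitude_column); either may be None."""
    return (
        find_column(columns, LAT_EXACT, "latitude"),
        find_column(columns, LON_EXACT, "longitude"),
    )
```

With columns name, custom_latitude, lat and LON, this sketch picks lat and LON, because exact matches take priority over sub-string matches. With columns such as POINT_Y and POINT_X it finds nothing, so no geometry would be created.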
Note
RiskScape does not currently automatically detect columns containing Well-Known Text (WKT),
e.g. POINT(174.7787 -41.2924).
Disabling type detection
To disable type detection, simply add type-detection = false to your bookmark, e.g.
[bookmark my-csv-data]
location = PATH/TO/data.csv
type-detection = false
When RiskScape loads data from a CSV file with type-detection disabled, every column will be assigned the text type by default.
Well-Known Text (WKT)
RiskScape does not automatically turn CSV data in Well-Known Text (WKT) format into geometry.
For example, using the following cities.csv file would simply result in a text-string WKT attribute by default.
name,WKT,population
Wellington,"POINT(-41.2924 174.7787)",1816000
Auckland,"POINT(-36.8509 174.7645)",520971
In order to load the data into a more usable format, you would need to define a bookmark like this:
[bookmark cities]
location = cities.csv
set-attribute.geometry = geom_from_wkt(WKT)
crs-name = EPSG:4326
Tip
Whenever you create geometry manually in a bookmark, such as using geom_from_wkt(),
you should always specify the crs-name bookmark parameter as well.
Otherwise RiskScape cannot tell what CRS the data is in.
Coordinate Reference System quirks
Sometimes you may also have to specify the axis order that the CRS uses, i.e. whether the coordinates are in lat, long or long, lat order. The EPSG standard (particularly EPSG:4326) sometimes defines a lat, long (or northing, easting) order, whereas GIS applications typically work with geometry in long, lat (or easting, northing) order.
For example, say the cities.csv actually contained the following data, with the axis-order reversed
so that it is in long, lat order:
name,WKT,population
Wellington,"POINT(174.7787 -41.2924)",1816000
Auckland,"POINT(174.7645 -36.8509)",520971
Simply loading the data using geom_from_wkt() would put it in a completely wrong location.
Because the data contains the longitude coordinate first, to load it correctly you would need
to add crs-longitude-first to the bookmark, like this:
[bookmark cities]
location = cities.csv
set-attribute.geometry = geom_from_wkt(WKT)
crs-name = EPSG:4326
crs-longitude-first = true
The Geometry Reference Guide has more details on Axis/Ordinate Order.
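To see why the axis order matters, the following Python sketch reads the same WKT point under both orders and checks whether the result is a plausible latitude and longitude. It is purely illustrative and does not use RiskScape: the helper names read_point and is_plausible are hypothetical, and the regular expression only handles simple two-dimensional points.

```python
import re

# Matches simple 2D points such as "POINT(174.7787 -41.2924)".
POINT = re.compile(r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def read_point(wkt, longitude_first):
    """Return (latitude, longitude) for a WKT point under a given axis order."""
    match = POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    first, second = float(match.group(1)), float(match.group(2))
    return (second, first) if longitude_first else (first, second)


def is_plausible(lat, lon):
    """Check that a latitude/longitude pair lies within the valid ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


wellington = "POINT(174.7787 -41.2924)"
# Read in lat, long order: the latitude becomes 174.7787, which is impossible.
print(is_plausible(*read_point(wellington, longitude_first=False)))
# Read in long, lat order: latitude -41.2924, longitude 174.7787.
print(is_plausible(*read_point(wellington, longitude_first=True)))
```

A range check like this only catches some reversed coordinates. When both values fall between -90 and 90, the swapped point is still valid but lands in the wrong place, so inspecting the loaded data on a map remains the more reliable check.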
Tip
The riskscape bookmark evaluate BOOKMARK_NAME command will produce
a geospatial output file with all your bookmark’s changes applied.
This makes it easy to see what your data will actually look like when it is loaded by RiskScape.
Other point coordinates
Sometimes your CSV data might contain point coordinates that RiskScape type-detection does not
load automatically, or the coordinates may be in another CRS that you need to specify manually.
For example, type-detection will not automatically create geometry for the following cities.csv file
because the column names do not match the ‘lat’, ‘long’ keywords at all.
name,POINT_Y,POINT_X,population
Wellington,-41.2924,174.7787,1816000
Auckland,-36.8509,174.7645,520971
Instead of geom_from_wkt(), in your bookmark you would use create_point(), like this:
[bookmark cities]
location = cities.csv
set-attribute.geometry = create_point(POINT_Y, POINT_X)
crs-name = EPSG:4326
Note
The coordinate order for create_point() should always match the CRS that the geometry is in.
In this case, EPSG:4326 specifies that coordinates are described in the y,x order.
Tip
Using create_point() to create geometry can also be useful for other tabular data formats,
like NetCDF or HDF5. Currently type-detection only creates
geometry automatically for CSV input files.