Required data inputs and parameters¶
Important
- This section describes the required data inputs and parameters for the Argentina Transport Risk Analysis (ATRA)
- To implement the ATRA all data described here should be created with the data properties and column names as described below
- If these data properties and column names are not provided in the data then the Python scripts will give run-time errors
Spatial data requirements¶
- All spatial data inputs must:
  - Be projected to a valid coordinate system. Spatial data with no projection system will give errors
  - Have valid geometries. Null or invalid geometries will give errors
Note
- The assumed projection system used in the model is EPSG:4326
- If the users change any spatial data they have to create new data with a valid projection system
Topological network requirements¶
- A topological network is defined as a graph composed of nodes and edges
- All finalised network data are created and stored:
  - In the file path - /data/network/
  - As csv files with post-processed network nodes and edges
  - As Shapefiles with post-processed network nodes and edges
- The created networks are: road, rail, port, air
- The air network has fewer attributes due to lack of data and because airlines are not important for commodity flows
- bridge files are also created, but they are not networks, as explained below
Note
The names and properties of the attributes listed below are the essential network parameters for the whole model analysis. If the users wish to replace or change these datasets then they must retain the same names of columns with same types of values as given in the original data.
It is recommended that changes to parameter values be made in the csv files, while the Shapefiles are mainly used for associating the geometries of the features. While we have provided the Shapefiles with parameter values as well, the model uses the Shapefiles mainly for performing geometry operations.
For example, if a new road edge is added to the road network, then all its properties should be added to the road_edges.csv file, while in the road_edges.shp file the edge_id and a valid geometry should be added.
The essential attributes in these networks are listed below. See the data for all attributes and try to recreate your data with similar column names and attribute values.
Several of these parameters and their values are created from incoming_data, which is explained in the section Preparing data for the model.
- All nodes have the following attributes:
  - node_id - String node ID
  - geometry - Point geometry of node with projection EPSG:4326
  - Several other attributes depending upon the specific transport sector
- All edges have the following attributes:
  - from_node - String node ID that should be present in the node_id column
  - to_node - String node ID that should be present in the node_id column
  - edge_id - String edge ID
  - geometry - LineString geometry of edge with projection EPSG:4326
  - length - Float estimated length in kilometers of edge
  - min_speed - Float estimated minimum speed in km/hr on edge
  - max_speed - Float estimated maximum speed in km/hr on edge
  - min_time - Float estimated minimum time of travel in hours on edge
  - max_time - Float estimated maximum time of travel in hours on edge
  - min_gcost - Float estimated minimum generalized cost in USD/ton on edge (not present in road edge files)
  - max_gcost - Float estimated maximum generalized cost in USD/ton on edge (not present in road edge files)
  - Several other attributes depending upon the specific transport sector
Note
It is very important that the first 2 columns of the edge files are from_node and to_node. If they are not, then the network graph creation will give a run-time error. The order of the other columns is flexible. For example, edge_id could be column number 3 or any other column after the second column.
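The column-order rule can be checked before running the model. The sketch below is only an illustration, not part of the ATRA scripts: the function name check_edge_file and the sample IDs are hypothetical, and it assumes the edge table has been exported as a csv with a header row.

```python
import csv
import io


def check_edge_file(edge_csv, node_ids):
    """Return a list of problems found in an edge table (empty if none)."""
    reader = csv.reader(edge_csv)
    header = next(reader)
    problems = []
    # The first two columns must be from_node and to_node, in that order.
    if header[:2] != ["from_node", "to_node"]:
        problems.append("first two columns must be from_node, to_node")
        return problems
    # Every from_node / to_node value must exist in the node_id column.
    for line_no, row in enumerate(reader, start=2):
        for name, value in zip(header[:2], row[:2]):
            if value not in node_ids:
                problems.append(
                    f"line {line_no}: {name} '{value}' not in node_id column"
                )
    return problems


# Hypothetical sample data, not taken from the ATRA files.
nodes = {"roadn_1", "roadn_2"}
good = io.StringIO("from_node,to_node,edge_id\nroadn_1,roadn_2,roade_1\n")
bad = io.StringIO("edge_id,from_node,to_node\nroade_1,roadn_1,roadn_2\n")
print(check_edge_file(good, nodes))  # []
print(check_edge_file(bad, nodes))   # ['first two columns must be from_node, to_node']
```

In practice the file object would come from opening the edge csv, and node_ids would be the set of values read from the node_id column of the matching node file.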
- Attributes only present in road edges:
  - road_name - String name or number of road
  - surface - String value for surface material of the road
  - road_type - String value of either national, province or rural
  - width - Float width of edge in meters
  - min_time_cost - Float estimated minimum cost of time in USD on edge
  - max_time_cost - Float estimated maximum cost of time in USD on edge
  - min_tariff_cost - Float estimated minimum tariff cost in USD on edge
  - max_tariff_cost - Float estimated maximum tariff cost in USD on edge
  - tmda_count - Integer number of daily vehicle counts on edge
- National-roads bridges GIS data are also created as nodes containing:
  - bridge_id - String bridge ID
  - edge_id - String edge ID matching the edge_id of national-roads edges intersecting with bridges
  - width - Float width of bridge in meters
  - length - Float length of bridge in meters
  - geometry - Point geometry of node with projection EPSG:4326
  - Several other attributes depending upon the specific bridge input data
- National-roads bridges GIS data are also created as edges containing:
  - bridge_id - String bridge ID
  - length - Float length of bridge in meters
  - geometry - LineString geometry of bridge with projection EPSG:4326
Note
We assume that networks are provided as topologically correct connected graphs: each edge is a single LineString (which may be a straight line or a more complex line), but must have exactly two endpoints, which are labelled as from_node and to_node (the values of these attributes must correspond to the node_id of a node).
Wherever two edges meet, we assume that there is a shared node, matching each of the intersecting edge endpoints. For example, at a t-junction there will be three edges meeting at one node.
Due to gaps in geometries and connectivity in the raw datasets, several dummy nodes and edges have been created to join points and lines. For example, there are more nodes in the rail network than stations in Argentina, and similarly in the port network. The road network contains several edges with road_type = 0, which represent dummy edges created to join two roads.
The bridge datasets are not networks because they do not have a topology. Bridge nodes are matched to the road network to later match road flow and failure results with failed bridges. For example, we estimate the failure consequence of a road edge of the National Route 12 first, and if we know there is a bridge on this road that is also flooded then we assign the failure consequence to the bridge as well. Bridge edges are created to intersect with flood outlines to estimate the length of flooding of bridges.
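The connected-graph assumption can be tested on a replacement network before it is used. The following is a minimal sketch using a breadth-first search over (from_node, to_node) pairs. The function name connected_components and the sample IDs are hypothetical, and this is not how the ATRA scripts build their graphs.

```python
from collections import defaultdict, deque


def connected_components(node_ids, edges):
    """Group node IDs into connected components using breadth-first search."""
    # Treat every edge as undirected when building the adjacency sets.
    neighbours = defaultdict(set)
    for from_node, to_node in edges:
        neighbours[from_node].add(to_node)
        neighbours[to_node].add(from_node)

    seen = set()
    components = []
    for start in sorted(node_ids):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical T-junction: three edges meeting at the shared node "j".
t_junction = [("a", "j"), ("b", "j"), ("c", "j")]
print(len(connected_components({"a", "b", "c", "j"}, t_junction)))       # 1
# Adding a node "d" with no edges leaves the graph disconnected.
print(len(connected_components({"a", "b", "c", "d", "j"}, t_junction)))  # 2
```

A result with more than one component points to gaps of the kind that the dummy nodes and edges were created to close.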
OD matrices requirements¶
- All finalised OD matrices are stored:
  - In the path - /data/OD_data/
  - As csv files with names {mode}_nodes_daily_ods.csv where mode = {road, rail, port}
  - As csv files with names {mode}_province_annual_ods.csv
  - As Excel sheets with combined Province level annual OD matrices
- All node-level daily OD matrices contain mode-wise and total OD flows and should have attributes:
  - origin_id - String node IDs of origin nodes. Value should be present in the node_id column of the sector's network file
  - destination_id - String node IDs of destination nodes. Value should be present in the node_id column of the sector's network file
  - origin_province - String names of origin Provinces
  - destination_province - String names of destination Provinces
  - min_total_tons - Float values of minimum daily tonnages between OD nodes
  - max_total_tons - Float values of maximum daily tonnages between OD nodes
  - Float values of daily min-max tonnages of commodities/industries between OD nodes: here based on OD data provided for each sector
  - If min-max values cannot be estimated then there is a total_tons column - for roads only
- All aggregated province-level OD matrices contain mode-wise and total OD flows and should have attributes:
  - origin_province - String names of origin Provinces
  - destination_province - String names of destination Provinces
  - min_total_tons - Float values of minimum daily tonnages between OD Provinces
  - max_total_tons - Float values of maximum daily tonnages between OD Provinces
  - Float values of daily min-max tonnages of commodities/industries between OD Provinces: here based on OD data provided for each sector
  - If min-max values cannot be estimated then there is a total_tons column - for roads only
Note
The OD column names and their attributes listed above are essential for the flow and failure model analysis. While the names of commodities/industries might vary, it is important that the OD data has the columns specifically named origin_id, destination_id, origin_province, destination_province, min_total_tons (or total_tons), max_total_tons (or total_tons).
The model can track individual commodity/industry flows and failure results, but in the overall calculations it estimates the flows and disruptions corresponding to the total tonnage (min or max). The commodity/industry names are important for the macroeconomic loss analysis explained below.
Hence, if a new user input contains only the total tonnage values and no commodity/industry specific OD values, then the model code will still run with no errors, except that the macroeconomic analysis code will not be able to run.
If the users wish to replace or change these datasets then they must retain the same names of columns with same types of values as given in the original data.
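A replacement node-level OD file can be screened for the essential columns before it is passed to the flow and failure scripts. This is a minimal sketch under the assumption that the file is a csv with a header row. The function name missing_od_columns and the sample headers are hypothetical, and the model scripts may check columns differently.

```python
import csv
import io

# Identifier columns named in the note above for node-level OD matrices.
ID_COLUMNS = [
    "origin_id",
    "destination_id",
    "origin_province",
    "destination_province",
]


def missing_od_columns(od_csv):
    """Return the essential OD columns that are absent from the csv header."""
    header = next(csv.reader(od_csv))
    missing = [column for column in ID_COLUMNS if column not in header]
    # Tonnage is given either as a min/max pair or as a single total_tons column.
    has_min_max = "min_total_tons" in header and "max_total_tons" in header
    if not has_min_max and "total_tons" not in header:
        missing.append("min_total_tons and max_total_tons (or total_tons)")
    return missing


# Hypothetical headers, not copied from the ATRA data files.
totals_only = io.StringIO(
    "origin_id,destination_id,origin_province,destination_province,total_tons\n"
)
incomplete = io.StringIO("origin_id,destination_id,min_total_tons\n")
print(missing_od_columns(totals_only))  # []
print(missing_od_columns(incomplete))
```

Commodity/industry columns are deliberately not checked here, since their names vary by sector.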
Hazards data requirements¶
- All hazard datasets are stored:
  - In sub-folders in the path - /data/flood_data/FATHOM
  - As GeoTiff files
  - See /data/flood_data/hazard_data_folder_data_info.xlsx for details of all hazard files
- Single-band GeoTiff hazard raster files should have attributes:
  - values - between 0 and 1000 for flood depth in meters
  - raster grid geometry
  - projection systems: Default assumed = EPSG:4326
Note
The hazard datasets were obtained from a third-party consultant https://www.fathom.global who generated flood maps specific to this project
It is assumed that all hazard data is provided in GeoTiff format with a projection system. If the users want to introduce new hazard data then it should be in GeoTiff format only.
When new hazard files are added, hazard_data_folder_data_info.xlsx should be updated accordingly.
Administrative areas with statistics data requirements¶
- Argentina boundary datasets are stored:
  - In the path - /incoming_data/admin_boundaries_and_census/departamento/
  - In the path - /incoming_data/admin_boundaries_and_census/provincia/
  - As Shapefiles
- Global boundary datasets for map plotting are stored:
  - In the path - /data/boundaries/
  - As Shapefiles
- Census boundary data are stored:
  - In the path - /incoming_data/admin_boundaries_and_census/radios censales/
  - As a Shapefile
Note
The admin and boundary datasets were obtained from different sources in Argentina
Admin boundary | Source |
---|---|
Department | Provided through World Bank |
Province | Provided through World Bank |
All admin levels | https://www.naturalearthdata.com/downloads/10m-physical-vectors/ |
Census - 2010 | https://www.indec.gov.ar/ |
Admin boundary layers are generally available online. For example at https://data.humdata.org/dataset/argentina-administrative-level-0-boundaries.
The department, province and census datasets are used in the model, while the global boundaries are mainly used for generating map backgrounds.
The names and properties of the attributes listed below are the essential boundary parameters for the whole model analysis. If the users wish to replace or change these datasets then they must retain the same names of columns with same types of values as given in the original data.
For example, if a new census dataset is introduced then it should contain the column poblacion with new population numbers. The census data used here is at Department level, but it could be replaced with other boundary level census estimates as well.
- All Argentina Department boundary datasets should have the attributes:
  - name - String names in Spanish - attribute name changed to department_name
  - OBJECTID - Integer IDs - attribute name changed to department_id
  - geometry - Polygon geometries of boundary with projection EPSG:4326
- All Argentina Province boundary datasets should have attributes:
  - nombre - String names in Spanish - attribute name changed to province_name
  - OBJECTID - Integer IDs - attribute name changed to province_id
  - geometry - Polygon geometries of boundary with projection EPSG:4326
- All global boundary datasets should have attributes:
  - name - String names of boundaries in English
  - geometry - Polygon geometry of boundary with projection EPSG:4326
- The census datasets should have attributes:
  - poblacion - Float value of population
  - geometry - Polygon geometry of boundary with projection EPSG:4326
Macroeconomic data requirements¶
- For the macroeconomic analysis, first a multi-regional IO matrix for 24 provinces in Argentina is created from a national-level IO matrix and province level Gross Production Values (GPV) of IO Industries
- The multi-regional macroeconomic IO data is created from data downloaded from the Instituto Nacional de Estadística y Censos (INDEC) website. The data is stored as:
  - Industry and Commodity level IO accounts in the file path data/economic_IO_tables/input/sh_cou_06_16.xls
  - Industry level GPV in the file path data/economic_IO_tables/input/PIB_provincial_06_17.xls
  - Names of aggregated industries classification for Argentina in the file path data/economic_IO_tables/input/industry_high_level_classification.xlsx, which should be present in the IO and GPV data files
- A set of look-up tables are created to match commodities in the OD matrices to IO industries
  - In the file in path - data/economic_IO_tables/input/commodity_classifications-hp.xlsx
  - The sheet names in the Excel file are road, rail, port, corresponding to the sector for which OD matrices are created
  - commodity_group - String name of commodity group identified in the OD matrices data
  - commodity_subgroup - String name of commodity subgroup identified in the OD matrices data
  - high_level_industry - String name of aggregated industry present in the industry_high_level_classification.xlsx file
- The multi-regional macroeconomic IO data creation, explained later, produces results:
  - In the file in path - data/economic_IO_tables/output/IO_ARGENTINA.xlsx
  - In the file in path - data/economic_IO_tables/output/MRIO_ARGENTINA_FULL.xlsx
  - This data is used in the macroeconomic loss analysis
Note
The macroeconomic data are obtained from INDEC at https://www.indec.gob.ar/nivel3_default.asp?id_tema_1=3&id_tema_2=9&fbclid=IwAR02qnMIJeu86xUM5TFK5hrABN3FcJLGx6k5BYNhxLe4o0FhqJxuV2wxb5E. The PIB and COU datasets are used in the model
If the users want to update the IO tables for Argentina then it is recommended that they replace the above files sh_cou_06_16.xls and PIB_provincial_06_17.xls with exactly the same sheet names and data structures as given in the original data used by the IO model scripts.
If the industry classifications are modified in the IO data then the changes should also be made in the industry_high_level_classification.xlsx and commodity_classifications-hp.xlsx files.
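After editing the classification files, it may help to confirm that every high_level_industry value in the commodity look-up still appears in the aggregated industry list. The sketch below assumes both tables have already been loaded into plain Python structures, since reading the Excel files themselves would need a third-party library. The function name unmatched_industries and all sample values are hypothetical.

```python
def unmatched_industries(lookup_rows, industry_names):
    """Return, sorted, the high_level_industry values missing from the industry list."""
    known = set(industry_names)
    used = {row["high_level_industry"] for row in lookup_rows}
    return sorted(used - known)


# Hypothetical rows from one sheet of the commodity look-up table.
lookup = [
    {"commodity_group": "Grains", "commodity_subgroup": "Maize",
     "high_level_industry": "Agriculture"},
    {"commodity_group": "Fuels", "commodity_subgroup": "Diesel",
     "high_level_industry": "Petroleum refining"},
]
# Hypothetical list of aggregated industry names.
industries = ["Agriculture", "Mining"]

print(unmatched_industries(lookup, industries))  # ['Petroleum refining']
```

An empty list means the look-up and the industry classification agree. Any names returned need to be added to, or corrected in, one of the two files.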
Adaptation options and costs requirements¶
- All adaptation options input datasets are stored:
  - In the file - /data/adaptation_options/ROCKS - Database - ARNG (Version 2.3) Feb2018.xlsx
  - We use the sheet Resultados Consolidados for our analysis
Note
The adaptation data is very specific and if new options are created then the users will need to change the scripts as well