Pre-processing data for the model
Important
The topological network data and parameters described in Topological network requirements had to be created from several data sources, many of which contained gaps.
- This section describes the collected datasets that are used to create data for the Argentina Transport Risk Analysis (ATRA)
- The datasets listed here are specific to Argentina and are used as inputs to create the finalized data used in the rest of the model
- To run the ATRA pre-processing without changing the existing code, all data described here should be created and stored exactly as indicated below
- Python scripts were written specifically to clean and modify these datasets, so changes to the way the data are organized will most likely also require changes to the Python scripts
- In some instances data values are hard-coded within the Python scripts, so users should be prepared to change them directly in the scripts
- In each Python script described below, see the inline comments to understand where the inputs are given
- It is recommended to run the Python scripts in the same order as described here
- Users who want to use the same data but modify some of its values can follow the steps and code explained below. Users who already know how to create the networks in the formats specified in Topological network requirements can skip this whole process
- If the data are updated, especially if the OD flows are updated to another year, users will have to change the Python code to read the new data files
- Almost all inputs are read using the Python libraries pandas and geopandas, so users should familiarize themselves with the file reading and writing functions in these libraries. For example, most scripts use the geopandas functions read_file and to_file (http://geopandas.org/io.html) to read and write shapefiles, and the pandas functions read_excel, to_excel, read_csv and to_csv to read and write Excel and CSV data respectively
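As a small illustration of the pandas reading and writing functions mentioned above, the sketch below writes a table to CSV and reads it back. The table and its column names are hypothetical, and an in-memory buffer stands in for a file path so the example runs anywhere; shapefiles follow the same read/write pattern with geopandas.read_file and GeoDataFrame.to_file.

```python
import io

import pandas as pd

# Hypothetical table standing in for one of the pre-processing inputs
edges = pd.DataFrame(
    {
        "edge_id": ["edge_0", "edge_1"],
        "road_type": ["national", "rural"],
        "length_km": [12.5, 3.2],
    }
)

# Write to CSV; a real run would pass a file path instead of a buffer
buffer = io.StringIO()
edges.to_csv(buffer, index=False)

# Rewind the buffer, then read the CSV back into a DataFrame
buffer.seek(0)
edges_read = pd.read_csv(buffer)

# Excel sheets are read in a similar way, e.g. pd.read_excel(path, sheet_name="Hoja1")
print(edges_read)
```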
Creating the road network
Note
The road network is combined from datasets of national, provincial and rural roads in Argentina. The raw GIS data for these three types of networks were obtained from the Ministry of Transport (MoT) and the Dirección Nacional de Vialidad (DNV)
Road data | Source |
---|---|
National | https://www.argentina.gob.ar/vialidad-nacional/sig-vial |
Province | Provided through World Bank from MoT |
Rural | Provided through World Bank from MoT |
National roads bridges | https://www.argentina.gob.ar/vialidad-nacional/sig-vial |
OpenStreetMap (OSM) | https://openmaptiles.com/downloads/dataset/osm/south-america/argentina/#2.96/-40.83/-63.6 |
National roads widths | Provided through World Bank from DNV |
National roads speeds | Provided through World Bank from DNV |
Road vehicle costs | Provided through World Bank from DNV |
The portal https://ide.transporte.gob.ar/geoserver/web/ also contains open-source transport data that was downloaded, including the province and rural road networks. See the Python script atra.preprocess.scrape_wfs
- The road network data is stored:
  - In sub-folders in the file path /data/pre_processed_networks_data/roads/
  - As Shapefiles with attributes
  - File in sub-folder /national_roads/rutas/ contains national roads. We extract the columns cod_ruta (for road_name), sentido = A and geometry
  - File in sub-folder /province_roads/ contains province roads. We extract the columns nombre (for road_name), clase (for surface) and geometry
  - File in sub-folder /rural_roads/ contains rural roads. We extract the columns characteris (for surface) and geometry
  - File in sub-folder /osm_roads/ contains OSM roads for gap filling. We extract the columns road_name, road_type and geometry
- National-road-specific GIS data are stored:
  - In sub-folders in the path /incoming_data/pre_processed_network_data/roads/national_roads/
  - As Shapefiles with attributes
  - File in sub-folder /indice_de_estado/ contains road surface quality as numeric values. We use the columns nro_regist as id, valor for road_quality, sentido = A and geometry
    - Road surface quality is used to estimate speeds on the national roads
  - File in sub-folder /indice_de_serviciabilidad/ contains road service quality as numeric values. We use the columns nro_regist as id, valor for road_service, sentido = A and geometry
    - Road service quality is used to estimate speeds on the national roads
  - File in sub-folder /materialcarril_sel/ contains road surface material as string values. We use the columns id_materia as id, grupo for material_group, sentido = A and geometry
    - Surface material determines the condition of the national roads for adaptation investments
  - File in sub-folder /tmda/ contains TMDA (Tránsito Medio Diario Anual, annual average daily traffic) counts as numeric values. We use the columns nro_regist as id, valor for road_service, sentido = A and geometry
    - TMDA gives observed vehicle counts on national roads
  - File in sub-folder /v_mojon/ contains locations of kilometer markers. We use the columns id, progresiva, distancia and geometry
    - Kilometer markers are used in assigning properties on national roads and locating bridges
- Data on the widths and terrains of selected national roads are stored:
  - In the Excel file path incoming_data/road_properties/Tramos por Rutas.xls
  - We use the sheet Hoja1
- Data on the speeds of selected national roads are stored:
  - In the Excel file path incoming_data/road_properties/TMDA y Clasificación 2016.xlsx
  - We use the sheet Clasificación 2016
- Road costs are stored:
  - In the path /incoming_data/costs/road/
  - As Excel files
  - The Vehicle Operating Costs are in the file Costos de Operación de Vehículos.xlsx. We use the sheet Camión Pesado (heavy truck) for costs
  - The tariff costs are in the file tariff_costs.xlsx
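The column selection described above for the road shapefiles can be sketched as follows. This is a minimal illustration, not the code in atra.preprocess.combine_roads: the helper extract_road_columns and the COLUMN_MAPS dictionary are hypothetical, plain pandas DataFrames with placeholder strings stand in for GeoDataFrames so the example runs without geopandas, and "sentido = A" is interpreted (as an assumption) as keeping only rows whose sentido value is A. The road names RN3 and RN5 are illustrative.

```python
import pandas as pd

# Source column -> model column for each road dataset, following the lists above.
# Hypothetical structure; combine_roads may organise this differently.
COLUMN_MAPS = {
    "national": {"cod_ruta": "road_name", "geometry": "geometry"},
    "province": {"nombre": "road_name", "clase": "surface", "geometry": "geometry"},
    "rural": {"characteris": "surface", "geometry": "geometry"},
    "osm": {"road_name": "road_name", "road_type": "road_type", "geometry": "geometry"},
}


def extract_road_columns(roads: pd.DataFrame, dataset: str) -> pd.DataFrame:
    """Keep and rename the columns used from one road dataset (hypothetical helper)."""
    if "sentido" in roads.columns:
        # Assumption: "sentido = A" means keeping only records in direction A
        roads = roads[roads["sentido"] == "A"]
    mapping = COLUMN_MAPS[dataset]
    missing = sorted(set(mapping) - set(roads.columns))
    if missing:
        raise KeyError(f"{dataset} roads are missing columns: {missing}")
    return roads[list(mapping)].rename(columns=mapping).reset_index(drop=True)


# Placeholder strings stand in for the shapely geometries a GeoDataFrame would hold
national = pd.DataFrame(
    {
        "cod_ruta": ["RN3", "RN3", "RN5"],
        "sentido": ["A", "D", "A"],
        "geometry": ["line_1", "line_2", "line_3"],
    }
)
print(extract_road_columns(national, "national"))
```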
Note
- The finalized road network is created by executing 3 Python scripts:
  - Run atra.preprocess.combine_roads to extract data from the files described above
  - Run atra.preprocess.network_road_topology to create the road node and edge topology
  - Run atra.preprocess.road_network_creation to assign the road properties described above. This is the main script that creates the finalized road network, and it requires several inputs
These scripts create the road_edges and road_nodes files in the folder path data/network/
The topology script above is very specific to the particular input data provided here. If the data are changed, users may have to test their results again after running the topology script. We had to manually clean, edit and add some new edges to complete the topology; this depends on the quality of the input data rather than on the Python script.
The Python scripts rely on the specific structure of the above datasets to identify the required rows and columns. If users change these datasets in the future and want to keep using the same Python scripts, they should preserve the column names and their properties
In the Excel sheets in incoming_data/road_properties/ and incoming_data/costs/road/ the original data obtained from the DNV are preserved, and changing the locations of columns and rows will require changes to the scripts. Where data are missing, assumed values are used, which are hard-coded in the Python script.
Users who want to change data should familiarize themselves with the functions in the script atra.preprocess.road_network_creation. The kinds of user input changes possible in this script are explained below
- Lines 445-554, where all the inputs are given to the code. See the function main
- The currency exchange rate from ARS to USD is 1 ARS = 0.026 USD. See the function main
- The default surface of a national road is assumed to be Asfalto (asphalt), and for other roads it is Tierra (earth). See the function assign_road_surface
- The default width of national and province roads is assumed to be 7.3m (2-lane) and of rural roads 3.65m (1-lane). The default terrain is assumed to be flat. See the function assign_road_terrain_and_width
- If no information on road speeds is provided through the data in incoming_data/road_properties/TMDA y Clasificación 2016.xlsx, then the road speeds are assumed to be as follows. See the function assign_min_max_speeds_to_roads
  - For national roads with poor to fair quality (0 < road_service <= 1) or (0 < road_quality <= 3), speeds vary from 50-80 km/hr
  - For national roads with fair to good quality (1 < road_service <= 2) or (3 < road_quality <= 6), speeds vary from 60-90 km/hr
  - For national roads with good to very good quality, speeds vary from 70-100 km/hr
  - For all province roads, speeds vary from 40-60 km/hr
  - For all rural roads, speeds vary from 20-40 km/hr
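The defaults listed above can be summarised in a few small functions. This is a hypothetical restatement for readers who want to check or change the assumed values, not the implementation in atra.preprocess.road_network_creation; the function names are invented for this sketch, and treating any road type other than province or rural with the national-road rules is an assumption.

```python
# Hypothetical restatement of the documented defaults; not the code in
# atra.preprocess.road_network_creation.
ARS_TO_USD = 0.026  # 1 ARS = 0.026 USD


def ars_to_usd(amount_ars: float) -> float:
    """Convert a cost in Argentine pesos to US dollars."""
    return amount_ars * ARS_TO_USD


def default_surface(road_type: str) -> str:
    """Asfalto for national roads, Tierra for all other roads."""
    return "Asfalto" if road_type == "national" else "Tierra"


def default_width_m(road_type: str) -> float:
    """3.65 m (1-lane) for rural roads, 7.3 m (2-lane) otherwise.

    The documentation gives 7.3 m for national and province roads only;
    using it for any other type is an assumption of this sketch.
    """
    return 3.65 if road_type == "rural" else 7.3


def min_max_speed_kmh(road_type: str, road_service: float = 0.0, road_quality: float = 0.0) -> tuple:
    """Fallback (min, max) speeds in km/hr when no speed data are available.

    National-road conditions are checked in the documented order; a national
    road matching neither of the first two conditions gets the highest band.
    """
    if road_type == "province":
        return (40, 60)
    if road_type == "rural":
        return (20, 40)
    if 0 < road_service <= 1 or 0 < road_quality <= 3:
        return (50, 80)  # poor to fair quality
    if 1 < road_service <= 2 or 3 < road_quality <= 6:
        return (60, 90)  # fair to good quality
    return (70, 100)  # good to very good quality
```

For example, min_max_speed_kmh("national", road_service=1.5) falls in the fair-to-good band and returns (60, 90). Because the national-road conditions are joined with "or", a road can satisfy more than one band (for instance road_service = 1.5 with road_quality = 2); this sketch resolves such cases by taking the first matching band, which may differ from the ATRA script.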
Creating the national roads bridges data
- National-roads bridges GIS data are stored:
  - In the path /incoming_data/pre_processed_network_data/bridges/puente_sel/
  - As Shapefiles with Point geometry of nodes with projection EPSG:4326
  - As an Excel file with bridge attributes in the sheet Consulta
Note
- The finalized national-roads bridges data is created by executing 1 Python script after the road network has been created:
  - Run atra.preprocess.road_bridge_matches to extract data from the files described above
The original bridges data downloaded from https://www.argentina.gob.ar/vialidad-nacional/sig-vial provided a shapefile with only bridge locations and an Excel sheet with bridge properties. Unfortunately, these two files did not have a common ID column to link them. Hence the Python script mainly matches the bridges to their locations using the kilometer marker locations specified in the bridge Excel data, matched against the kilometer markers and national roads GIS data described in Creating the road network. Users who already have a geocoded bridge dataset with all attributes do not need to run the Python script, but they will still have to match the bridge_id to the edge_id column of the road_edges dataset.
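The kilometer-marker matching described above can be sketched as a nearest-marker search along the same road. This is a simplified illustration, not the logic of atra.preprocess.road_bridge_matches. It assumes that each kilometer marker from v_mojon has already been tagged with the name of the national road it lies on, that the progresiva value is a kilometer position, and that each bridge record provides its road (ruta) and a kilometer position; the KmMarker class, the match_bridge_to_marker function and its max_gap_km tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KmMarker:
    marker_id: int
    road_name: str  # assumed to be attached beforehand, e.g. by a spatial join with rutas
    km: float  # marker position along the road, from progresiva (units assumed to be km)


def match_bridge_to_marker(
    bridge_road: str, bridge_km: float, markers: List[KmMarker], max_gap_km: float = 1.0
) -> Optional[int]:
    """Return the id of the marker on the same road nearest to the bridge km, or None.

    max_gap_km is a hypothetical tolerance: matches farther away are rejected.
    """
    candidates = [m for m in markers if m.road_name == bridge_road]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda m: abs(m.km - bridge_km))
    if abs(nearest.km - bridge_km) > max_gap_km:
        return None
    return nearest.marker_id


markers = [KmMarker(1, "RN3", 10.0), KmMarker(2, "RN3", 11.0), KmMarker(3, "RN5", 10.4)]
print(match_bridge_to_marker("RN3", 10.4, markers))
```

Restricting candidates to the same road first avoids matching a bridge to a nearby marker on a different route, which a purely spatial nearest-neighbour search could do at junctions.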
- The result of this script creates the bridge_edges and bridges files in the folder path data/network/. If users change the bridges datasets in the folder path /incoming_data/pre_processed_network_data/bridges/puente_sel/, then to use the same Python script to create new bridge_edges and bridges files they should replace the shapefile and Excel sheet data while still retaining the following column names in their data:
  - id_estruct - Numeric values of the ID column, only present in the shapefile
  - ids - Numeric values of bridge ID. Renamed to bridge_id by the model
  - longitud - Float values of bridge length in meters. Renamed to length by the model
  - ancho de vereda derecha - Float values of right sidewalk (vereda) width of bridge in meters. Used for estimating width
  - ancho de vereda izquierda - Float values of left sidewalk (vereda) width of bridge in meters. Used for estimating width
  - ancho pavimento asc. - Float values of pavement width of bridge in meters in the ascending direction. Used for estimating width
  - ancho pavimento desc. - Float values of pavement width of bridge in meters in the descending direction. Used for estimating width
  - tipo de estructura - String description of the type of bridge. Renamed to structure_type by the model
  - ruta - String name of the national road to which the bridge belongs
  - geometry - Point and line geometries of bridges with projection EPSG:4326
  - Several other attributes which are not used in the rest of the model
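Before re-running the script on replacement bridge data, it can help to check that the required columns are present. The sketch below assumes that every listed column except id_estruct and geometry comes from the Consulta sheet of the Excel file (which pandas can read with pd.read_excel(path, sheet_name="Consulta")); the helper name is hypothetical, and only the renames documented above are applied.

```python
import pandas as pd

# Attribute columns listed above that are assumed to come from the Excel sheet
# (id_estruct and geometry come from the shapefile and are not checked here)
REQUIRED_BRIDGE_COLUMNS = [
    "ids",
    "longitud",
    "ancho de vereda derecha",
    "ancho de vereda izquierda",
    "ancho pavimento asc.",
    "ancho pavimento desc.",
    "tipo de estructura",
    "ruta",
]
# Renames documented above
BRIDGE_RENAMES = {"ids": "bridge_id", "longitud": "length", "tipo de estructura": "structure_type"}


def check_and_rename_bridges(attributes: pd.DataFrame) -> pd.DataFrame:
    """Fail early if a required column is missing, then apply the documented renames."""
    missing = [c for c in REQUIRED_BRIDGE_COLUMNS if c not in attributes.columns]
    if missing:
        raise KeyError(f"Bridge attribute data is missing columns: {missing}")
    return attributes.rename(columns=BRIDGE_RENAMES)


# One dummy row with every required column, standing in for the Consulta sheet
sample = pd.DataFrame({column: [0] for column in REQUIRED_BRIDGE_COLUMNS})
print(check_and_rename_bridges(sample).columns.tolist())
```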
Creating road OD matrix at node level
Note
- The road OD matrix data is matched to the road_nodes data by executing 1 Python script after the road network has already been created:
  - Run atra.preprocess.road_od_flows to create the road OD matrix at node-node level
- The original road OD data provided by the Secretaría de Planificación de Cargas contains high-level annual OD matrices for 123 domestic zones in Argentina. These data are disaggregated to the road node level based on the following assumptions:
  - Only nodes on national and province roads are considered as OD nodes
  - For each node the nearby population (obtained from census data) is estimated, and only nodes with a population above 1000 are considered as OD nodes
  - OD flows are allocated to nodes in a manner similar to a gravity model, based on the importance of the origin and destination nodes in generating and attracting OD flows
  - The OD matrices are annual and are converted to daily flows by dividing by 365
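The disaggregation assumptions above can be illustrated for a single zone pair. This sketch takes the nearby population of each node as its "importance" and splits the zone-to-zone tonnage in proportion to the product of origin and destination populations; that weighting is an assumption, and the exact gravity-style formula in atra.preprocess.road_od_flows may differ. Node ids and populations are hypothetical.

```python
from typing import Dict, Tuple


def disaggregate_zone_flow(
    annual_zone_tons: float,
    origin_nodes: Dict[str, float],
    destination_nodes: Dict[str, float],
    min_population: float = 1000,
) -> Dict[Tuple[str, str], float]:
    """Split one zone-to-zone annual tonnage over node pairs, returned as daily tons.

    origin_nodes / destination_nodes map node id -> nearby population. Only nodes
    with population above min_population are kept, and each node pair gets a
    share proportional to pop_origin * pop_destination (gravity-style weighting).
    """
    origins = {n: p for n, p in origin_nodes.items() if p > min_population}
    destinations = {n: p for n, p in destination_nodes.items() if p > min_population}
    total_weight = sum(p_o * p_d for p_o in origins.values() for p_d in destinations.values())
    if total_weight == 0:
        return {}
    daily_tons = annual_zone_tons / 365  # annual -> daily
    return {
        (o, d): daily_tons * p_o * p_d / total_weight
        for o, p_o in origins.items()
        for d, p_d in destinations.items()
    }


flows = disaggregate_zone_flow(36500, {"n1": 3000, "n2": 800, "n3": 2000}, {"m1": 5000})
print(flows)
```

Here n2 is dropped because its population is not above 1000, and the 100 daily tons (36500 / 365) are split 60/40 between n1 and n3 in proportion to their populations.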
If users want to change the high-level OD data, they should replace the OD datasets as described below. They can also update the road_nodes, province and census shapefiles described in Administrative areas with statistics data requirements
- Road commodity OD matrices data are stored:
  - In the path /incoming_data/OD_data/road/Matrices OD 2014- tablas/
  - As Excel files
  - The names of the Excel files and Excel sheets correspond to commodity groups and subgroups
  - Each Excel sheet is a 123-by-123 matrix of OD tons, with the first row and first column showing Zone IDs
  - We use the sheet Total Toneladas 2014 (total tons 2014) if given; otherwise we add tons across sheets
- Road commodity OD Zone data is stored:
  - In the path /incoming_data/OD_data/road/Lineas de deseo OD- 2014/3.6.1.10.zonas/
  - As Shapefile data with the attributes:
    - od_id - The ID that matches the OD matrices Excel data
    - geometry - Polygon geometry of zone with projection EPSG:4326
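The sheet handling described above can be sketched with pandas. The helper names are hypothetical and the two small matrices stand in for real commodity sheets (the sheet names Granos and Minerales are illustrative); the example only shows how a square zone-by-zone matrix can be summed across sheets and reshaped into one row per OD zone pair.

```python
import pandas as pd


def total_tons(sheets: dict) -> pd.DataFrame:
    """Use the 'Total Toneladas 2014' sheet if present, otherwise add tons across all sheets."""
    if "Total Toneladas 2014" in sheets:
        return sheets["Total Toneladas 2014"]
    frames = list(sheets.values())
    total = frames[0]
    for frame in frames[1:]:
        # fill_value=0 treats a zone pair missing from one sheet as zero tons
        total = total.add(frame, fill_value=0)
    return total


def matrix_to_long(matrix: pd.DataFrame) -> pd.DataFrame:
    """Turn a square OD matrix (rows = origin zones, columns = destination zones) into OD rows."""
    long = matrix.rename_axis("origin_zone").reset_index().melt(
        id_vars="origin_zone", var_name="destination_zone", value_name="tons"
    )
    return long[long["tons"] > 0].reset_index(drop=True)


# With the real files, sheets could be loaded with
# pd.read_excel(path, sheet_name=None, index_col=0), which returns {sheet name: DataFrame}.
granos = pd.DataFrame([[0, 5], [3, 0]], index=[1, 2], columns=[1, 2])
minerales = pd.DataFrame([[1, 0], [0, 2]], index=[1, 2], columns=[1, 2])
od = matrix_to_long(total_tons({"Granos": granos, "Minerales": minerales}))
print(od)
```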
Creating the rail network and OD matrix
Note
- The finalized rail network and OD matrix data are all created by executing 1 Python script:
  - Run atra.preprocess.rail_od_flows to create the rail network and OD matrix at node-node level
Rail data | Source |
---|---|
Rail lines | Provided through World Bank from MoT |
Stations | Provided through World Bank from MoT |
OD data | Secretaría de Planificación de Cargas |
Transport Costs | Estimated from COSFER model by Secretaría de Planificación de Transporte |
Rail GIS data can also be downloaded from the portal https://ide.transporte.gob.ar/geoserver/web/.
See the Python script atra.preprocess.scrape_wfs
- The original rail OD data provided by the Secretaría de Planificación de Cargas contains station-to-station OD matrices, time-stamped for the year 2015. But there are several issues with using the rail GIS network and OD data directly:
  - The names of the OD stations do not always match the nodes in the GIS data, so we do not always know the locations of OD nodes
  - The route information, where it exists, does not match any GIS data
  - In several cases the time-stamps are missing, so we do not know the start and end times of a journey
  - In several cases the travel distance is missing, so we do not know the length of the journey
  - Only in some instances does the data indicate the origin and destination provinces
  - The GIS network shows several historic lines that are no longer used, and the GIS data do not indicate which lines are no longer in operation
- The script atra.preprocess.rail_od_flows resolves some of the issues above. The following operations are performed by the script:
  - The OD nodes are matched to GIS nodes
  - The OD flows are routed on the GIS network to check, as far as possible, whether the observed OD distances match the estimated OD distances obtained from the GIS network. This helps validate whether OD nodes were assigned correctly on the GIS network
  - The total OD tonnages are aggregated over a day, based on the start date. From this, the minimum and maximum OD flows between OD pairs are estimated
  - Speeds are assigned based on the time-stamps of origin and destination stations. The default speed of rail lines is assumed to be 20 km/hr
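The daily aggregation and the default speed described above can be sketched as follows. This is an illustration under stated assumptions, not the code in atra.preprocess.rail_od_flows: flows are grouped only by origin and destination station (commodity_group could be added to the grouping keys), days without shipments are not counted as zero, and the station names A, B and C are placeholders.

```python
import pandas as pd

DEFAULT_RAIL_SPEED_KMH = 20.0  # default rail speed stated above


def daily_min_max_flows(od: pd.DataFrame) -> pd.DataFrame:
    """Sum tons per OD pair per start date, then take the min and max daily tons per pair.

    Days without any shipment are not counted as zero (an assumption of this sketch).
    """
    od = od.assign(date=pd.to_datetime(od["origin_date"]).dt.normalize())
    daily = od.groupby(["origin_station", "destination_station", "date"], as_index=False)["tons"].sum()
    return (
        daily.groupby(["origin_station", "destination_station"])["tons"]
        .agg(min_daily_tons="min", max_daily_tons="max")
        .reset_index()
    )


def travel_time_hours(distance_km: float, speed_kmh: float = DEFAULT_RAIL_SPEED_KMH) -> float:
    """Travel time at the given speed, e.g. when time-stamps are missing."""
    return distance_km / speed_kmh


trips = pd.DataFrame(
    {
        "origin_station": ["A", "A", "A", "B"],
        "destination_station": ["C", "C", "C", "C"],
        "origin_date": ["2015-01-01 08:00", "2015-01-01 17:30", "2015-01-02 09:00", "2015-01-01 10:00"],
        "tons": [100.0, 50.0, 40.0, 70.0],
    }
)
print(daily_min_max_flows(trips))
```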
Unfortunately, the script atra.preprocess.rail_od_flows is very specific to the input datasets, and relies on having the same column names and organisation of data as in the input data used in this current version
- Rail GIS data are stored:
  - In the path /incoming_data/pre_processed_network_data/railways/national_rail/
  - As Shapefiles
Note
The topology is assumed to have already been created in the rail network. We had to create some of it manually, so we cannot provide an automated Python script to do so. Users are recommended to check the tools in the Python library snkit for creating network topology.
- Rail OD matrices data are stored:
  - In the path /incoming_data/OD_data/rail/Matrices OD FFCC/
  - As Excel files
  - The names of the sheets within the Excel files vary. See the Python script for specific information
  - The OD data in each Excel sheet varies, but some information is necessary for OD matrix creation:
    - origin_station - String name of origin station
    - origin_date - Datetime object for date of journey
    - destination_station - String name of destination station
    - commodity_group - String name of commodity groups
    - line_name - String name of the line used for transport
    - tons - Numeric values of tonnages
    - Several other columns, which are referred to in the Python script
- A file to match names of OD stations to GIS nodes is stored:
  - In the path /incoming_data/pre_processed_network_data/railways/rail_data_cleaning/station_renames.xlsx
  - As an Excel file
  - This was created manually by looking at the OD and GIS data, and inferring matches based on Google searches and our judgement
- Rail costs are stored:
  - In the Excel file path incoming_data/costs/rail/rail_costs.xlsx
  - We use the sheet route_costs
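Applying a station-name lookup such as the one in station_renames.xlsx can be sketched as below. The layout of that Excel file is not described here, so the renames dictionary (from OD station name to GIS node name), the helper functions and the station names are all hypothetical.

```python
import pandas as pd


def apply_station_renames(od: pd.DataFrame, renames: dict) -> pd.DataFrame:
    """Replace OD station names with GIS node names; names without a match are kept as-is."""
    out = od.copy()
    for column in ("origin_station", "destination_station"):
        out[column] = out[column].replace(renames)
    return out


def unmatched_stations(od: pd.DataFrame, gis_names) -> list:
    """List OD station names that still have no counterpart among the GIS node names."""
    names = set(od["origin_station"]) | set(od["destination_station"])
    return sorted(names - set(gis_names))


od = pd.DataFrame(
    {
        "origin_station": ["Rosario Norte", "Zarate"],
        "destination_station": ["Buenos Aires", "Rosario Norte"],
    }
)
renamed = apply_station_renames(od, {"Rosario Norte": "Rosario"})
print(unmatched_stations(renamed, ["Rosario", "Buenos Aires"]))
```

Listing the unmatched names after each pass makes it easy to see which OD stations still need a manual entry in the lookup file.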
Creating the port network and OD matrix
Note
- The port network and OD matrix data are all created by executing 1 Python script:
  - Run atra.preprocess.port_od_flows to create the port network and OD matrix at node-node level
Port data | Source |
---|---|
Port locations | Secretaría de Planificación de Cargas |
Maritime routes | Created manually from OSM data |
OD data | Secretaría de Planificación de Cargas |
Transport Costs | Estimated from data from Secretaría de Planificación de Transporte |
Port GIS node data can also be downloaded from the portal https://ide.transporte.gob.ar/geoserver/web/. See the Python script atra.preprocess.scrape_wfs
- The original port OD data provided by the Secretaría de Planificación de Cargas contains port-specific OD data, time-stamped for the year 2017. But there are several issues with using the port GIS network and OD data directly:
  - The original data gives port-specific information on how much of each type of freight is exported, imported or in transit at the port
  - The origins and destinations of the freight are mostly missing, so we have inferred them as best we can
  - In several cases the time-stamps are missing, so we do not know the start and end times of a journey
  - Only in some instances does the data indicate the origin and destination provinces or countries
- The script atra.preprocess.port_od_flows resolves some of the issues above. The following operations are performed by the script:
  - The OD nodes are inferred by gap filling the port-level flow data
  - The total OD tonnages are aggregated over a day, based on the start date. From this, the minimum and maximum OD flows between OD pairs are estimated
  - Default speeds are assumed to be 4-5 km/hr
Unfortunately, the script atra.preprocess.port_od_flows is very specific to the input datasets, and relies on having the same column names and organisation of data as in the input data used in this current version
- Port GIS data are stored:
  - In the path /incoming_data/pre_processed_network_data/ports/
  - As Shapefiles
Note
The topology is assumed to have already been created in the port network. We had to create some of it manually, so we cannot provide an automated Python script to do so. Users are recommended to check the tools in the Python library snkit for creating network topology.
- A file to match names of ports and commodities to GIS nodes is stored:
  - In the path /incoming_data/pre_processed_network_data/ports/rail_od_cleaning/od_port_matches.xlsx
  - As an Excel file
  - This was created manually by looking at the OD and GIS data, and inferring matches based on Google searches and our judgement
- Port-specific freight data are stored:
  - In the Excel file path /incoming_data/OD_data/ports/Puertos/Cargas No Containerizadas - SSPVNYMM.xlsx
  - We use the Excel sheet 2017
  - Some information is necessary for OD matrix creation:
    - Puerto - String name of port where data is recorded
    - Puerto de Procedencia - String name of origin port
    - País de Procedencia - String name of origin country
    - Fecha Entrada - Datetime object for entrance date recorded at port
    - Puerto de Destino - String name of destination port
    - País de Destino - String name of destination country
    - Producto Corregido - String name of commodity subgroups
    - Rubro - String name of commodity groups
    - Tipo de Operación - String name of operation type, associated with exports, imports and transit
    - Total Tn - Numeric values of tonnages
    - Medida - String value of type of tonnages
- Port costs are stored:
  - In the Excel file path incoming_data/costs/port/port_costs.xlsx
  - We use the Excel sheet costs
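Because origins and destinations are mostly missing from the port records, the script infers them. The sketch below shows one possible gap-filling rule on the columns listed above: use the named port if given, otherwise the country, otherwise the recording port. This rule is hypothetical and is not necessarily what atra.preprocess.port_od_flows does; Tipo de Operación and Medida are ignored here, and the port and commodity values are illustrative.

```python
import pandas as pd


def infer_port_od(records: pd.DataFrame) -> pd.DataFrame:
    """Build origin/destination columns from port-level freight records.

    Hypothetical gap-filling rule: use the named port if given, otherwise the
    country, otherwise the port where the record was made (Puerto).
    """
    origin = records["Puerto de Procedencia"].fillna(records["País de Procedencia"]).fillna(records["Puerto"])
    destination = records["Puerto de Destino"].fillna(records["País de Destino"]).fillna(records["Puerto"])
    return pd.DataFrame(
        {
            "origin": origin,
            "destination": destination,
            "commodity_group": records["Rubro"],
            "date": pd.to_datetime(records["Fecha Entrada"]),
            "tons": records["Total Tn"],
        }
    )


records = pd.DataFrame(
    {
        "Puerto": ["Rosario", "Bahia Blanca"],
        "Puerto de Procedencia": [None, "Santos"],
        "País de Procedencia": [None, "Brasil"],
        "Puerto de Destino": [None, None],
        "País de Destino": ["China", None],
        "Rubro": ["Granos", "Combustibles"],
        "Fecha Entrada": ["2017-03-01", "2017-03-02"],
        "Total Tn": [1000.0, 500.0],
    }
)
print(infer_port_od(records))
```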
Creating the air network and passenger data
Note
- The air network and passenger flow data are all created by executing 1 Python script:
  - Run atra.preprocess.network_air to create the air network and passenger flows at node-node level
Air data | Source |
---|---|
Airport locations | https://ide.transporte.gob.ar/geoserver/web/ |
Passenger number - 2016 | Secretaría de Planificación de Cargas |
Airport GIS node data is downloaded from the portal https://ide.transporte.gob.ar/geoserver/web/. See the Python script atra.preprocess.scrape_wfs
- Air passenger OD data is contained in the airlines shapefile:
  - In the file /data/pre_processed_networks_data/air/SIAC2016pax.shp
  - Some information is necessary for OD matrix creation:
    - Cod_Orig - String IATA code of origin airport
    - Cod_Destt - String IATA code of destination airport
    - Pax_2016 - Numeric values of passenger numbers
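The passenger OD table can be built from the three columns listed above by summing Pax_2016 per airport pair. This sketch assumes the shapefile may contain several records for the same pair (which are then added together); the output column names are hypothetical, and in practice the attribute table would first be read with geopandas.read_file.

```python
import pandas as pd


def air_passenger_od(airlines: pd.DataFrame) -> pd.DataFrame:
    """Total 2016 passengers per origin-destination airport pair (IATA codes)."""
    return (
        airlines.groupby(["Cod_Orig", "Cod_Destt"], as_index=False)["Pax_2016"]
        .sum()
        .rename(
            columns={
                "Cod_Orig": "origin_iata",
                "Cod_Destt": "destination_iata",
                "Pax_2016": "passengers_2016",
            }
        )
    )


# Illustrative records; with the real data the attribute table would come from
# geopandas.read_file("data/pre_processed_networks_data/air/SIAC2016pax.shp")
airlines = pd.DataFrame(
    {"Cod_Orig": ["AEP", "AEP", "COR"], "Cod_Destt": ["COR", "COR", "AEP"], "Pax_2016": [100, 50, 120]}
)
print(air_passenger_od(airlines))
```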
Creating the multi-modal network edges
Note
- The multi-modal network edges are all created by executing 1 Python script:
  - Run atra.preprocess.multi_modal_network_creation
The multi-modal edges can only be created once all the other networks have been created. The code takes as input the finalized road, rail and port files in the data/network/ folder path
Industry-specific province-level OD matrix
Note
- For macroeconomic analysis, an industry-specific province-level OD matrix is created by executing 1 Python script:
  - Run atra.preprocess.od_combine
The province OD matrix can only be created once all the other OD matrices have been created. The code takes as input the finalized {mode}_province_annual_ods.csv OD files in the data/OD_data/ folder path
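The combination step can be sketched as stacking the per-mode province OD tables and totalling them. The column names (origin_province, destination_province, industry, tons), the list of modes and the helper functions are hypothetical; the actual od_combine script may also map commodities to industries or apply other steps not shown here.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column names; check the headers of the real {mode}_province_annual_ods.csv files
KEYS = ["origin_province", "destination_province", "industry"]


def province_od_paths(od_folder: str, modes=("road", "rail", "port")) -> dict:
    """Paths following the {mode}_province_annual_ods.csv naming pattern."""
    return {mode: Path(od_folder) / f"{mode}_province_annual_ods.csv" for mode in modes}


def combine_province_ods(mode_tables: dict) -> pd.DataFrame:
    """Stack the per-mode tables and total the annual tons by province pair and industry."""
    combined = pd.concat(list(mode_tables.values()), ignore_index=True)
    return combined.groupby(KEYS, as_index=False)["tons"].sum()


# With real files: {mode: pd.read_csv(path) for mode, path in province_od_paths("data/OD_data").items()}
road = pd.DataFrame(
    {
        "origin_province": ["Buenos Aires", "Buenos Aires"],
        "destination_province": ["Cordoba", "Santa Fe"],
        "industry": ["agriculture", "agriculture"],
        "tons": [10.0, 5.0],
    }
)
rail = pd.DataFrame(
    {
        "origin_province": ["Buenos Aires"],
        "destination_province": ["Cordoba"],
        "industry": ["agriculture"],
        "tons": [3.0],
    }
)
print(combine_province_ods({"road": road, "rail": rail}))
```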
Preparing Hazard Data
Note
- Convert GeoTIFF raster hazard datasets to shapefiles based on flood depth thresholds
- Load data as described in Hazards data requirements
- Create hazard shapefiles with:
  - ID - equal to 1
  - geometry - Polygon outline of selected hazard
- Store outputs in the same paths in the directory /data/flood_data/FATHOM/
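The thresholding step can be sketched with NumPy: cells whose flood depth reaches a chosen threshold become 1 and all others 0. The 0.5 m threshold and the grid are hypothetical, and no-data cells are assumed to be NaN (a negative no-data value such as -9999 would also fall below any positive threshold). The resulting mask could then be polygonized, for example with rasterio.features.shapes, keeping the polygons with value 1 as the ID = 1 hazard outlines; that step is not shown.

```python
import numpy as np


def flood_mask(depth, threshold_m: float) -> np.ndarray:
    """Return a uint8 array with 1 where flood depth >= threshold_m and 0 elsewhere.

    No-data cells stored as NaN compare as False and therefore become 0.
    """
    depth = np.asarray(depth, dtype=float)
    with np.errstate(invalid="ignore"):
        return (depth >= threshold_m).astype(np.uint8)


# Hypothetical 2 x 2 depth grid in metres with one no-data cell
depths = [[0.2, 0.6], [np.nan, 1.5]]
print(flood_mask(depths, threshold_m=0.5))
```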