repository for geography work and other things completed by kufre u.
Throughout this winter term, I have run into problems running the COVID-19 Accessibility Juptyer Notebook in CyberGISX. After only recently being able to run the code preparing the data for the model, I ran into another roadblock: creating the catchment areas. Earlier this week, the cell where the catchments were calculated ran for two days with no sign of stopping. I spent part of this week working to improve the performance of this section of the notebook and began looking at how the catchments are calculated after receiving an insightful email from Professor Holler.
def calculate_catchment_area(G, nearest_osm, distance, distance_unit = "time"):
road_network = nx.ego_graph(G, nearest_osm, distance, distance=distance_unit)
nodes = [Point((data['x'], data['y'])) for node, data in road_network.nodes(data=True)]
polygon = gpd.GeoSeries(nodes).unary_union.convex_hull ## to create convex hull
polygon = gpd.GeoDataFrame(gpd.GeoSeries(polygon)) ## change polygon to geopandas
polygon = polygon.rename(columns={0:'geometry'}).set_geometry('geometry')
return polygon.copy(deep=True)
He found that the culprit for the slow running times in calculate_catchment_area
is its use of the function ego_graph
.
road_network = nx.ego_graph(G, nearest_osm, distance, distance=distance_unit)
With each iteration that calculate_catchment_area
is called and ego_graph
is used within it, a subset of an existing graph is created, which was rather time consuming.
Professor Holler proposed using single_source_dijkstra_path_length
and subgraph
as a replacement.
road_network = G.subgraph(nx.single_source_dijkstra_path_length(G, nearest_osm, distance, distance_unit))
First, single_source_dijkrtra
returns a dictionary of identifiers for the nodes that compose the shortest paths from a given node. This dictionary of node identifiers is then used in subgraph
to create a view of a larger graph, in our case the road network of Illinois, rather than creating a new graph with its own attributes as is done in ego_graph
. I’ve put together a short notebook that demonstrates the time saved when replacing ego_graph
with a combination of single_source_dijkstra_path_length
and subgraph
in calculate_catchment_area
and that the results are equivalent to when ego_graph
is used. For example, creating a catchment area using the latter, shown in the notebook as dijkstra_cca
, took 880 milliseconds while the original function (ego_cca
) took 4.67 seconds.