comparing the methods and data sections in kang et al. to the provided code

the methods

Comparing the workflow in Figure 4 of Kang et al. for the enhanced two-step floating catchment area (E2SFCA) method to the code provided in the COVID-19 Accessibility Jupyter Notebook shows that the code follows the workflow closely. I'll break the pseudocode up into its three main loops and pair each with the equivalent Python code from the notebook.

finding the nearest node to the hospital:

pseudocode:

for hospital in hospitals do
  Calculate node n in road_network closest to hospital

Python:

def hospital_setting(hospitals, G):
    hospitals['nearest_osm']=None
    for i in tqdm(hospitals.index, desc="Find the nearest osm from hospitals", position=0):
        hospitals['nearest_osm'][i] = ox.get_nearest_node(G, [hospitals['Y'][i], hospitals['X'][i]], method='euclidean') # find the nearest node from hospital location
    print ('hospital setting is done')
    return(hospitals)
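
To see the function in context, it could be called roughly as in the sketch below; the place name, file name, and column setup are my assumptions rather than the notebook's, and newer releases of OSMnx replace ox.get_nearest_node with ox.distance.nearest_nodes.

import osmnx as ox
import geopandas as gpd

G = ox.graph_from_place('Chicago, Illinois, USA', network_type='drive')  # road network graph
hospitals = gpd.read_file('hospitals.shp')  # hypothetical hospital point data
hospitals['X'] = hospitals.geometry.x  # longitude column expected by hospital_setting
hospitals['Y'] = hospitals.geometry.y  # latitude column expected by hospital_setting
hospitals = hospital_setting(hospitals, G)  # adds the 'nearest_osm' node id to each hospital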

creating catchment areas:

pseudocode:

catchment <-- []
for hospital in hospitals do
  for driving_time in [10,20,30] do
    g <-- ego-centric graph around hospital within driving_time
    catchment <-- calculate convex hull around nodes in g
    catchment.population <-- 0
    for centroid in population_data do
      if centroid in catchment then
        catchment.population += centroid.population x weights[driving_time]
        catchment.time <-- driving_time

Python: for driving_time in [10,20,30] do

# get ego-centric graph around hospital within driving_time and calculate convex hull around nodes in g
def calculate_catchment_area(G, nearest_osm, distance, distance_unit = "time"):
    road_network = nx.ego_graph(G, nearest_osm, distance, distance=distance_unit)
    nodes = [Point((data['x'], data['y'])) for node, data in road_network.nodes(data=True)]
    polygon = gpd.GeoSeries(nodes).unary_union.convex_hull ## to create convex hull
    polygon = gpd.GeoDataFrame(gpd.GeoSeries(polygon)) ## change polygon to geopandas
    polygon = polygon.rename(columns={0:'geometry'}).set_geometry('geometry')
    return polygon.copy(deep=True)
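
Assuming the graph's edges were already given a 'time' attribute (travel time in minutes) during network preparation, the catchment helper could be exercised on a single hospital as in this sketch; the variable names here are mine, not the notebook's.

node_id = hospitals['nearest_osm'].iloc[0]  # nearest network node for the first hospital
catchment_10min = calculate_catchment_area(G, node_id, 10)  # convex hull of nodes reachable within 10 minutes
print(catchment_10min.geometry.iloc[0].bounds)  # rough extent of the 10-minute catchment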

Python: for centroid in population_data do

def hospital_measure_acc (_thread_id, hospital, pop_data, distances, weights):
    ##distance weight = 1, 0.68, 0.22
    polygons = []
    for distance in distances:
        polygons.append(calculate_catchment_area(G, hospital['nearest_osm'],distance))
    for i in range(1, len(distances)):
        polygons[i] = gpd.overlay(polygons[i], polygons[i-1], how="difference")

    num_pops = []
    #### for centroid in population_data do ####
    for j in pop_data.index:
        point = pop_data['geometry'][j]
        for k in range(len(polygons)):
            if len(polygons[k]) > 0: # exclude degenerate catchments whose convex hull is not a polygon
            #### if centroid in catchment then ####
                if (point.within(polygons[k].iloc[0]["geometry"])):
                    num_pops.append(pop_data['pop'][j]*weights[k])  
    total_pop = sum(num_pops)
    for i in range(len(distances)):
        polygons[i]['time']=distances[i]
        polygons[i]['total_pop']=total_pop
        polygons[i]['hospital_icu_beds'] = float(hospital['Adult ICU'])/polygons[i]['total_pop'] # ratio of ICU beds to the weighted population served
        polygons[i]['hospital_vents'] = float(hospital['Total Vent'])/polygons[i]['total_pop'] # ratio of ventilators to the weighted population served
        polygons[i].crs = { 'init' : 'epsg:4326'}
        polygons[i] = polygons[i].to_crs({'init':'epsg:32616'})
    print('\rCatchment for hospital {:4.0f} complete'.format(_thread_id), end="")
    return(_thread_id, [ polygon.copy(deep=True) for polygon in polygons ])
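
The notebook appears to run this per-hospital function in parallel (hence the _thread_id argument); a serial equivalent, using the distance bands from the pseudocode and the weights quoted in the comment above, might look like the sketch below. The loop, the final concatenation, and pop_data (assumed to be the census tract centroids with a 'pop' column) are my assumptions about how the per-hospital results get assembled.

import pandas as pd

distances = [10, 20, 30]  # driving-time bands in minutes
weights = [1.0, 0.68, 0.22]  # distance-decay weight for each band
catchments = [[], [], []]  # one list of catchment polygons per band
for idx, (_, hospital) in enumerate(hospitals.iterrows()):
    _, polygons = hospital_measure_acc(idx, hospital, pop_data, distances, weights)
    for band, polygon in enumerate(polygons):
        catchments[band].append(polygon)
# combine each band into a single GeoDataFrame so it can be passed to the overlap step;
# the per-hospital polygons were projected to EPSG:32616 above
catchments = [gpd.GeoDataFrame(pd.concat(band, ignore_index=True), crs='epsg:32616') for band in catchments]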

overlapping hexagons with catchment areas:

pseudocode:

result <-- { } // dictionary of hexagon IDs to accessibility
for catchment in catchments do
  for hexagon in hexagons do
    overlap <-- area(intersect(hexagon,catchment))/area(catchment)
    if overlap ≥ 0.5 then
      result[hexagon.id] += catchment.population x weights[catchment.time]

Python: for one grid cell

from collections import Counter
def overlap_calc(_id, poly, grid_file, weight, service_type):
  # calculates and aggregates accessibility statistics for one catchment on grid file
    value_dict = Counter()
    if type(poly.iloc[0][service_type])!=type(None):           
        value = float(poly[service_type])*weight
        intersect = gpd.overlay(grid_file, poly, how='intersection')
        intersect['overlapped']= intersect.area
        intersect['percent'] = intersect['overlapped']/intersect['area']
        intersect=intersect[intersect['percent']>=0.5]
        intersect_region = intersect['id']
        for intersect_id in intersect_region:
            try:
                value_dict[intersect_id] +=value
            except:
                value_dict[intersect_id] = value
    return(_id, value_dict)

def overlap_calc_unpacker(args):
    return overlap_calc(*args)

Python: for all grid cells, using overlap_calc and overlap_calc_unpacker

def overlapping_function (grid_file, catchments, service_type, weights, num_proc = 4):
    grid_file[service_type]=0
    pool = mp.Pool(processes = num_proc)
    acc_list = []
    for i in range(len(catchments)):
        acc_list.extend([ catchments[i][j:j+1] for j in range(len(catchments[i])) ])
    acc_weights = []
    for i in range(len(catchments)):
        acc_weights.extend( [weights[i]]*len(catchments[i]) )
    results = pool.map(overlap_calc_unpacker, zip(range(len(acc_list)), acc_list, itertools.repeat(grid_file), acc_weights, itertools.repeat(service_type)))
    pool.close()
    results.sort()
    results = [ r[1] for r in results ]
    service_values = results[0]
    for result in results[1:]:
        service_values+=result
    for intersect_id, value in service_values.items():
        grid_file.loc[grid_file['id']==intersect_id, service_type] += value
    return(grid_file)
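
Putting the overlap step to work on the hexagonal grid could then look something like the sketch below. The 'id' and 'area' columns are inferred from the overlap code above; the file name and the step that assigns ids are assumptions, and the quoted function expects multiprocessing and itertools to already be imported (as mp and itertools) as they are in the notebook.

import itertools
import multiprocessing as mp

grid_file = gpd.read_file('hexagon_grid.shp')  # hypothetical hexagon grid over the study area
grid_file = grid_file.to_crs(epsg=32616)  # match the projected CRS of the catchments
grid_file['area'] = grid_file.area  # used by overlap_calc to compute the percent overlap
grid_file['id'] = range(len(grid_file))  # unique hexagon id used to aggregate results
grid_file = overlapping_function(grid_file, catchments, 'hospital_icu_beds', weights)
grid_file = overlapping_function(grid_file, catchments, 'hospital_vents', weights)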

the data and visualizations

Other than the image in Figure 3, which illustrates the creation of catchment areas, the research paper shares none of its figures with the Jupyter Notebook associated with the research. The notebook produces none of the maps or graphs present in the paper. And other than the road network, which is downloaded using OSMnx and prepared for analysis within the notebook, data collection and processing are mostly absent from the notebook. For example, the paper mentions that the 2018 American Community Survey 5-year detailed tables for Illinois' census tracts were obtained through an API, but neither this retrieval nor the processing of the data is shown in the notebook. The paper also notes that certain types of hospitals were excluded from the analysis, but the filtering of the hospital dataset is not shown.
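
For what it's worth, that missing retrieval step could be reconstructed roughly as below using the Census API; the variable code (B01001_001E, total population) and the use of requests without an API key are my assumptions, not something shown in the paper or the notebook.

import requests
import pandas as pd

url = 'https://api.census.gov/data/2018/acs/acs5'  # ACS 5-year detailed tables endpoint
params = {'get': 'NAME,B01001_001E', 'for': 'tract:*', 'in': 'state:17'}  # 17 is the FIPS code for Illinois
rows = requests.get(url, params=params).json()  # first row holds the column names
acs = pd.DataFrame(rows[1:], columns=rows[0])
acs['pop'] = acs['B01001_001E'].astype(int)  # population column matching pop_data['pop'] in the code above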