Purpose

The Voting Location and Outreach Tool is designed to help Hawaii county election officials identify optimal sites for potential voting locations, as well as provide visualization of demographic and voter data at the community level.

Please see the Voting Location and Outreach Tool's current methodology below. For more details, see CID's contact page.

Voting Location Modeling Methodology

The CID acknowledges that there are many factors that go into the decision-making process for site selection. Therefore, due to the limitations of data and the need to incorporate local on-the-ground knowledge, this tool does not identify exact sites to be used.

Suitable Areas

We identified all areas that were potentially suitable for hosting a site. We created a grid made up of 0.5 mile cells covering the entire county, where each cell is a potential area to host a site. These “suitable areas” were determined using a combined approach for road density and “points of interest” density. A suitable area needed to contain either a sufficient density of roads or at least two points of interest, while also avoiding bodies of water. This means that suitable areas are areas that have some concentration of activity and therefore are more likely to have buildings and infrastructure. Note that we are not suggesting that all points of interest are suitable sites to host a voting location, but that the presence of a point of interest suggests a general concentration of infrastructure.

Points of interest were sourced from OpenStreetMap and were defined by the research team to be both governmental and non-governmental buildings (shown as two different layers in the tool) that could serve as potential voting locations. Because OpenStreetMap data is crowdsourced, its accuracy and coverage depend on user input and vary across geographies.
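
The suitability rule described in this section can be sketched as follows. This is a minimal illustration, not the tool's own code. The `GridCell` fields and the `min_road_density` threshold are hypothetical, since the methodology does not state what counts as a sufficient density of roads.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    road_density: float  # road length per unit area (units are an assumption)
    poi_count: int       # OpenStreetMap points of interest inside the cell
    is_water: bool       # True if the cell falls on a body of water


def is_suitable(cell: GridCell, min_road_density: float = 1.0) -> bool:
    """Apply the rule from the text: not water, and roads OR at least two POIs.

    min_road_density is a placeholder value chosen for illustration only.
    """
    if cell.is_water:
        return False
    return cell.road_density >= min_road_density or cell.poi_count >= 2


cells = [
    GridCell(road_density=2.5, poi_count=0, is_water=False),  # enough roads
    GridCell(road_density=0.1, poi_count=3, is_water=False),  # enough POIs
    GridCell(road_density=0.1, poi_count=1, is_water=False),  # neither
    GridCell(road_density=2.5, poi_count=5, is_water=True),   # water
]
suitable_flags = [is_suitable(c) for c in cells]
```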

Facility Location

To generate suggested locations based on the number of sites each county has publicly reported, we used a k-means model and a facility location model.

First, we conducted a k-means cluster analysis to aggregate populated census blocks into a smaller number of computationally manageable geographic clusters. Next, we estimated travel time from every census block cluster to every potential area for a voting location (defined by the “suitable areas” grid).
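
As a rough illustration of the clustering step, the sketch below runs plain k-means (Lloyd's algorithm) on block centroids using only the standard library. It assumes unweighted (x, y) coordinates and a fixed iteration count. The methodology does not say which implementation, how many clusters, or which weighting was used, so all of those are assumptions here.

```python
import random


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on (x, y) points. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


# Six hypothetical block centroids forming two well-separated groups.
block_centroids = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(block_centroids, k=2)
```

Each resulting cluster can then stand in for its member blocks when travel times are computed, which is what keeps the later steps computationally manageable.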

We then used a facility location model to determine optimal locations. The inputs our facility location model used were: the cost of site creation (which included the area score), the travel time to neighboring sites, and the estimated capacity of the site.

The facility location model selected for both the highest gain (most voters served) and the lowest penalty (high travel time and low weighted scores). These scores were generated by the indicators listed in the Data section below, and weighted based on importance. The total score was defined as the weighted sum of the individual indicator scores: each indicator's score was multiplied by its weight, and the results were summed. The facility location model prefers to locate potential voting areas on sites that are near a high number of voters and/or are near a site with a high score.

The model first selects a certain number of Election Day sites. These sites become fixed points when running the model to include additional sites. The number of final locations is defined by Hawaii law.
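
The two-pass idea (select Election Day sites first, then hold them fixed while adding more) can be sketched with a simple greedy heuristic. This is a stand-in technique chosen for brevity, not the solver or objective the tool uses, which the methodology does not specify. The `score_bonus` trade-off is hypothetical, and site capacity is ignored.

```python
def greedy_select(travel_time, voters, site_score, n_sites, fixed=(), score_bonus=1.0):
    """Pick n_sites candidate areas greedily.

    travel_time[c][s]: minutes from cluster c to candidate area s.
    voters[c]: number of voters in cluster c.
    site_score[s]: weighted area score (higher is better).
    fixed: areas chosen in an earlier pass; they are always kept.
    score_bonus: hypothetical trade-off between travel cost and area score.
    """
    chosen = list(fixed)

    def cost(sites):
        # Voter-weighted time to the nearest chosen area, minus a score reward.
        travel = sum(
            v * min(row[s] for s in sites) for v, row in zip(voters, travel_time)
        )
        return travel - score_bonus * sum(site_score[s] for s in sites)

    while len(chosen) < n_sites:
        remaining = [s for s in range(len(site_score)) if s not in chosen]
        best = min(remaining, key=lambda s: cost(chosen + [s]))
        chosen.append(best)
    return chosen


# Hypothetical inputs: 3 block clusters, 3 candidate areas.
travel_time = [[5, 30, 40], [30, 5, 40], [40, 35, 5]]
voters = [100, 80, 50]
site_score = [0.5, 0.5, 0.5]

election_day = greedy_select(travel_time, voters, site_score, n_sites=1)
all_sites = greedy_select(travel_time, voters, site_score, n_sites=2, fixed=election_day)
```

Passing the first result as `fixed` mirrors the description above: the earlier sites stay in place and only the additional ones are chosen in the second run.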

Scoring Model

The data were normalized and combined in a weighted average. The weighting schema is described below in the section Variable Weights. A higher score indicates that an area has multiple priority characteristics, whereas a medium score indicates that an area has some priority characteristics, but not all. For example, an area with a higher rate of eligible non-registered voter population, a higher percent disabled population, and a higher percent limited English proficient population would receive a score that is higher than an area with a lower rate of eligible non-registered voter population, lower percent disabled population, and lower percent limited English proficient population.
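
A minimal sketch of this scoring step is shown below. It assumes min-max normalization, which is one common choice. The methodology says only that the data were normalized, so the exact method, the indicator names, and the weight values here are illustrative assumptions. Indicators whose direction is inverted (for example, vehicle access) are flipped after scaling.

```python
def min_max(values):
    """Rescale values to the range 0..1 (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def area_scores(indicators, weights, inverted=()):
    """Weighted average of normalized indicators, one score per area."""
    normalized = {}
    for name, values in indicators.items():
        scaled = min_max(values)
        if name in inverted:
            # High raw values should lower the priority of an area.
            scaled = [1.0 - s for s in scaled]
        normalized[name] = scaled
    total_weight = sum(weights[name] for name in indicators)
    n_areas = len(next(iter(normalized.values())))
    return [
        sum(weights[name] * normalized[name][i] for name in indicators) / total_weight
        for i in range(n_areas)
    ]


# Hypothetical values for three areas; weights are placeholders.
indicators = {
    "pct_not_registered": [0.10, 0.30, 0.20],
    "pct_vehicle_access": [0.95, 0.60, 0.80],
}
weights = {"pct_not_registered": 2.0, "pct_vehicle_access": 1.0}
scores = area_scores(indicators, weights, inverted={"pct_vehicle_access"})
```

In this toy input the second area has the most non-registered eligible voters and the least vehicle access, so it receives the highest score.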

Data

For more information on the data sources and calculations for these variables, see below.

Percent of County Voting Age Citizens: The number of citizens in this tract who are voting age, divided by the county's total number of voting age citizens.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020).
  • Calculation: The number of people in the tract who are eligible to vote (voting age citizens) as a percent of the county's total voting age citizen population.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to communities of voting age citizens.
  • Limitations: Estimates can have high margins of error for small samples.

Percent of County Workers: The percent of all in-county workers who work in the tract. Only records for people who live and work in the same county are queried. The total number of in-county workers per block is summed and divided by the total number of in-county workers in the county.

  • Data sources: Census LEHD Origin-Destination Employment Statistics (LODES) 2019, workplace area characteristics.
  • Calculation: The number of in-county workers who work in the block as a percent of the total number of in-county workers in the county.
  • Scale: Census Block - Note that although this variable is displayed on the web map at the census tract level, the model input was at the census block level.
  • Rationale: Voting location/drop off should be proximal to where people work.
  • Note: Although the model input was at the census block level, this variable is displayed on the web map at the census tract level by aggregating the proportion of workers in a block to the census tract level.

Percent of Eligible Voters Not Registered: The percentage of voting age citizens who are not registered to vote.

  • Data sources: Catalist Voter Registration Data (2020 General Election); American Community Survey 5-Year Estimate (2016-2020); Census 2020.
  • Calculation: Convert 2020 voter data from the precinct to the tract level. Subtract the number of registered voters per tract from the citizen voting age population (CVAP). Divide by the total CVAP estimate in the tract. Where the incarcerated population is over 25% of the CVAP, use the 2016-2020 ACS 5-year estimate for non-institutionalized populations instead of CVAP.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to communities of eligible voters who aren’t registered to vote.
  • Limitations: Imperfect conversion from precinct to tract level. The 2016-2020 ACS data is the most recent estimate for non-institutionalized populations at the tract level. Data is unavailable for some tracts.
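
The calculation for this variable can be sketched as below, once registered voters have already been converted to the tract level. The function and parameter names are hypothetical. The sketch also assumes that the non-institutionalized estimate replaces CVAP in both the subtraction and the division, which the description above does not state explicitly.

```python
def pct_not_registered(cvap, registered, incarcerated=0, noninstitutionalized=None):
    """Share of eligible voters in a tract who are not registered.

    cvap: citizen voting age population estimate for the tract.
    registered: registered voters converted from precinct to tract level.
    incarcerated: incarcerated population in the tract.
    noninstitutionalized: ACS non-institutionalized population estimate.
    Returns None when no usable denominator is available.
    """
    base = cvap
    # Swap the base when the incarcerated population exceeds 25% of CVAP.
    if cvap > 0 and incarcerated / cvap > 0.25 and noninstitutionalized is not None:
        base = noninstitutionalized
    if base <= 0:
        return None
    return (base - registered) / base


typical_tract = pct_not_registered(cvap=1000, registered=700)
prison_tract = pct_not_registered(
    cvap=1000, registered=500, incarcerated=400, noninstitutionalized=600
)
```

Because the precinct-to-tract conversion is imperfect, the converted registration count can exceed the population estimate in some tracts. This sketch leaves such negative values unhandled.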

Percent of Population with Vehicle Access: The percentage of the population with access to a vehicle.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B25044.
  • Calculation: The percent of households with at least one vehicle available. The direction of this variable is inverted before it enters the model score, so that areas with a high percentage of car access receive lower priority for siting.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to communities with low rates of household vehicle ownership.
  • Limitations: Estimates can have high margins of error for small samples.

Percent of Population in Poverty: The percentage of the population with income below the poverty level.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B17001.
  • Calculation: The percent of residents in a census tract living below the poverty level.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to low-income communities.
  • Limitations: Estimates can have high margins of error for small samples.

Population Density: The total population density per square kilometer.

  • Data sources: Census 2020.
  • Calculation: Divide the total block population by the area of the block (square kilometers).
  • Scale: Census Block - Note that although this variable is displayed on the web map at the census tract level, the model input was at the census block level.
  • Rationale: Voting location/drop off should be proximal to population centers.
  • Limitations: Data shows where people live, not where they work.
  • Note: Although the model input was at the census block level, this variable is displayed on the web map at the census tract level by aggregating block-level values to the census tract level.

2020 Percent of County In-Person Voters: The number of voters who voted at a voting location in a census tract divided by the total number of voters who voted at a voting location in the county.

  • Data sources: Catalist Voter Registration Data (2020 General Election).
  • Calculation: Convert from precinct to census block level. Then divide the number of people who voted at a voting location in the 2020 General Election in the block by the total number of people who voted at a voting location in the 2020 General Election in the county.
  • Scale: Census Block - Note that although this variable is displayed on the web map at the census tract level, the model input was at the census block level.
  • Rationale: Voting location/drop off should be proximal to communities with historically low voting location usage.
  • Limitations: Imperfect conversion from precinct to block level.
  • Note: Although the model input was at the census block level, this variable is displayed on the web map at the census tract level by aggregating the proportion of total precinct voters in a block to the census tract level.

2020 Vote by Mail Rate: The percentage of voters who voted by mail out of total voters.

  • Data sources: Catalist Voter Registration Data (2020 General Election).
  • Calculation: Convert from precinct to block level. Calculate the VBM rate for the total vote by dividing the number of voters who voted by mail by the total number of voters who voted. The direction of this variable is inverted before it enters the model score, so that areas with a high VBM rate receive lower priority for siting.
  • Scale: Census Block - Note that although this variable is displayed on the web map at the census tract level, the model input was at the census block level.
  • Rationale: Voting location/drop off should be proximal to communities with low VBM usage.
  • Limitations: Imperfect conversion from precinct to block level.
  • Note: Although the model input was at the census block level, this variable is displayed on the web map at the census tract level by aggregating the proportion of total precinct voters in a block to the census tract level.

2016 Vote by Mail Rate (Total): The percentage of voters who voted by mail out of total voters in the 2016 General Election.

  • Data sources: Catalist Voter Registration Data (2016 General Election).
  • Calculation: Convert from precinct to block level. Calculate the VBM rate for the total vote by dividing the number of voters who voted by mail by the total number of voters who voted.
  • Scale: Census Tract.
  • Limitations: Imperfect conversion from precinct to block level.
  • Note: This data is included as contextual voter information only; it was not included in the model or in the scoring of the potential areas. This variable is displayed on the web map at the census tract level by aggregating the proportion of total precinct voters in a block to the census tract level.

2020 Registered Voter Turnout Rate (Total): The percentage of total registered voters who voted in the 2020 General Election.

  • Data sources: Catalist Voter Registration Data (2020 General Election).
  • Calculation: Convert voting data from precinct to block level. Calculate the turnout rate for the total vote by dividing the total number of voters who voted by the total number of registered voters.
  • Scale: Census Tract.
  • Limitations: Imperfect conversion from precinct to block level.
  • Note: This data is included as contextual voter information only; it was not included in the model or in the scoring of the potential areas. This variable is displayed on the web map at the census tract level by aggregating the proportion of total precinct voters in a block to the census tract level.

2016 Registered Voter Turnout Rate (Total): The percentage of total registered voters who voted in the 2016 General Election.

  • Data sources: Catalist Voter Registration Data (2016 General Election).
  • Calculation: Convert voting data from precinct to block level. Calculate the turnout rate for the total vote by dividing the total number of voters who voted by the total number of registered voters.
  • Scale: Census Tract.
  • Limitations: Imperfect conversion from precinct to block level.
  • Note: This data is included as contextual voter information only; it was not included in the model or in the scoring of the potential areas. This variable is displayed on the web map at the census tract level by aggregating the proportion of total precinct voters in a block to the census tract level.

Geographically Isolated Community:

  • Data sources: Model
  • Calculation: The model accounts for geographically isolated communities by encouraging dispersion of sites. The additional suggested areas based on distance account for any remaining communities that have a greater travel time to a suggested site.
  • Scale: N/A
  • Rationale: Voting locations should be proximal to geographically isolated communities.
  • Limitations: There is no clear definition or data source for geographically isolated communities.

Travel Time By Car:

  • Data sources: OpenStreetMap
  • Calculation: Use k-means clustering to create groups of computationally-manageable census blocks. Calculate travel time from each group of blocks to each potential siting area using road network analysis. The time is estimated for travel by vehicle, assuming standard rates of travel. This data does not go into the score, but is used to locate sites throughout the county.
  • Scale: N/A
  • Limitations: Assessment of how travel time is affected by traffic is not included.
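
To illustrate the road network step, the sketch below computes shortest travel times over a small weighted graph with Dijkstra's algorithm. The methodology does not name the routing method or software, so the algorithm choice, the node names, and the edge times are assumptions made for illustration. Each edge weight stands for length divided by an assumed standard speed.

```python
import heapq


def travel_times(graph, source):
    """Shortest travel time in minutes from source to every reachable node.

    graph[node] is a list of (neighbor, minutes) pairs.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry; a faster route was already found
        for neighbor, edge_minutes in graph.get(node, []):
            candidate = minutes + edge_minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Hypothetical network: one block cluster, one junction, two candidate areas.
road_graph = {
    "cluster_a": [("junction", 4.0), ("area_2", 15.0)],
    "junction": [("area_1", 3.0), ("area_2", 6.0)],
    "area_1": [],
    "area_2": [],
}
times = travel_times(road_graph, "cluster_a")
```

Running this once per block cluster yields the cluster-to-area travel time table used to locate sites. As noted in the limitations, traffic is not reflected in fixed edge times.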

Black Percent of Population: The percentage of the population that is Black alone, not Hispanic or Latino.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03002.
  • Calculation: The percent of residents that are Black alone, not Hispanic or Latino in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Asian-American Percent of Population: The percentage of the population that is Asian-American alone, not Hispanic or Latino.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03002.
  • Calculation: The percent of residents that are Asian-American alone, not Hispanic or Latino in a tract out of the total tract population. The categories for Asian-American alone and Native Hawaiian and Other Pacific Islander alone were summed to get the total Asian-American population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Asian Indian Percent of Population: The percentage of the population that is Asian Indian.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Asian Indian in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Chinese Percent of Population: The percentage of the population that is Chinese.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Chinese in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Filipino Percent of Population: The percentage of the population that is Filipino.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Filipino in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Japanese Percent of Population: The percentage of the population that is Japanese.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Japanese in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Korean Percent of Population: The percentage of the population that is Korean.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Korean in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Vietnamese Percent of Population: The percentage of the population that is Vietnamese.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B02016.
  • Calculation: The percent of residents that are Vietnamese in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Latino Percent of Population: The percentage of the population that is Hispanic or Latino.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03002.
  • Calculation: The percent of residents that are Hispanic or Latino in a tract out of the total tract population.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to communities with historically low VBM usage. CID research finds Latino voters have lower VBM use.
  • Limitations: Estimates can have high margins of error for small samples.

Mexican Percent of Population: The percentage of the population that is Mexican.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are Mexican in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Puerto Rican Percent of Population: The percentage of the population that is Puerto Rican.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are Puerto Rican in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Cuban Percent of Population: The percentage of the population that is Cuban.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are Cuban in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Dominican Percent of Population: The percentage of the population that is Dominican.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are Dominican in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

South American Percent of Population: The percentage of the population that is South American.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are South American in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Central American Percent of Population: The percentage of the population that is Central American.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03001.
  • Calculation: The percent of residents that are Central American in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Native American Percent of Population: The percentage of the population that is Native American.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03002.
  • Calculation: The percent of residents that are Native American in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.

Native Hawaiian Percent of Population: The percentage of the population that is Native Hawaiian.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), B03002_001.
  • Calculation: The percent of residents that are Native Hawaiian in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.

White Percent of Population: The percentage of the population that is White alone, not Hispanic or Latino.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B03002.
  • Calculation: The percent of residents that are White alone, not Hispanic or Latino in a tract out of the total tract population.
  • Scale: Census Tract
  • Limitations: Estimates can have high margins of error for small samples.
  • Note: This data is included as contextual population information only; it was not included in the model or in the scoring of the potential areas.

Youth Percent of Population: The percentage of the population between the age of 18 and 24 years old.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B01001.
  • Calculation: The percent of residents between the ages of 18 to 24 years in a tract out of the total tract population.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to communities with historically low VBM usage. CID research finds youth have lower VBM use.
  • Limitations: Estimates can have high margins of error for small samples.

Disabilities Percent of Population: The percentage of the population that is disabled.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B23024.
  • Calculation: The percent of residents with disabilities in a census tract out of the total population in the census tract.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to voters with disabilities.
  • Limitations: Data shows disabled population, not voters.

Limited English Proficient Percent of Population: The percentage of the population that has limited English proficiency.

  • Data sources: American Community Survey 5-Year Estimate (2016-2020), Table B16001.
  • Calculation: The percent of the population with limited English proficiency in a census tract. Limited English proficiency is defined as people who speak English “less than very well”.
  • Scale: Census Tract
  • Rationale: Voting location/drop off should be proximal to language minority communities.
  • Limitations: Estimates can have high margins of error for small samples.

Transit Stops: Transit points indicate the location of transit stops in the county and are sourced from regional General Transit Feed Specification (GTFS) feeds. Where GTFS data is missing, local transit agency data is used instead.

  • Data sources: GTFS and local transit agencies. GTFS data is current as of June 8, 2022, when it was pulled by CID, and may not reflect the most up-to-date information.
  • Calculation: Frequency of service to each transit stop was normalized to a range of 1-4 that indicates low to high service.
  • Scale: Point
  • Rationale: Voting location/drop off should be proximal to public transportation.
  • Limitations: Assessment of quality will be generally based on published timetables that are subject to change. Some transit stops do not have published stop frequency information. These are retained on the map for visual purposes, but excluded from the analysis.
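
One way to picture the 1-4 service normalization is a linear rescale of each stop's service frequency, sketched below. The methodology does not say whether a linear rescale, quartiles, or another binning was used, so the linear approach and the `trips_per_day` input are assumptions. Stops with no published frequency are passed through as `None` so they can be excluded from analysis.

```python
def service_levels(trips_per_day):
    """Map service frequency to a 1-4 scale (1 = low, 4 = high).

    Entries of None (no published frequency) stay None.
    Linear min-max rescaling is an assumed method.
    """
    known = [t for t in trips_per_day if t is not None]
    if not known:
        return [None for _ in trips_per_day]
    lo, hi = min(known), max(known)
    levels = []
    for t in trips_per_day:
        if t is None:
            levels.append(None)
        elif hi == lo:
            levels.append(1.0)  # no variation, so every stop gets the same level
        else:
            levels.append(1.0 + 3.0 * (t - lo) / (hi - lo))
    return levels


# Hypothetical daily trip counts for four stops; the third has no timetable.
levels = service_levels([10, 40, None, 100])
```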

OpenStreetMap Points of Interest: A collection of crowdsourced buildings provided by OpenStreetMap.

  • Data sources: OpenStreetMap. OpenStreetMap data is current as of June 8, 2022, when it was pulled by CID, and may not reflect the most up-to-date information.
  • Calculation: N/A
  • Scale: Point
  • Rationale: The presence of an OpenStreetMap point of interest suggests a general concentration of infrastructure and may be used to identify potentially suitable voting site areas.
  • Limitations: OpenStreetMap is a crowdsourced database. While the organization does some validating of data, the extent and quality of a county’s coverage depend on user contributions.

Voting Locations: Locations previously used as a voting location, which may include 2020 General Election and 2020 Presidential Primary voting locations.

  • Data sources: Voting Information Project
  • Calculation: N/A
  • Scale: Point
  • Rationale: The presence of a prior voting location may be used to identify potentially suitable voting site areas.
  • Limitations: The accuracy of the prior voting locations used in the Tool is subject to the integrity of the data reported by counties to the Voting Information Project.

2020 General Election Precinct Boundaries: 2020 General Election precincts for the state of Hawaii.

  • Data sources: Harvard Dataverse Voting and Election Science Team.
  • Calculation: N/A
  • Scale: Lines

Variable Weights

Variables were weighted equally with the exception of several variables that received additional weight due to being flagged as high priority. The variables that received higher weighting were voters with disabilities, voters with limited English proficiency, areas with high populations of eligible non-registered voters, areas with low VBM use for Latino and Youth voters, and areas with relatively high worker density. The variables that received the highest weight were areas with low total VBM use, areas close to public transit stops, and areas with relatively high population density.

The same weighting system was used for modeling optimal ballot drop box areas, with the exception of areas with eligible non-registered voters, which received no extra weight.

Assessing Gaps in Voting Location Coverage

After the model selects optimal areas for voting locations, we identify areas for additional siting coverage either by geographic distance or by additional need. These areas might be considered service gaps. We are interested in seeing where additional voting locations could be placed to minimize the number of people who are more than 20 minutes of travel time from a voting location, or to minimize the overall travel cost of all voters. The number of additional suggested areas is 10% of the total minimum required facilities.
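
The gap check can be sketched as follows, assuming a precomputed table of travel times from block clusters to candidate areas. The function names and sample values are hypothetical. Rounding the 10% figure up to a whole number is also an assumption, since the methodology does not state a rounding rule.

```python
def additional_area_count(min_required):
    """10% of the minimum required facilities, rounded up (assumed rounding)."""
    return (min_required + 9) // 10  # integer ceiling of min_required / 10


def underserved(travel_time, chosen, voters, limit_minutes=20):
    """List (cluster index, voters) for clusters beyond the travel time limit.

    travel_time[c][s]: minutes from cluster c to candidate area s.
    chosen: indices of the areas already selected by the model.
    """
    gaps = []
    for c, row in enumerate(travel_time):
        nearest = min(row[s] for s in chosen)
        if nearest > limit_minutes:
            gaps.append((c, voters[c]))
    return gaps


# Hypothetical inputs: 3 clusters, 2 selected areas.
travel_time = [[5, 30], [25, 45], [40, 12]]
voters = [100, 60, 80]
gaps = underserved(travel_time, chosen=[0, 1], voters=voters)
extra = additional_area_count(25)
```

Here only the second cluster is more than 20 minutes from both selected areas, so it is the one flagged as a potential service gap.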

Reliability of Estimates

Some data published by the American Community Survey (ACS) rely on small sample sizes, meaning that the resulting estimates can have a high degree of uncertainty. The coefficient of variation (CV) was calculated for every ACS-based variable, where the CV is equal to the standard error (associated with the 90% confidence interval) divided by the estimate (see ACS documentation, Appendix 1). Tract estimates with a CV over 40% were considered to have a high degree of uncertainty and are flagged visually on the web map with cross-hatching.
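
The reliability check can be sketched as below. ACS margins of error are published at the 90% confidence level, so the standard error is recovered by dividing the margin of error by 1.645. The function names are hypothetical, and treating a zero estimate as unreliable is an assumption, since the CV is undefined in that case.

```python
Z_90 = 1.645  # z-value for the 90% confidence level used by ACS margins of error


def coefficient_of_variation(estimate, moe_90):
    """CV = standard error / estimate, where standard error = MOE / 1.645."""
    if estimate == 0:
        return None  # CV is undefined for a zero estimate
    return (moe_90 / Z_90) / estimate


def is_unreliable(estimate, moe_90, threshold=0.40):
    """True if the tract estimate should be cross-hatched on the map."""
    cv = coefficient_of_variation(estimate, moe_90)
    return cv is None or cv > threshold


# Hypothetical tract estimates with their published margins of error.
stable = is_unreliable(estimate=200, moe_90=80)    # CV is about 0.24
uncertain = is_unreliable(estimate=50, moe_90=45)  # CV is about 0.55
```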

Water Boundaries

The shapefiles used for both modeling and web mapping are clipped to coastlines using the Census Bureau's cartographic boundary files in places where the county or census tract boundary extends over the ocean or large lakes. Census tracts that are entirely made up of water are also excluded. Please note that cartographic boundaries do not necessarily match base map layers, particularly in tidal zones and areas with wetlands.

Voting Location Areas

Due to the limitations of data and the need to incorporate local on-the-ground knowledge, this tool does not identify exact sites to be used. Instead, voting locations are suggested within 0.5 square mile areas, which in some cases may not visually align with the census block data used to determine that area's siting score due to differences in scale.