Day 1: Points

This is my first post for the 2019 30 Day Map Challenge. It's November 4th, so clearly I'm a little bit behind, but I hope to catch up throughout the rest of the month.

I've been contemplating what I want to get out of this exercise. When I was in school, I was often focused on meeting the requirements of the assignment. That served me well in terms of grades, but when I look back at how I approached my formal education, I have some regret that I didn't allow myself to stray outside the lines a little bit more. I get the sense that this month is intended more as a cartographic exercise, but I'm going to use each day's prompt more generally as a means of focusing my own exploration of new (to me, at least) tools and techniques for working with spatial data. In other words, I'm not promising pretty or otherwise remarkable maps. What I hope to get out of this is growth, and I'll use this blog as a catalog of how I produced my daily maps.

So for the first day of the 2019 30 Day Map Challenge, I'm going to be mapping GoRaleigh bus stops and bus shelters. It was recently announced that Raleigh's local transit system would be dramatically expanding the number of bus shelters. As I read about this exciting development, I got to wondering about how bus shelters are distributed throughout Raleigh generally. While looking around GoRaleigh's services on ArcGIS Online, I found two datasets that I figured would help me satisfy this curiosity:

  1. GoRaleigh Shelters
  2. GoRaleigh Bus Stop

While on its face it might seem that I could get everything I need from the GoRaleigh Bus Stop data, the GoRaleigh Shelters data contains more detailed information about the type of shelter and whether it is existing or planned. So for this first challenge, my goal was to combine these datasets so that I could make a map of GoRaleigh bus stops with detailed information about their shelters.

I also wanted to try out a few different tools. Recordings from the 2019 NACIS Conference were recently released, and I've been working my way through them. One presentation that caught my attention was by Mamata Akella from CARTO about their Python package, CARTOframes. I had tried out this package a couple of years ago and thought it showed a lot of potential, but when CARTO stopped offering accounts priced for tinkering, I decided to focus on other tools. Watching the presentation, I was really impressed by how far the package has come. But perhaps the biggest takeaway was that you don't need a CARTO account to use CARTOframes.

As I was thinking through how to approach today's challenge, I decided this would be a great chance to work with CARTOframes too. What follows is my processing of the GoRaleigh shelters and bus stops data with a variety of Python packages, visualized at the end with CARTOframes.

Note: I started down the path of writing lots of narrative around the various parts of the analysis, but I'm already 4 days behind in the challenge, so I'm making an editorial decision to just let the code do the talking. Apologies if anything is unclear. Hit me up on Twitter (@maptastik) if you want to chat more about this notebook.

Libraries

This notebook was produced using Google Colaboratory, which is basically Jupyter Notebooks meets Google Docs. A nice thing about Colaboratory is that each notebook includes a Python 3 environment loaded with lots of libraries commonly used for data science work. Pretty nice! And while most of the libraries we need come with the Colaboratory environment, there are a couple we'll need to install:

  1. CARTOframes (beta)
  2. geopandas

Fortunately, we can use pip to install these libraries and their dependencies.

In [0]:
! pip install cartoframes==1.0b4 geopandas

With CARTOframes and geopandas installed we can get down to business. It's somewhat a matter of style in notebooks, but I like to import all the libraries and classes at the beginning. Some folks prefer to import them when they first use them within the flow of the notebook. I think there are merits to both approaches.

In [0]:
import requests
from io import BytesIO
import pandas as pd
import geopandas as gpd
from cartoframes.viz import Map, Layer, Popup, Legend, Layout, basemaps
from cartoframes.viz.widgets import category_widget, formula_widget
from cartoframes.viz.helpers import color_category_layer

Functions

Data loader from ArcGIS REST Service

I know I'm going to be pulling in data from a couple of ArcGIS REST services, so to avoid repeating the somewhat verbose procedure for querying them and loading the results into a GeoDataFrame, I've put together a function that carries out that process in one line.

In [0]:
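def arcgis_rest_to_gdf(url, layer_id):
  # Query every feature ('1=1') and every field ('*') as GeoJSON in WGS84 (EPSG:4326)
  url = f'{url}/{layer_id}/query'
  params = {
    'f': 'geojson',
    'where': '1=1',
    'outFields': '*',
    'outSR': 4326
  }
  r = requests.get(url, params = params)
  r.raise_for_status()  # fail loudly on HTTP errors instead of trying to parse an error page
  return gpd.read_file(BytesIO(r.content))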
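The helper above makes a single request, and ArcGIS services cap how many records one query can return (the layer's `maxRecordCount`), so a larger layer could come back truncated without raising an error. Below is a minimal sketch of paging with the `resultOffset` and `resultRecordCount` query parameters, assuming the layer supports pagination. It sticks to the standard library so the paging logic is easy to see. `query_url`, `fetch_all_features`, and the injectable `fetch_json` argument are hypothetical names, not part of the notebook, and the loop simply stops when a page comes back shorter than `page_size`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


# Hypothetical helpers for illustration; not part of the notebook.
def query_url(url, layer_id, offset=None, page_size=None):
    """Build a GeoJSON query URL for one layer of an ArcGIS REST service."""
    params = {"f": "geojson", "where": "1=1", "outFields": "*", "outSR": 4326}
    if page_size is not None:
        # Paging parameters; assumed to be honored only by layers that support pagination
        params["resultOffset"] = offset or 0
        params["resultRecordCount"] = page_size
    return f"{url}/{layer_id}/query?{urlencode(params)}"


def fetch_all_features(url, layer_id, page_size=1000, fetch_json=None):
    """Request pages until one comes back short, then return every feature."""
    # page_size is an assumption; it should not exceed the layer's maxRecordCount
    if fetch_json is None:
        # Default fetcher: plain HTTP GET, body parsed as JSON
        def fetch_json(u):
            with urlopen(u) as resp:
                return json.load(resp)
    features, offset = [], 0
    while True:
        page = fetch_json(query_url(url, layer_id, offset, page_size))
        batch = page.get("features", [])
        features.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page means we're done
            return features
        offset += page_size
```

The collected features are plain GeoJSON dictionaries, so something like `gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")` should turn them into a GeoDataFrame comparable to what `arcgis_rest_to_gdf` returns.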

Loading and Examining the Data

Shelters

In [4]:
shelters_gdf = arcgis_rest_to_gdf("https://services.arcgis.com/v400IkDOw1ad7Yad/arcgis/rest/services/GoRaleigh_Shelters/FeatureServer", 0)
shelters_gdf.info()
display(shelters_gdf.head())
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 334 entries, 0 to 333
Data columns (total 9 columns):
OBJECTID           334 non-null int64
Stop_ID            334 non-null int64
Stop_Name          334 non-null object
GoTriangle_Stop    334 non-null int64
GoRaleigh_Stop     334 non-null int64
Wolfline_Stop      334 non-null int64
Shelter            334 non-null object
GlobalID           334 non-null object
geometry           334 non-null geometry
dtypes: geometry(1), int64(5), object(3)
memory usage: 23.6+ KB
   OBJECTID  Stop_ID  Stop_Name                            GoTriangle_Stop  GoRaleigh_Stop  Wolfline_Stop  Shelter  GlobalID                              geometry
0         7     1811  White St at Elm Ave (Park-and-Ride)                0               1              0  Full     3ab998f8-ecdc-4d47-ae35-87a16d038fce  POINT (-78.51097 35.97479)
1        17     1821  Allen Rd at Best St                                0               1              0  Full     3e7f9690-0a16-4008-a214-4971109cadaa  POINT (-78.50072 35.97871)
2        31     1835  Common Oaks Dr at Oliver Rd                        0               1              0  Full     7e3f286a-889d-4117-89bf-5d9612a7ad54  POINT (-78.54659 35.94208)
3        46     8001  GoRaleigh Station                                  0               1              0  Station  dc87fb45-4847-4eb6-a40f-4f8eb951c298  POINT (-78.63716 35.77752)
4        47     8002  Wilmington St at Morgan St                         1               1              0  DT       cfe81a45-a51c-4dc3-a8c8-70c4220e03a5  POINT (-78.63805 35.77994)
In [5]:
display(sorted(shelters_gdf["Shelter"].unique()))
['Attached',
 'Brick',
 'Custom',
 'DT',
 'ES - Planned',
 'Full',
 'Full2',
 'GR - Planned',
 'Mesh',
 'P',
 'R-Line',
 'Slim',
 'Slim2',
 'Station',
 'Stone']
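Two of these values, 'ES - Planned' and 'GR - Planned', pack a shelter type and a status into one label. The rule applied later in the notebook is: anything containing 'Planned' is planned, everything else is existing, and the shelter type is whatever comes before ' - '. Here is that rule as a small plain-Python function. `parse_shelter` is a hypothetical helper for illustration, not something the notebook defines.

```python
# Hypothetical helper for illustration; not part of the notebook.
def parse_shelter(label):
    """Split a GoRaleigh shelter label into (shelter type, status)."""
    # Labels such as "ES - Planned" carry a status suffix; all others are existing
    status = "Planned" if "Planned" in label else "Existing"
    shelter_type = label.split(" - ")[0]
    return shelter_type, status


for label in ["Full", "ES - Planned", "GR - Planned", "R-Line"]:
    print(label, parse_shelter(label))
```

Because the split is on ' - ' with spaces, a hyphenated type such as 'R-Line' is left intact.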
In [6]:
color_category_layer(shelters_gdf, value = 'Shelter', title = 'Shelter Type', top = 5)
Out[6]:
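Passing `top = 5` to `color_category_layer` limits the map to five category colors. My understanding is that the most frequent categories each get their own color and everything else is grouped into an 'Others' class. The sketch below shows that grouping idea with `collections.Counter`. It's an illustration with made-up values and a hypothetical `top_categories` helper, not CARTOframes' implementation.

```python
from collections import Counter


# Hypothetical helper for illustration; not CARTOframes code.
def top_categories(values, top=5, other="Others"):
    """Keep the `top` most frequent categories and relabel the rest as `other`."""
    keep = {category for category, _ in Counter(values).most_common(top)}
    return [value if value in keep else other for value in values]


# Made-up sample of shelter types
shelter_types = ["Full", "Full", "Slim", "DT", "Full", "Mesh", "Slim", "Stone", "P"]
print(Counter(top_categories(shelter_types, top=2)))
```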

Stops

In [7]:
stops_gdf = arcgis_rest_to_gdf("https://services.arcgis.com/v400IkDOw1ad7Yad/ArcGIS/rest/services/GoRaleigh_Stops/FeatureServer", 0)
stops_gdf.info()
display(stops_gdf.head())
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1642 entries, 0 to 1641
Data columns (total 33 columns):
FID           1642 non-null int64
Sequence      1642 non-null int64
StopId        1642 non-null int64
StopAbbr      1642 non-null object
StopName      1642 non-null object
NodeAbbr      1642 non-null object
StreetNum     1642 non-null int64
OnStreet      1642 non-null object
AtStreet      1642 non-null object
Unit          1642 non-null object
City          1642 non-null object
State         1642 non-null object
ZipCode       1642 non-null object
Lon           1642 non-null int64
Lat           1642 non-null int64
Bench         1642 non-null int64
Shelter       1642 non-null int64
Lighting      1642 non-null int64
Garbage       1642 non-null int64
Bicycle       1642 non-null int64
GpsLon        1642 non-null int64
GpsLat        1642 non-null int64
DATAStop      1642 non-null int64
CATStop       1642 non-null int64
TTAStop       1642 non-null int64
NodeId        1642 non-null int64
NodeAbbr_1    1642 non-null object
NodeName      1642 non-null object
LineAbbr      1642 non-null object
LineName      1642 non-null object
LineType      1642 non-null object
tpField00     1642 non-null int64
geometry      1642 non-null geometry
dtypes: geometry(1), int64(18), object(14)
memory usage: 423.5+ KB
FID Sequence StopId StopAbbr StopName NodeAbbr StreetNum OnStreet AtStreet Unit City State ZipCode Lon Lat Bench Shelter Lighting Garbage Bicycle GpsLon GpsLat DATAStop CATStop TTAStop NodeId NodeAbbr_1 NodeName LineAbbr LineName LineType tpField00 geometry
0 1 24 11379 9810 Pullen Rd at Cates Ave 0 -78666104 35781636 0 0 0 0 0 0 0 0 1 0 0 11L Buck Jones Connector 0 POINT (-78.66610 35.78164)
1 2 7 2009 1816 North Ave at Wingate St (SBTS) 0 NULL NULL NULL Wake Forest NC 27587 -78512090 35982030 0 0 0 0 0 0 0 0 1 0 0 WFL Wake Forest Loop 0 POINT (-78.51209 35.98203)
2 3 25 11202 9613 Dunn Ave at Jeter Dr 0 RALEIGH 27607 -78667850 35783800 1 0 0 0 0 0 0 0 1 0 0 11L Buck Jones Connector 0 POINT (-78.66785 35.78380)
3 4 8 11660 1910 N Main St at W Oak Ave 0 -78505355 35988820 0 0 0 0 0 0 0 0 0 0 0 WFL Wake Forest Loop 0 POINT (-78.50535 35.98882)
4 5 26 10422 8525 Morrill Dr at Cates Ave (Outbound) MRLCTS O 0 MORRILL DR PRIVATE RD RALEIGH NC -78671192 35782617 0 0 0 0 0 0 0 0 1 0 290 Morrill Dr at Cates Ave OB 11L Buck Jones Connector 0 POINT (-78.67119 35.78262)
In [8]:
Map(Layer(stops_gdf))
Out[8]:

Combine datasets

In [9]:
# An explicit .copy() avoids the SettingWithCopyWarning when adding columns below
shelters_reduced_gdf = shelters_gdf[["Stop_ID", "Stop_Name", "Shelter", "geometry"]].copy()
# Stop IDs as strings so they can be joined to the text StopAbbr field of the stops layer
shelters_reduced_gdf["Stop_ID"] = shelters_reduced_gdf["Stop_ID"].astype(str)
# Labels like "ES - Planned" mark planned shelters; everything else is existing
shelters_reduced_gdf["Status"] = shelters_reduced_gdf["Shelter"].str.contains("Planned", regex = False).map({True: "Planned", False: "Existing"})
shelters_reduced_gdf["Shelter"] = shelters_reduced_gdf["Shelter"].str.split(" - ").str[0]

shelters_reduced_gdf.info()
display(shelters_reduced_gdf.head())
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 334 entries, 0 to 333
Data columns (total 5 columns):
Stop_ID      334 non-null object
Stop_Name    334 non-null object
Shelter      334 non-null object
geometry     334 non-null geometry
Status       334 non-null object
dtypes: geometry(1), object(4)
memory usage: 13.2+ KB
   Stop_ID  Stop_Name                            Shelter  geometry                    Status
0  1811     White St at Elm Ave (Park-and-Ride)  Full     POINT (-78.51097 35.97479)  Existing
1  1821     Allen Rd at Best St                  Full     POINT (-78.50072 35.97871)  Existing
2  1835     Common Oaks Dr at Oliver Rd          Full     POINT (-78.54659 35.94208)  Existing
3  8001     GoRaleigh Station                    Station  POINT (-78.63716 35.77752)  Existing
4  8002     Wilmington St at Morgan St           DT       POINT (-78.63805 35.77994)  Existing
In [10]:
stops_reduced_gdf = stops_gdf[["StopAbbr", "StopName", "geometry"]]
stops_reduced_gdf = stops_reduced_gdf.groupby("StopAbbr").first().reset_index()[["StopAbbr", "StopName", "geometry"]]
stops_reduced_gdf.info()
display(stops_reduced_gdf.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1308 entries, 0 to 1307
Data columns (total 3 columns):
StopAbbr    1308 non-null object
StopName    1308 non-null object
geometry    1308 non-null geometry
dtypes: geometry(1), object(2)
memory usage: 30.8+ KB
   StopAbbr  StopName                      geometry
0  1201      Hillsborough St at Snow Ave   POINT (-78.65174 35.78093)
1  1284      Wilmington St at Cabarrus St  POINT (-78.63828 35.77433)
2  1293      Dawson St at Martin St        POINT (-78.64359 35.77738)
3  1528      Hammond Rd at Rush St         POINT (-78.64079 35.74228)
4  1530      Hammond Rd at Rush St         POINT (-78.64050 35.74256)
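The stops layer has 1,642 rows but only 1,308 distinct `StopAbbr` values, apparently because a stop is listed once for each route that serves it (hence the `LineAbbr` and `LineName` columns). Grouping by `StopAbbr` and calling `.first()` collapses those duplicates to one row per stop, sorted by stop ID. Strictly speaking, `.first()` takes the first non-null value in each column, which here is the same as the first row because no column has nulls. It also returns a plain DataFrame (see the info output above), which is why the merged result gets wrapped back into a GeoDataFrame later. The plain-Python sketch below shows the "first row per key" idea on made-up rows; `first_per_key` is a hypothetical helper.

```python
# Hypothetical helper for illustration; not part of the notebook.
def first_per_key(rows, key):
    """Return the first row seen for each key value, sorted by key like groupby."""
    first = {}
    for row in rows:
        # setdefault keeps the earliest row and ignores later duplicates
        first.setdefault(row[key], row)
    return [first[k] for k in sorted(first)]


# Made-up rows: stop "200" is served by two routes
rows = [
    {"StopAbbr": "200", "StopName": "Example St at First Ave", "LineAbbr": "R1"},
    {"StopAbbr": "100", "StopName": "Sample Rd at Main St", "LineAbbr": "R1"},
    {"StopAbbr": "200", "StopName": "Example St at First Ave", "LineAbbr": "R2"},
]
result = first_per_key(rows, "StopAbbr")
print(len(rows), "rows ->", len(result), "stops")
```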
In [11]:
stops_shelters_gdf = stops_reduced_gdf.merge(shelters_reduced_gdf, how = 'outer', left_on = 'StopAbbr', right_on = 'Stop_ID', suffixes = ('', '_shelters'), sort = True)

# Clean up some of the fields and pivot the Status field into 0/1 columns for the formula widgets
# (assigning back instead of fillna(inplace = True) on a column keeps this working under pandas Copy-on-Write)
stops_shelters_gdf["StopAbbr"] = stops_shelters_gdf["StopAbbr"].fillna("0")
stops_shelters_gdf["StopName"] = stops_shelters_gdf["StopName"].fillna("Unnamed Stop")
stops_shelters_gdf["Shelter"] = stops_shelters_gdf["Shelter"].fillna("No Shelter")
stops_shelters_gdf["Status"] = stops_shelters_gdf["Status"].fillna("No Shelter Planned")
stops_shelters_gdf["Existing"] = (stops_shelters_gdf["Status"] == "Existing").astype(int)
stops_shelters_gdf["Planned"] = (stops_shelters_gdf["Status"] == "Planned").astype(int)
stops_shelters_gdf["No_Shelter_Planned"] = (stops_shelters_gdf["Status"] == "No Shelter Planned").astype(int)
stops_shelters_gdf["geometry"] = stops_shelters_gdf.apply(lambda x: x["geometry_shelters"] if x["geometry"] is None else x["geometry"], axis = 1)

stops_shelters_gdf = stops_shelters_gdf[["StopAbbr", "StopName", "Shelter", "Status", "Existing", "Planned", "No_Shelter_Planned", "geometry"]]
stops_shelters_gdf = gpd.GeoDataFrame(stops_shelters_gdf, crs = "EPSG:4326", geometry = "geometry")
stops_shelters_gdf.head()
Out[11]:
   StopAbbr  StopName                      Shelter     Status              Existing  Planned  No_Shelter_Planned  geometry
0  0         Unnamed Stop                  ES          Planned                    0        1                   0  POINT (-78.69470 35.76832)
1  0         Unnamed Stop                  ES          Planned                    0        1                   0  POINT (-78.69848 35.76683)
2  1201      Hillsborough St at Snow Ave   No Shelter  No Shelter Planned         0        0                   1  POINT (-78.65174 35.78093)
3  1284      Wilmington St at Cabarrus St  No Shelter  No Shelter Planned         0        0                   1  POINT (-78.63828 35.77433)
4  1293      Dawson St at Martin St        No Shelter  No Shelter Planned         0        0                   1  POINT (-78.64359 35.77738)
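Because the merge is an outer join on stop ID, no stop and no shelter is dropped, and the join only lines up because `Stop_ID` was converted to a string to match the text `StopAbbr` field. A stop with no matching shelter ends up with the 'No Shelter' and 'No Shelter Planned' fill values; a shelter whose `Stop_ID` matches no `StopAbbr` gets the placeholder ID '0', the name 'Unnamed Stop', and the shelter's own geometry. The 0/1 `Existing`, `Planned`, and `No_Shelter_Planned` columns are there so each formula widget can total a status with a plain sum. The sketch below mirrors that logic in plain Python with made-up records. `outer_join_stops` is a hypothetical helper; it assumes each ID appears at most once on each side (pandas would emit one row per matching pair), and the coordinate tuples stand in for shapely points.

```python
# Hypothetical helper for illustration; not part of the notebook.
def outer_join_stops(stops, shelters):
    """Outer-join stops and shelters on stop ID, filling gaps like the notebook does."""
    stops_by_id = {s["StopAbbr"]: s for s in stops}
    shelters_by_id = {s["Stop_ID"]: s for s in shelters}
    rows = []
    for stop_id in sorted(stops_by_id.keys() | shelters_by_id.keys()):
        stop = stops_by_id.get(stop_id)
        shelter = shelters_by_id.get(stop_id)
        status = shelter["Status"] if shelter else "No Shelter Planned"
        rows.append({
            # Shelters with no matching stop get the placeholder ID "0"
            "StopAbbr": stop_id if stop else "0",
            "StopName": stop["StopName"] if stop else "Unnamed Stop",
            "Shelter": shelter["Shelter"] if shelter else "No Shelter",
            "Status": status,
            # One 0/1 column per status, so each total is a simple sum
            "Existing": int(status == "Existing"),
            "Planned": int(status == "Planned"),
            "No_Shelter_Planned": int(status == "No Shelter Planned"),
            # Prefer the stop's location; fall back to the shelter's
            "geometry": stop["geometry"] if stop else shelter["geometry"],
        })
    return rows


# Made-up records
stops = [
    {"StopAbbr": "100", "StopName": "A St at B St", "geometry": (0.0, 0.0)},
    {"StopAbbr": "200", "StopName": "C St at D St", "geometry": (1.0, 1.0)},
]
shelters = [
    {"Stop_ID": "200", "Shelter": "Full", "Status": "Existing", "geometry": (1.0, 1.1)},
    {"Stop_ID": "999", "Shelter": "ES", "Status": "Planned", "geometry": (5.0, 5.0)},
]
rows = outer_join_stops(stops, shelters)
for column in ("Existing", "Planned", "No_Shelter_Planned"):
    print(column, sum(row[column] for row in rows))
```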

Final Map

In [12]:
Map(
    Layer(
        stops_shelters_gdf,
        '''
        color: ramp(buckets($Status, ["Existing", "Planned", "No Shelter Planned"]), [#4CAF50, #FFC107, #B0BEC533]),
        width: 5
        ''',
        legend = Legend(
            'color-category',
            title = "GoRaleigh Status of Bus Stop Shelters",
            footer = "Data: GoRaleigh, City of Raleigh"
        ),
        popup = Popup({
            'click': [{
                'title': 'Stop',
                'value': '$StopName'
            }, {
                'title': 'Shelter',
                'value': '$Shelter'
            }, {
                'title': 'Shelter Status',
                'value': '$Status'
            }]
        }),
        widgets = [
          formula_widget(
              'Existing',
              'sum',
              title = "Existing Bus Stop Shelters"
          ),
          formula_widget(
              'Planned',
              'sum',
              title = "Planned Bus Stop Shelters"
          ),
          formula_widget(
              'No_Shelter_Planned',
              'sum',
              title = "No Planned Bus Stop Shelters"
          ),
          category_widget(
              'Status',
              title = "Shelter Status",
              description = "Click to filter by shelter status"
          ),
          category_widget(
              'Shelter',
              title = "Shelter Type",
              description = "Click to filter by shelter type"
          )
        ]
    )
)
Out[12]:
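In the style string, `buckets($Status, [...])` sorts each stop into one of the three listed statuses and `ramp(...)` assigns the color at the same position in the color list. The third color, `#B0BEC533`, has eight hex digits. Assuming the last pair is read as an alpha channel (`33` hex is 51, and 51/255 = 0.2), stops with no planned shelter draw at about 20% opacity, which lets the existing and planned shelters stand out. The hypothetical `hex_to_rgba` helper below just decodes the three colors to make that arithmetic concrete.

```python
# Hypothetical helper for illustration; not part of the notebook or CARTOframes.
def hex_to_rgba(color):
    """Decode '#RRGGBB' or '#RRGGBBAA' into (r, g, b, alpha) with alpha in 0-1."""
    digits = color.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError(f"expected 6 or 8 hex digits, got {color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    # A missing alpha pair means fully opaque
    alpha = int(digits[6:8], 16) / 255 if len(digits) == 8 else 1.0
    return r, g, b, round(alpha, 2)


for color in ["#4CAF50", "#FFC107", "#B0BEC533"]:
    print(color, hex_to_rgba(color))
```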
In [0]:
stops_shelters_gdf.to_file("goraleigh_stops_shelters.geojson", driver = "GeoJSON")