Analyzing Violence in South Seattle

The corner of 37th & Oregon is about a 15-minute walk from my house in Columbia City. Last week it was a candidate for the ignominious title of most dangerous place in America. On July 9th, a person was fatally shot there:

Five days later, on July 14th, a second person was fatally shot while visiting the shrine above for the first victim. Now there are two shrines less than a block apart (and two police cars parked there 24×7):

This is awful. No one wants people dying on any of Seattle’s streets, and in our neighborhood there’s a perception that we’re sliding backwards in terms of safety. I wanted to dig into the data and see whether it backs up the sense that the neighborhood is substantially more dangerous than before.

The City of Seattle maintains an Open Data portal, and I’ve used it to get both crime data (in the form of police 911 calls) and neighborhood boundaries.
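The post doesn’t show how each 911 call gets matched to a neighborhood. One way to do it is sketched below, under assumptions that are not from the original: each call record carries a longitude/latitude, and each neighborhood boundary is a single polygon ring of (lon, lat) vertices. The field names and the rectangular "boundaries" are made up for illustration. The real boundary files are more detailed polygons (possibly with holes or multiple parts), and a geometry library such as Shapely would be the more robust choice.

```python
from typing import Dict, List, Optional, Tuple

# A boundary ring: list of (longitude, latitude) vertices, implicitly closed.
Ring = List[Tuple[float, float]]


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: toggle `inside` each time a ray going east from the point crosses an edge."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge reaches the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(lon: float, lat: float, neighborhoods: Dict[str, Ring]) -> Optional[str]:
    """Return the first neighborhood whose boundary contains the point, else None."""
    for name, ring in neighborhoods.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None


# Hypothetical rectangles, NOT the real neighborhood boundaries.
NEIGHBORHOODS: Dict[str, Ring] = {
    "Columbia City": [(-122.30, 47.55), (-122.27, 47.55), (-122.27, 47.57), (-122.30, 47.57)],
    "Beacon Hill": [(-122.33, 47.55), (-122.30, 47.55), (-122.30, 47.58), (-122.33, 47.58)],
}

# Hypothetical 911 call records with made-up field names.
calls = [
    {"type": "SHOTS FIRED", "lon": -122.285, "lat": 47.560},
    {"type": "SHOTS FIRED", "lon": -122.310, "lat": 47.565},
    {"type": "SHOTS FIRED", "lon": -122.200, "lat": 47.560},
]
labels = [assign_neighborhood(c["lon"], c["lat"], NEIGHBORHOODS) for c in calls]
print(labels)  # ['Columbia City', 'Beacon Hill', None]
```

Once every call has a neighborhood label, filtering to Columbia City is a simple comparison on that label.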

The data contains all reports of “shots fired,” and I wanted to see whether they’re up in 2017 in my neighborhood, Columbia City. We’ll likely end up ahead in July (the data covers only half the month), but year-to-date the numbers are not materially different:
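Because 2017 is only partly over, a fair year-to-date comparison counts every year over the same January 1 to cutoff window. A minimal sketch, assuming each incident has a calendar date and using a hypothetical July 14 cutoff (the dates below are invented, not the SPD data):

```python
from collections import Counter
from datetime import date
from typing import Iterable


def ytd_counts(event_dates: Iterable[date], cutoff: date) -> Counter:
    """Count events per year, keeping only those on or before the cutoff's month and day.

    Every year is compared over the same Jan 1 to cutoff window, so a partial
    current year isn't compared against full prior years.
    """
    window_end = (cutoff.month, cutoff.day)
    counts: Counter = Counter()
    for d in event_dates:
        if (d.month, d.day) <= window_end:
            counts[d.year] += 1
    return counts


# Made-up incident dates for illustration, not the SPD data.
dates = [
    date(2015, 7, 15),  # after the Jul 14 cutoff, so excluded
    date(2016, 3, 1),
    date(2016, 8, 20),  # after the cutoff, so excluded
    date(2017, 2, 14),
    date(2017, 7, 10),
]
counts = ytd_counts(dates, cutoff=date(2017, 7, 14))
print(dict(counts))  # {2016: 1, 2017: 2}
```

Comparing (month, day) tuples keeps the window identical across years regardless of the year itself.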

Homicides, however, are up. July’s two murders account for the spike below:

We had no murders in 2016. Moreover, 2017’s homicides are different from the one we had in 2015. That one was a shooting within a family; according to the police, the two this year are gang-related.

Drive-by shootings in Columbia City are also way up – 11 YTD vs. 3 in 2016:

Equally concerning, these shootings are more brazen in their timing: they increasingly occur during the day, which raises the likelihood of innocent bystanders getting hit.
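One way to put a number on "increasingly during the day" is the share of incidents per year that fall inside a daytime window. This sketch assumes the 911 call timestamp is a reasonable proxy for when the shooting happened, and it uses a hypothetical 7:00 to 19:00 definition of daytime; the timestamps are invented. With only a handful of drive-bys per year, these shares are noisy, so they support the trend rather than prove it.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable

# Assumed definition of "daytime": 07:00 up to, but not including, 19:00.
DAY_START_HOUR = 7
DAY_END_HOUR = 19


def daytime_share_by_year(timestamps: Iterable[datetime]) -> Dict[int, float]:
    """Return, for each year, the fraction of incidents inside the daytime window."""
    totals: Dict[int, int] = defaultdict(int)
    daytime: Dict[int, int] = defaultdict(int)
    for ts in timestamps:
        totals[ts.year] += 1
        if DAY_START_HOUR <= ts.hour < DAY_END_HOUR:
            daytime[ts.year] += 1
    return {year: daytime[year] / totals[year] for year in sorted(totals)}


# Made-up call timestamps for illustration, not the SPD data.
stamps = [
    datetime(2016, 5, 1, 23, 30),
    datetime(2016, 6, 2, 2, 15),
    datetime(2016, 9, 9, 14, 0),
    datetime(2017, 3, 3, 13, 5),
    datetime(2017, 4, 4, 16, 45),
    datetime(2017, 6, 6, 22, 10),
]
shares = daytime_share_by_year(stamps)
print({year: round(share, 2) for year, share in shares.items()})  # {2016: 0.33, 2017: 0.67}
```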

For folks in Columbia City, the location of these shootings is also increasingly concerning. The map below shows 2017 YTD drive-by shootings in South Seattle (drive-by shootings in Seattle are exclusively a Central District/South Seattle phenomenon). The three dots below the words “Columbia City” represent three shootings on suburban streets near Columbia City’s main pedestrian thoroughfare:

These are the first shootings in the Angeline-to-Dawson corridor. There were none there in 2016:

Or 2015:

The goal for South Seattle is zero gang-related shootings in any of our neighborhoods. The data above suggests that gangs are getting more brazen in the frequency, location, and timing of their attacks. I look forward to seeing what the SPD does to combat this, and I need to figure out what I can do to help make all of South Seattle safe.

Update: I’ve created a repo with some of the raw data, plus instructions for doing your own analysis.

An Open Letter to Mayor Bloomberg & the Future Head of the DOITT

I was part of a team that recently submitted an app to the NYC BigApps contest, so we had the opportunity to play with a lot of the data sets in the NYC Data Mine.  In the spirit of continuous improvement, I wanted to suggest how the city can take a great resource and make it even better.

After combining nine data sets (and being unable to combine many others), we feel we can speak to some of the challenges of working with the data and to the overall opportunity to improve it.  Our main suggestion is that the City should think of itself as a platform that provides standardized data, with end users like ourselves adding value by bringing together the different data sets and 3rd-party services.  We can’t claim to have created the “government as a platform” idea, but we can tell you about some of the issues New York City faces if this is the path you (hopefully) take.

With that, here are a few suggestions:

  1. Make all your data machine-readable.  Some of your data is incredibly rich, but buried in formats that are hard to parse.  The best example is probably the community district demographic info: there are over 100 spreadsheet pages of rich data there, but it’s very difficult to use in CSV form.  Ideally, this would all simply be XML, so it could be reused in seconds.
  2. Open source formats, please.  Some of your data is locked in proprietary formats, accessible only to users with multi-thousand-dollar software packages (a great example is the gdb files for ArcView shape data).  If you want random citizens like us to mash up your data, please provide it exclusively in open source formats, as our software budget is $0.  (It looks like this is changing – I noticed that some of the gdb files at the Data Mine now include shape files as well.  Kudos.)
  3. Add lat/longs (the commercial standard for mapping services) to every address/point.  Some of your address data consists of addresses only, without lat/longs.  Some of these addresses (e.g., certain school locations) are improperly formatted, so we can’t use commercial systems like the Yahoo Geocoding API to calculate lat/longs, put them on a map, or assign them to a district/neighborhood.  Other files contain coordinates, but in a different coordinate system: developers need to convert them into lat/longs in order to map them, a process that introduces errors and takes unnecessary time.
  4. More documentation.  Some of your files contain lots of data but lack the documentation necessary to make them usable.  The best example is the health info shape files: they’re full of statistical data for different areas, but there’s no way to figure out exactly what each of the statistical values means.  Similarly, the shape files for each community district contain its area, but with no unit of measurement (it’s something called ‘internal units squared’).
  5. More data.  You gave us some great data to play with, but there’s still a ton more.  What about Compstat crime data?  School scores and their trends over time?  The locations of alarm boxes, so we can see how response times vary by area?
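To make the cost in item 3 concrete, here is roughly what a developer has to write to turn projected coordinates back into lat/longs. This is a sketch under a loudly stated assumption: the letter doesn’t name the coordinate system, so the code assumes NAD83 / New York Long Island State Plane in US survey feet (EPSG:2263), a system commonly used for NYC data. It implements the ellipsoidal Lambert Conformal Conic forward and inverse formulas from Snyder’s Map Projections: A Working Manual, and it ignores the small NAD83-to-WGS84 datum shift, which doesn’t matter for neighborhood-level mapping.

```python
import math
from typing import Tuple

# GRS80 ellipsoid (used by NAD83).
A = 6378137.0                      # semi-major axis, metres
FLAT = 1 / 298.257222101           # flattening
E = math.sqrt(2 * FLAT - FLAT**2)  # eccentricity

# ASSUMED projection: NAD83 / New York Long Island (EPSG:2263), Lambert Conformal Conic.
LAT1 = math.radians(41 + 2 / 60)   # first standard parallel, 41°02'N
LAT2 = math.radians(40 + 40 / 60)  # second standard parallel, 40°40'N
LAT0 = math.radians(40 + 10 / 60)  # latitude of origin, 40°10'N
LON0 = math.radians(-74.0)         # central meridian
X0 = 300000.0                      # false easting in metres (984,250 US ft); false northing is 0
US_FOOT = 1200 / 3937              # metres per US survey foot


def _m(phi: float) -> float:
    """Snyder's m: cos(phi) scaled by the ellipsoid's radius of curvature term."""
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi: float) -> float:
    """Snyder's t: the conformal-latitude term used by the LCC projection."""
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Cone constant, scaling factor, and radius at the latitude of origin.
N_CONE = (math.log(_m(LAT1)) - math.log(_m(LAT2))) / (math.log(_t(LAT1)) - math.log(_t(LAT2)))
F_CONE = _m(LAT1) / (N_CONE * _t(LAT1) ** N_CONE)
RHO0 = A * F_CONE * _t(LAT0) ** N_CONE


def to_state_plane(lon: float, lat: float) -> Tuple[float, float]:
    """Forward projection: degrees (lon, lat) to (x, y) in US survey feet."""
    rho = A * F_CONE * _t(math.radians(lat)) ** N_CONE
    theta = N_CONE * (math.radians(lon) - LON0)
    x = X0 + rho * math.sin(theta)
    y = RHO0 - rho * math.cos(theta)
    return x / US_FOOT, y / US_FOOT


def to_lon_lat(x_ft: float, y_ft: float) -> Tuple[float, float]:
    """Inverse projection: (x, y) in US survey feet to degrees (lon, lat)."""
    dx = x_ft * US_FOOT - X0
    dy = RHO0 - y_ft * US_FOOT
    rho = math.copysign(math.hypot(dx, dy), N_CONE)
    t = (rho / (A * F_CONE)) ** (1 / N_CONE)
    lon = math.atan2(dx, dy) / N_CONE + LON0
    # Latitude has no closed form on the ellipsoid; iterate from the spherical guess.
    phi = math.pi / 2 - 2 * math.atan(t)
    for _ in range(10):
        es = E * math.sin(phi)
        phi = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
    return math.degrees(lon), math.degrees(phi)


# Round trip for a point near Times Square.
x, y = to_state_plane(-73.9855, 40.758)
print(round(x), round(y))
print(to_lon_lat(x, y))
```

In practice a maintained library such as pyproj (`Transformer.from_crs("EPSG:2263", "EPSG:4326", always_xy=True)`) is the safer route. Getting any one of these parameters wrong silently shifts every point, which is exactly the kind of error the letter worries about.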

These suggestions are meant entirely as an opportunity to build on a great start.  We – the people of New York – want to work with the city to help make it a better place.  However, the value we add is in creating tools and finding hidden relationships in the data – not in standardizing it; that’s where you can make the system work.

If you provide us with well-formed, machine-readable, standardized data, we’ll help build the services that citizens need and free up the city’s resources to focus where they’re needed most.  We’ll also find new relationships in the data that might help you rethink policy initiatives (see how your data suggests an interesting link between ‘education’ and ‘going green’; I have no idea whether you knew this).

2009 was a great start for a more open  Can’t wait to see what you do in 2010.