Recently I came across an article that had used the GitHub API to scrape information regarding the number of users in major cities in the US. The article gave me the idea to take this a little further and see if we could map out the number of users in each city, or perhaps more importantly the percentage of people in that city with a GitHub account. Before we begin, let me point out the obvious flaws with the methodology of this application:
Having said this, it's still interesting to explore the available data and try to see or explain any patterns. Plus it's fun!
Firstly, to get the data into a format that could be mapped, it was necessary to build a list of the cities I was interested in and assign each one its population (I used Wikipedia).
Then, using Python and the GitHub API, I scraped the number of accounts whose location matched the town name. Here it was necessary to try multiple different matches to get accurate data. For example, with London it was necessary to try "London, England", "London, Great Britain", "London, United Kingdom" and "London, UK", as these are all valid locations representing the same place. You will need a GitHub account and a token to avoid rate limiting.
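As a rough illustration of the lookup described above, here is a minimal Python sketch that asks GitHub's user search endpoint how many accounts match each location variant. It is not the original script: the function names (`build_search_url`, `fetch_json`, `count_accounts`) are hypothetical, and it only reports a count per variant. How you combine the variants is a separate decision, since one profile could plausibly match more than one search string.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/users"


def build_search_url(location):
    """Return the user-search URL for one location string."""
    query = 'location:"{}"'.format(location)
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})


def fetch_json(url, token=None):
    """GET a URL and decode the JSON body, sending a token if one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a higher rate limit.
        headers["Authorization"] = "token " + token
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_accounts(variants, token=None, fetch=fetch_json):
    """Return {variant: total_count} for each location variant.

    `fetch` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    return {v: fetch(build_search_url(v), token)["total_count"] for v in variants}
```

Calling `count_accounts(["London, England", "London, UK"], token="...")` would return one count per variant, which you can then inspect before deciding how to total them.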
City | GitHub Accounts | City Population | Rate (%) |
Cambridge, England | 1313 | 128515 | 1.022 |
Brighton, England | 588 | 163000 | 0.361 |
Oxford, England | 551 | 171380 | 0.322 |
Bath, England | 231 | 88859 | 0.260 |
Reading, England | 291 | 160825 | 0.181 |
Durham, England | 68 | 48069 | 0.141 |
Bristol, England | 837 | 617000 | 0.136 |
York, England | 260 | 204439 | 0.127 |
Norwich, England | 165 | 140452 | 0.117 |
Edinburgh, Scotland | 801 | 782000 | 0.102 |
London, England | 9291 | 9787426 | 0.095 |
Glasgow, Scotland | 558 | 589900 | 0.095 |
Dundee, Scotland | 133 | 153990 | 0.086 |
Exeter, England | 98 | 121800 | 0.080 |
Belfast, Northern-Ireland | 216 | 276705 | 0.078 |
Bangor, Wales | 11 | 16358 | 0.067 |
Aberdeen, Scotland | 125 | 189120 | 0.066 |
Cardiff, Wales | 283 | 447287 | 0.063 |
Bournemouth, England | 116 | 183491 | 0.063 |
Sheffield, England | 362 | 640720 | 0.056 |
Nottingham, England | 389 | 729977 | 0.053 |
Liverpool, England | 236 | 466415 | 0.051 |
Manchester, England | 1291 | 2553379 | 0.051 |
Plymouth, England | 122 | 256600 | 0.048 |
Swansea, Wales | 101 | 239023 | 0.042 |
Newcastle, England | 351 | 879996 | 0.040 |
Southampton, England | 312 | 855569 | 0.036 |
Inverness, Scotland | 21 | 57960 | 0.036 |
Leicester, England | 143 | 509000 | 0.028 |
Leeds, England | 496 | 1777934 | 0.028 |
Gloucester, England | 29 | 125649 | 0.023 |
Warwick, England | 29 | 139396 | 0.021 |
Birmingham, England | 503 | 2440986 | 0.021 |
Newport, Wales | 26 | 145700 | 0.018 |
Derry, Northern-Ireland | 6 | 83652 | 0.007 |
Aylesbury, England | 13 | 184560 | 0.007 |
Lisburn, Northern-Ireland | 4 | 71403 | 0.006 |
Cambridge is highest on the list, with over 1% of the population having a GitHub account. Lowest on the list was Lisburn with 0.006%, closely followed by Aylesbury (where I live) and Derry, both with 0.007%. To put this into perspective, the original Hirily analysis found that 3% of San Francisco's population had a GitHub account!
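The rate figures in the table are consistent with accounts divided by population, expressed as a percentage and rounded to three decimal places. A small sketch of that arithmetic follows; the function name `account_rate` is hypothetical and the rounding rule is inferred from the table rather than taken from the original script.

```python
def account_rate(accounts, population):
    """Return GitHub accounts as a percentage of population, to 3 decimals."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(accounts / population * 100, 3)


# Figures taken from the table above.
cambridge = account_rate(1313, 128515)
lisburn = account_rate(4, 71403)
```

With the table's figures this gives 1.022 for Cambridge and 0.006 for Lisburn, matching the Rate column.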
The script outputs a CSV, which I then uploaded to the ArcGIS Online content pane using a developer account. When uploading the CSV, we can set the city column to be geocoded. This takes the address of the city and turns it into a latitude and longitude, which in turn allows us to map the data.
The process asks if you want to review the results (probably worthwhile, as some points can end up astray). Once this was done, I had a Feature Service of the data (a REST endpoint we can get our data from). From here I took this into an Esri Leaflet map (one of Esri's GitHub projects!). The main bulk of the mapping is outlined in the JavaScript code below:
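For reference, writing such a CSV in Python might look like the sketch below. The column names follow the table header, and `Rate` matches the property the map code reads, but whether the original file used exactly these headers is an assumption, as is the helper name `write_city_csv`.

```python
import csv
import io

# Column names assumed from the table header in this post.
FIELDS = ["City", "GitHub Accounts", "City Population", "Rate"]


def write_city_csv(rows, stream):
    """Write one CSV row per city dict to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Example with one row from the table, written to an in-memory buffer.
buffer = io.StringIO()
write_city_csv(
    [{"City": "Cambridge, England", "GitHub Accounts": 1313,
      "City Population": 128515, "Rate": 1.022}],
    buffer,
)
csv_text = buffer.getvalue()
```

Because the city names contain commas, the `csv` module quotes them automatically, which keeps "Cambridge, England" in a single column for the geocoder.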
    var map = L.map('map').setView([54.514, -2.122], 6);

    // Gray basemap with its label layer drawn on top
    L.esri.basemapLayer("Gray").addTo(map);
    L.esri.basemapLayer("GrayLabels").addTo(map);

    var ukGitHub = "http://services1.arcgis.com/Q6SkXeZHDxVxhXA4/arcgis/rest/services/GitHub_Data/FeatureServer/0";

    var gh = L.esri.featureLayer(ukGitHub, {
      // Draw each city as a GitHub icon scaled by its Rate value
      pointToLayer: function (geojson, latlng) {
        console.log(geojson);
        var rate = geojson.properties.Rate;
        var size;
        if (rate >= 0.361) {
          size = [65, 63];
        } else if (rate >= 0.181) {
          size = [55, 53];
        } else if (rate >= 0.095) {
          size = [45, 43];
        } else if (rate >= 0.046) {
          size = [35, 33];
        } else {
          // Smallest icon; also the fallback for a missing Rate,
          // so size is never left undefined
          size = [25, 23];
        }
        return L.marker(latlng, {
          icon: L.icon({
            iconUrl: 'imgs/github4.png',
            iconSize: size,
            iconAnchor: [size[0] / 2, size[1] / 2],
            popupAnchor: [0, -11]
          })
        });
      }
    }).addTo(map);
A screenshot of the map can be seen below, and a live demo can be seen here.
You can find the code on my GitHub account: JamesMilnerUK/github-mapping · GitHub