JMilneresriuk-esridist

Where is GitHub most popular in the UK?

Blog Post created by JMilneresriuk-esridist Employee on Jul 9, 2015

Recently I came across an article that used the GitHub API to scrape the number of users in major US cities. The article gave me the idea to take this a little further and map the number of users in each UK city or, perhaps more importantly, the percentage of people in each city with a GitHub account. Before we begin, let me point out the obvious flaws in the methodology of this application:

 

  • Populations are estimations (plus the UK census is now 4 years old)
  • Populations for cities can be difficult to define (city, urban, metro area)
  • GitHub accounts can be owned by companies as well as people
  • Not everyone gives their location on their GitHub accounts, or people may lie/not update

 

Having said this, it's still interesting to explore the available data and try to see or explain any patterns. Plus it's fun!

 

Data Scraping

 

Firstly, to get the data into a format that could be mapped, I needed to create a list of the cities I was interested in and assign each one its population (I used Wikipedia).

 

Then, using Python and the GitHub API, I scraped the number of accounts that matched each city name. It was necessary to try several different matches to get accurate data. For example, with London it was necessary to try "London, England", "London, Great Britain", "London, United Kingdom" and "London, UK", as these are all valid locations representing the same place. You will need a GitHub account and a token to avoid rate limiting.
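The matching step can be sketched as follows. This is a minimal JavaScript illustration rather than the Python script used for this post: the helper names (locationVariants, buildSearchUrl, countAccounts) are hypothetical, and it assumes the user search endpoint of the GitHub REST API, which reports a total_count for each query. Location variants may overlap slightly, so a simple sum is only an approximation.

```javascript
// Hypothetical helper: combine a city name with each suffix people might use.
function locationVariants(city, suffixes) {
  return suffixes.map(function (suffix) {
    return city + ", " + suffix;
  });
}

// Build a user search URL for one location phrase.
// per_page=1 keeps the response small; only total_count is needed.
function buildSearchUrl(location) {
  var query = 'location:"' + location + '"';
  return "https://api.github.com/search/users?q=" +
    encodeURIComponent(query) + "&per_page=1";
}

// Sum total_count over every variant of a city name.
// fetchImpl is an assumption made for testability; it defaults to the global fetch.
async function countAccounts(city, suffixes, token, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  var total = 0;
  for (var location of locationVariants(city, suffixes)) {
    var response = await doFetch(buildSearchUrl(location), {
      headers: {
        Accept: "application/vnd.github+json",
        // A personal access token raises the rate limit.
        Authorization: "Bearer " + token
      }
    });
    if (!response.ok) {
      throw new Error("GitHub API request failed with status " + response.status);
    }
    var body = await response.json();
    total += body.total_count;
  }
  return total;
}
```

For London, the suffix list would be ["England", "Great Britain", "United Kingdom", "UK"], matching the variants listed above.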

 

The Results

    

City                       GitHub Accounts  City Population  Rate (%)
Cambridge, England         1313             128515           1.022
Brighton, England          588              163000           0.361
Oxford, England            551              171380           0.322
Bath, England              231              88859            0.260
Reading, England           291              160825           0.181
Durham, England            68               48069            0.141
Bristol, England           837              617000           0.136
York, England              260              204439           0.127
Norwich, England           165              140452           0.117
Edinburgh, Scotland        801              782000           0.102
London, England            9291             9787426          0.095
Glasgow, Scotland          558              589900           0.095
Dundee, Scotland           133              153990           0.086
Exeter, England            98               121800           0.080
Belfast, Northern-Ireland  216              276705           0.078
Bangor, Wales              11               16358            0.067
Aberdeen, Scotland         125              189120           0.066
Cardiff, Wales             283              447287           0.063
Bournemouth, England       116              183491           0.063
Sheffield, England         362              640720           0.056
Nottingham, England        389              729977           0.053
Liverpool, England         236              466415           0.051
Manchester, England        1291             2553379          0.051
Plymouth, England          122              256600           0.048
Swansea, Wales             101              239023           0.042
Newcastle, England         351              879996           0.040
Southampton, England       312              855569           0.036
Inverness, Scotland        21               57960            0.036
Leicester, England         143              509000           0.028
Leeds, England             496              1777934          0.028
Gloucester, England        29               125649           0.023
Warwick, England           29               139396           0.021
Birmingham, England        503              2440986          0.021
Newport, Wales             26               145700           0.018
Derry, Northern-Ireland    6                83652            0.007
Aylesbury, England         13               184560           0.007
Lisburn, Northern-Ireland  4                71403            0.006

 

 

 

Highest on the list is Cambridge, with over 1% of the population having a GitHub account. Lowest on the list is Lisburn with 0.006%, closely followed by Aylesbury (where I live) and Derry with 0.007%. To put this into perspective, the original Hirily analysis found that 3% of San Francisco's population had a GitHub account!
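The rate column is simply the number of accounts divided by the population, expressed as a percentage. A minimal sketch of that arithmetic, using three rows copied from the results above (the function and property names are illustrative assumptions, not taken from the original script):

```javascript
// Share of a city's population with a matching GitHub account, as a percentage.
function accountRate(accounts, population) {
  if (!(population > 0)) {
    throw new RangeError("population must be a positive number");
  }
  return (accounts / population) * 100;
}

// Three rows copied from the results table.
var cities = [
  { city: "London, England", accounts: 9291, population: 9787426 },
  { city: "Lisburn, Northern-Ireland", accounts: 4, population: 71403 },
  { city: "Cambridge, England", accounts: 1313, population: 128515 }
];

// Round to three decimal places, as in the table.
cities.forEach(function (row) {
  row.rate = Number(accountRate(row.accounts, row.population).toFixed(3));
});

// Sort from the highest rate to the lowest.
cities.sort(function (a, b) {
  return b.rate - a.rate;
});
```

After sorting, Cambridge (1.022) comes first, then London (0.095), then Lisburn (0.006).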

 

Making the Map

 

The script outputs a CSV, which I then uploaded to the ArcGIS Online content pane using a developer account. When uploading the CSV we can set the city column to be geocoded. This takes the address of the city and turns it into a latitude and longitude, which in turn allows us to map the data.
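One detail worth getting right when writing the CSV is that the city values contain commas ("Bath, England"), so they need to be quoted or the geocoder will see two columns. The sketch below shows one way to do this in JavaScript; it is not the original Python script, and the column names other than Rate (which the map code reads) are assumptions.

```javascript
// Quote a value when it contains a comma, a double quote or a line break,
// doubling any embedded quotes (the usual CSV convention).
function csvField(value) {
  var text = String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Turn an array of row objects into CSV text with a header line.
function toCsv(rows) {
  var header = ["City", "Accounts", "Population", "Rate"];
  var lines = [header.join(",")];
  rows.forEach(function (row) {
    var fields = [row.city, row.accounts, row.population, row.rate];
    lines.push(fields.map(csvField).join(","));
  });
  return lines.join("\n");
}
```

The City column produced here is the one to mark for geocoding during the upload.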

 

githubcsv.png

 

The process asks if you want to review the results (probably worthwhile, as some points can end up astray). Once this was done, I had a Feature Service of the data (a REST endpoint we can get our data from). From here I took this into an Esri Leaflet map (one of Esri's GitHub projects!). The main bulk of the mapping is outlined in the JavaScript code below:

 

    // Centre the map on the UK.
    var map = L.map('map').setView([54.514, -2.122], 6);

    // Light grey basemap with its labels layer on top.
    L.esri.basemapLayer("Gray").addTo(map);
    L.esri.basemapLayer("GrayLabels").addTo(map);

    // Feature Service created from the geocoded CSV.
    var ukGitHub =
    "http://services1.arcgis.com/Q6SkXeZHDxVxhXA4/arcgis/rest/services/GitHub_Data/FeatureServer/0";
    var gh = L.esri.featureLayer(ukGitHub, {
        pointToLayer: function (geojson, latlng) {
            var rate = geojson.properties.Rate;
            // Default to the smallest icon so that size is always defined,
            // even when the rate is missing.
            var size = [25, 23];

            // Scale the icon with the percentage of the population on GitHub.
            if (rate >= 0.361) {
                size = [65, 63];
            }
            else if (rate >= 0.181) {
                size = [55, 53];
            }
            else if (rate >= 0.095) {
                size = [45, 43];
            }
            else if (rate >= 0.046) {
                size = [35, 33];
            }

            return L.marker(latlng, {
                icon: L.icon({
                    iconUrl: 'imgs/github4.png',
                    iconSize: size,
                    // Anchor the icon at its centre.
                    iconAnchor: [size[0] / 2, size[1] / 2],
                    popupAnchor: [0, -11]
                })
            });
        }
    }).addTo(map);
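The size classes can also be kept as data rather than as a chain of conditions, which makes the breaks easier to adjust. The sketch below is an alternative way of writing the same classification, not the code used in the demo; SIZE_CLASSES and iconSizeForRate are hypothetical names, and the thresholds and sizes are copied from the snippet above.

```javascript
// Lower bound of each class paired with its icon size,
// ordered from the highest class to the lowest.
var SIZE_CLASSES = [
  { min: 0.361, size: [65, 63] },
  { min: 0.181, size: [55, 53] },
  { min: 0.095, size: [45, 43] },
  { min: 0.046, size: [35, 33] },
  { min: 0, size: [25, 23] }
];

// Return the icon size for a rate given as a percentage.
function iconSizeForRate(rate) {
  for (var i = 0; i < SIZE_CLASSES.length; i++) {
    if (rate >= SIZE_CLASSES[i].min) {
      return SIZE_CLASSES[i].size;
    }
  }
  // Missing or invalid rates (undefined, NaN, negative) get the smallest icon.
  return SIZE_CLASSES[SIZE_CLASSES.length - 1].size;
}
```

Inside pointToLayer, size would then be set with iconSizeForRate(geojson.properties.Rate). Unlike a chain with an upper bound on the top class, this always returns a size, so a rate above 1.2 simply gets the largest icon.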

 

 

Screenshot and Live Demo

 

A screenshot of the map can be seen below; a live demo can be seen here.

 

githubmap.png

 

Where's the code?

 

You can find the code on my GitHub account: JamesMilnerUK/github-mapping · GitHub 


Outcomes