Document

Week One

Our first task was to embed charts

The first chart illustrates meat consumption in OECD countries.

My second chart looks at ethereum trading statistics from 2016-2021

Week Two

Our second task was to embed the original and a corrected version of total IMF loans outstanding

First the original, in nominal terms:

This version shows the total loans outstanding as a fraction of global GDP:

In Conclusion, we see that 1984 is actually the riskiest year, with the highest outstanding loans as % of GDP.

Week Three

Our third task was to create two charts with embedded data

First, a scatter plot showing the correlation between students enrolled and amount of teachers by district.

By applying a transform in vega, I made a bar chart which shows that the amount of students per teacher is centered at ~ 20.

Week Four

Our fourth task was to scrape, clean and normalise data from a website

I scraped World Cup winner odds from justbookies and calculated the winning probabilities. By taking aggregated data from several bookmakers, I could get a more robust result.

Link to Colab

Week Five

Our fifth task was to loop on the ONS API to quickly produce 9 time series from a dataset

Link to Colab

Week Seven

Our sixth task was to produce a base map and a choropleth map

First, my base map of South America:

Now the Gini Index of each country is embedded in the map, showing Uruguay to be the most equal country in terms of income.

Week Eight

Our seventh task was to produce two charts to support or refute an argument made at the Festival of Economics.

‘Lockdown increased the appeal of crypto - there was a lack of sports betting, people were looking for a rush - this led to the gamification of crypto.’

While we cannot establish causality, limited sports betting led to people seeking similar thrills elsewhere. Hence, it is plausible that more people looking for a rush entered the crypto scene.

Week Nine

Our eighth task was to produce two advanced visualisations

In this regression, there is a positive correlation between the % of English Learners and the % of students qualifying for reduced lunches.

Secondly, I made a heatmap showing the frequency of scorelines in the Premier League (2016-17)

Weeks Ten and Eleven

Finally, we were introduced with machine learning and tasked with performing both a supervised and unsupervised learning technique.

My first hyopthesis: Using K-means clustering I can classify wines into distinguished groups with similar properties in each cluster.

To the naked eye, there seem to be loosely defined boundaries but fairly distinct clusters.

Link to colab

My supervised learning hypothesis: Does logistic regression predict whether or not a customer purchases better than a decision tree?

We can see from the confusion matrices that the logistic regression does predict more accurately (74-6) than the decision tree (67-13). The decision tree largely overfits on the training data.