Haripriya Sridharan

Visualising Wikipedia: Pet Project

I love maps, data and charts. Whenever I read any text, my natural instinct is to use maps to visualise it in a demographic context, which, in turn, deepens my understanding of the topic.

I have been eager to create visualisations of internet data with a demographic focus, and eventually to get to a point where I don't have to interact with a bot to retrieve information (influenced by all the sci-fi capers I grew up with!).

As a starting point, I aimed to present population data across the globe.

Open Data Sources

I started with the Wikipedia dataset from Hugging Face, exploring their latest collection (2022):

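from datasets import load_dataset

# Preprocessed English Wikipedia dump dated 1 March 2022
dataset = load_dataset("wikipedia", "20220301.en")

# Number of articles in the train split
num_entries = len(dataset['train'])
print(num_entries)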

They have quite a few collections, but each had about 64 lakh (6.4 million) entries, so I parked it for later use.

I stumbled upon a website and liked its animation of the live population count.

I also tested it by entering the current date as my date of birth, and below is what I got:

We estimate you will live until age 82.1 years (13 Jun 2105)

Obviously inaccurate, but it looks pretty cool! It inspired me to look for open APIs.
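Out of curiosity, the date in that estimate is just the life expectancy added to the date of birth. Below is a minimal sketch, assuming the conversion uses the average Gregorian year of 365.2425 days. The site's actual method isn't stated, and `projected_end_date` is a hypothetical helper.

```python
from datetime import date, timedelta

# Average length of a Gregorian calendar year, in days
DAYS_PER_YEAR = 365.2425


def projected_end_date(date_of_birth: date, life_expectancy_years: float) -> date:
    """Add a (fractional) life expectancy in years to a date of birth.

    Hypothetical helper: assumes years are converted to whole days using the
    average Gregorian year length, which may not match the site's method.
    """
    days = round(life_expectancy_years * DAYS_PER_YEAR)
    return date_of_birth + timedelta(days=days)


if __name__ == "__main__":
    # Someone born on 15 May 2023 with an estimate of 82.1 years
    print(projected_end_date(date(2023, 5, 15), 82.1))  # 2105-06-20
```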

I eventually came across the open-source Wikipedia API (the MediaWiki Action API), which returns data in JSON format. It is pretty fast and requires no API key. The important parameter to note here is "action": "parse". You can also use "action": "query", but you wouldn't be able to scrape the tabulated data as easily.

api_url = ""  # MediaWiki Action API endpoint

# "parse" returns the rendered page HTML, including its tables
params = {
    "action": "parse",
    "page": "List_of_countries_and_dependencies_by_population",
    "format": "json",
}

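To actually request the page, those parameters go into a GET query string. The following is a minimal sketch using only Python's standard library. It assumes the default JSON shape of an `action=parse` response, where the rendered HTML sits under `payload["parse"]["text"]["*"]`. The endpoint is passed in rather than hard-coded. The helper names and the `USER_AGENT` string are hypothetical; Wikimedia asks API clients to send a descriptive User-Agent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical identifier; replace with your own project name and contact
USER_AGENT = "population-map-sketch/0.1 (contact: you@example.com)"


def build_parse_url(api_url: str, page: str) -> str:
    """Build a GET URL for the MediaWiki Action API's "parse" action."""
    params = {"action": "parse", "page": page, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


def html_from_parse_response(payload: dict) -> str:
    """Return the rendered page HTML from a "parse" response.

    With the default response format, the HTML is nested under
    payload["parse"]["text"]["*"].
    """
    return payload["parse"]["text"]["*"]


def fetch_page_html(api_url: str, page: str) -> str:
    """Download a page's rendered HTML (needs network access)."""
    request = Request(build_parse_url(api_url, page), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return html_from_parse_response(payload)
```

With `api_url` set to the endpoint, `fetch_page_html(api_url, "List_of_countries_and_dependencies_by_population")` would return the page HTML, ready for table scraping.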
Developing Server and Client

For the backend, I wrote a simple server-side script to scrape the population counts from the Wikipedia page List_of_countries_and_dependencies_by_population. Code source: here
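The scraping step boils down to reading the population table out of the page HTML. Below is a minimal sketch with Python's built-in `html.parser`. It assumes the figures live in the first table with the `wikitable` class, that the table has no nested tables, and that the country name is in the first column and the population in the second. The live page's column layout may differ, so both indices are parameters. `WikitableParser` and `population_by_country` are hypothetical names, not the project's code.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

# First run of digits (with thousands separators) in a cell, e.g. "1,234,567"
NUMBER_RE = re.compile(r"\d[\d,]*")


class WikitableParser(HTMLParser):
    """Collect cell text, row by row, from the first <table class="wikitable">."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_table = False
        self._finished = False
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None
        self._sup_depth = 0  # >0 inside <sup>, i.e. footnote markers like "[a]"

    def handle_starttag(self, tag, attrs):
        if self._finished:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "wikitable" in classes:
            self._in_table = True
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "sup":
            self._sup_depth += 1

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag == "table":
            self._in_table, self._finished = False, True
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "sup" and self._sup_depth:
            self._sup_depth -= 1

    def handle_data(self, data):
        if self._cell is not None and not self._sup_depth:
            self._cell.append(data)


def population_by_country(html: str, name_col: int = 0, pop_col: int = 1) -> Dict[str, int]:
    """Map each row's name cell to the integer found in its population cell."""
    parser = WikitableParser()
    parser.feed(html)
    populations: Dict[str, int] = {}
    for row in parser.rows:
        if len(row) <= max(name_col, pop_col):
            continue
        match = NUMBER_RE.search(row[pop_col])
        if match:  # header rows contain no digits and are skipped
            populations[row[name_col]] = int(match.group().replace(",", ""))
    return populations
```

Footnote markers inside `<sup>` tags are skipped, and commas are stripped before converting to `int`. Any aggregate rows (such as a world total, if the table has one) end up in the dictionary too. `pandas.read_html` could do much of this in one call, at the cost of extra dependencies.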

For the frontend, I started with the popular 3D geospatial service Cesium Ion. However, it has its limitations, as it only caters to latitude and longitude values.

I needed a way to map latitude and longitude coordinates to specific countries, so I turned to OpenCage. They offer a generous allowance of 2,500 free calls per day (not bad!).
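Reverse geocoding is what turns a coordinate into a country. Below is a minimal sketch with the standard library, assuming OpenCage's JSON endpoint `https://api.opencagedata.com/geocode/v1/json`. The sketch assumes that this endpoint takes the query as `q` set to "latitude,longitude" plus your `key`, and reports the country name under `results[0]["components"]["country"]`. Check OpenCage's documentation for the current parameters. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# OpenCage's JSON endpoint; forward and reverse geocoding share it
OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_reverse_geocode_url(lat: float, lng: float, api_key: str) -> str:
    """For reverse geocoding, the query string is "latitude,longitude"."""
    params = {"q": f"{lat},{lng}", "key": api_key}
    return f"{OPENCAGE_URL}?{urlencode(params)}"


def country_from_response(payload: dict) -> Optional[str]:
    """Return the best match's country name, or None if there isn't one."""
    results = payload.get("results") or []
    if not results:
        return None
    return results[0].get("components", {}).get("country")


def country_at(lat: float, lng: float, api_key: str) -> Optional[str]:
    """Look up the country for a coordinate (needs network access and a key)."""
    with urlopen(build_reverse_geocode_url(lat, lng, api_key), timeout=30) as response:
        return country_from_response(json.load(response))
```

Each call counts against the free daily allowance, so caching results per coordinate is worth considering.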

Here is how I connected them.
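One way to connect the two pieces is to normalise the country name returned by the geocoder and look it up in the scraped population table. This is an illustration only, not necessarily how this project does it. Names don't always match between sources, so the sketch includes a small alias map. `ALIASES` and its entry are hypothetical placeholders.

```python
from typing import Dict, Optional


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so minor formatting differences match."""
    return " ".join(name.casefold().split())


# Hypothetical aliases: geocoder name (normalised) -> Wikipedia name (normalised)
ALIASES: Dict[str, str] = {
    "example republic": "republic of example",
}


def build_lookup(populations: Dict[str, int]) -> Dict[str, int]:
    """Index the scraped {country: population} table by normalised name."""
    return {normalise(name): count for name, count in populations.items()}


def population_for(country: Optional[str], lookup: Dict[str, int]) -> Optional[int]:
    """Return the population for a geocoded country name, or None if unknown."""
    if not country:
        return None
    key = normalise(country)
    return lookup.get(ALIASES.get(key, key))
```

Building the lookup once after scraping keeps each click on the map down to a dictionary lookup.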

This is the final result.

You can find the ongoing project here. I'm keen to keep this journey going and see where it takes me, and I'll be sharing more as I explore new platforms and data sources.


