I love maps, data and charts. Whenever I read any text, my natural instinct is to use maps to visualise it in a demographic context, which, in turn, deepens my understanding of the topic.
I have been eager to create visualisations of internet data with a demographic focus, and eventually to get to a point where I don't have to interact with a bot to retrieve information. (Influenced by all the sci-fi capers I grew up with!)
As a starting point, I aimed at presenting population data across the globe.
Open Data Sources
I started with the Wikipedia dataset from Hugging Face. I was exploring their latest collection (2022).
from datasets import load_dataset
# Load the English Wikipedia snapshot dated 2022-03-01
dataset = load_dataset("wikipedia", "20220301.en")
# Count the articles in the "train" split
num_entries = len(dataset['train'])
print(num_entries)
They have quite a few collections, but each had about 6.4 million (64 lakh) entries. So I parked it for later use.
Then I stumbled upon population.io. I liked their animation of the live population count.
I also tested it by entering the current date as my date of birth, and this is what I got:
We estimate you will live until age 82.1 years 13 Jun 2105
Obviously inaccurate, but it looks pretty cool! It inspired me to look for open APIs.
I eventually came across the open Wikipedia API (the MediaWiki Action API), which returns data in JSON format. It is pretty fast and requires no API key. The important parameter here is "action": "parse". You can also use "action": "query", but then you wouldn't be able to scrape tabulated data.
api_url = "https://en.wikipedia.org/w/api.php"
params = {
"action": "parse",
"page": "List_of_countries_and_dependencies_by_population",
"format": "json"
}
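To make the request concrete, here is a minimal sketch of how these parameters could be turned into a call, using only the Python standard library. The function names and the User-Agent string are my own placeholders, not part of the project's code. It assumes the default JSON layout of a parse response, where the rendered HTML sits under parse, text, "*".

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page: str) -> str:
    """Build an 'action=parse' URL for the given page title."""
    params = {"action": "parse", "page": page, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def extract_html(response: dict) -> str:
    """Return the rendered page HTML from a decoded parse response."""
    # With format=json and the default format version,
    # the HTML is stored under parse -> text -> "*".
    return response["parse"]["text"]["*"]


def fetch_page_html(page: str, timeout: float = 10.0) -> str:
    """Fetch a page over the network and return its rendered HTML."""
    # Placeholder User-Agent (an assumption); replace it with your own contact details.
    request = Request(
        build_parse_url(page),
        headers={"User-Agent": "population-demo/0.1 (example)"},
    )
    with urlopen(request, timeout=timeout) as resp:
        return extract_html(json.load(resp))
```

Calling fetch_page_html("List_of_countries_and_dependencies_by_population") would return the page's HTML as one string, which still has to be parsed to get at the table.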
Developing Server and Client
For the backend: I wrote some simple server-side code to scrape the population counts from the Wikipedia page List_of_countries_and_dependencies_by_population. Code source: here
For the frontend: I started with the popular 3D geospatial service Cesium ion. However, it has its limitations: it works only with latitude and longitude values.
For my needs, I required a solution that connects latitude and longitude coordinates to specific countries. So I turned to OpenCage. They offer a generous allowance of 2,500 free calls per day. (Not bad.)
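As a rough illustration of the scraping step (not the linked server code), the sketch below pulls rows out of an HTML table with the standard-library html.parser and keeps those whose population cell is a plain number. The column positions, the function and class names, and the sample rows ("Exampleland", "Samplestan") are all made-up placeholders. The real page's column layout may differ and should be checked first.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def population_by_country(html, name_col=0, pop_col=1):
    """Map country name to population for rows with a numeric population cell."""
    parser = TableRowParser()
    parser.feed(html)
    result = {}
    for row in parser.rows:
        if len(row) <= max(name_col, pop_col):
            continue
        digits = row[pop_col].replace(",", "")
        if digits.isdecimal():  # skips header rows and non-numeric cells
            result[row[name_col]] = int(digits)
    return result


# Placeholder table with invented names and numbers, for illustration only.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Location</th><th>Population</th></tr>"
    "<tr><td><a href='#'>Exampleland</a></td><td>1,234,567</td></tr>"
    "<tr><td>Samplestan</td><td>89,000</td></tr>"
    "</table>"
)
print(population_by_country(SAMPLE_HTML))
```

A dedicated HTML library would handle nested tables and footnote markers more gracefully. The standard-library parser is used here only to keep the sketch dependency-free.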
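The coordinate-to-country lookup can be sketched roughly as below. It builds a reverse-geocoding URL for the OpenCage Geocoding API and reads the country name from the first result's components. The function names and the "YOUR-KEY" value are placeholders of mine, and the actual HTTP call is left out so the snippet does not spend any of the daily quota.

```python
from urllib.parse import urlencode

OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_reverse_geocode_url(lat: float, lng: float, api_key: str) -> str:
    """Build a reverse-geocoding request URL for a latitude/longitude pair."""
    # "q" carries "lat,lng"; no_annotations=1 asks for a smaller response.
    params = {"q": f"{lat},{lng}", "key": api_key, "no_annotations": 1}
    return f"{OPENCAGE_URL}?{urlencode(params)}"


def country_from_response(response: dict):
    """Return the country name from a decoded response, or None if absent."""
    results = response.get("results", [])
    if not results:
        return None
    return results[0].get("components", {}).get("country")


# "YOUR-KEY" is a placeholder; a real key comes from your OpenCage account.
print(build_reverse_geocode_url(48.8566, 2.3522, "YOUR-KEY"))
```

Since the free tier is limited per day, caching the country for coordinates that have already been looked up is one way to stay within the allowance.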
Here is how I connected them.
This is the final result.
You can find the ongoing project here. I'm curious to keep this journey going and see where it takes me. I'll share more as I explore new platforms and data sources.