Practical Open Data: Commuting to Christmas Markets in Berlin with F#

(1/4) Inspirational Datasets

Schedule:

I: Inspirational datasets (28.11.2021)

II: City as a function (6.12.2021)

III: Visualizations (15.12.2021)

IV: Practical Open Data portal (24.12.2021)

Finding a way to facilitate application development with open city data has been my interest for a long time.

For this year's #FsAdvent I want to show how to build a municipal analytical workflow in F#: a universal (yet very simple) city lab where you can explore and improve your area by … writing single F# functions.

A single function sounds like a trivial solution to quite a complicated endeavor, so before we start, let me cite the first of a dozen suggestions from Joe Duffy (CEO of Pulumi) on building developer tools/APIs:

Open data are often called the new oil or even the currency of our times.

Applications and analytics based on open city data (in theory) should provide much higher social and business value than those using regular datasets.

The unfair advantage is the law — hence the monopoly — to gather and process the most exact, important, and fragile data (demography, full business permits/registry, health services, to name a few …)

Despite all those promises, there are not many applications created with open city data in mind. Most of them are just the result of municipal hackathons and never see any further development.

That is a pity, because there is huge room for imagination when targeting smart-city solutions. We simply don't know yet how many problems, and which ones, we are going to address and solve with all those next-gen measuring devices producing exceptional datasets.

Working with data is based on the so-called “analytical workflow”.

A typical analytical workflow is a process with several steps to find insights (business or social value) via data exploration. A generalized analytical workflow looks more or less like the diagram below.

Typical Analytical Workflow

It is definitely neither simple nor direct. Notice there is no exact beginning of the process (although many people say the beginning always happens before data capture). The visualization part is most likely black-boxed. We could spend hours talking about the cons of the chart above, but the biggest downside is that you will be tired out by all the necessary transformations before you come up with any insights.

Let me rephrase our unusual situation:

with open city data we don’t only want to address existing problems, we want to find them out!

When we want to find problems to address with data, we don't have well-formed questions. After a few explorations of the selected area we may find something interesting, but the prerequisite is a dataset that can inspire proper questioning.

Analytical Workflow designed to find out the problem/question (with drafted solution)

The inspirational dataset is the first part of our very simple city lab but definitely not the most fun. We will focus on the second part (City as a function) responsible for explorations, visualizations, and comparisons in the next blog entry.

However, I am going to give you a brief example of such a function now, because the inspirational dataset is just an input type to that generic function, and it is better to explain it with a use case:

the rationale behind the vote name will soon be clear
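The embedded snippet did not survive extraction; below is a minimal sketch of what such a function might look like. The record shape and field names are assumptions, standing in for one concrete instantiation of the generic 'dataset parameter described next:

```fsharp
// Hypothetical domain record: one (home, market) pair from the dataset.
// Field names are illustrative, not the real schema.
type CommutePair =
    { Market: string
      DistanceKm: float
      CommonRoutes: string list }

// The exploration function: score one pair so the engine can rank
// every possible pair exhaustively.
let vote (pair: CommutePair) : float =
    -pair.DistanceKm // closer is better, hence a higher "vote"
```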

'dataset is just a single type that holds all available data that may help you find insights for a particular city dimension. To be more specific: it is a domain-friendly type, not the one used for storage. It is an intrinsic part of the F# function definition, so this dataset is always strongly typed. You have the full power of F# to add DSL extensions and use all available F# features to rank/query the underlying domain.

An inspirational dataset is an exhaustive set of data given to you as a function parameter, so you never have to take care of data capture, preprocessing, or matching. There is no setup and no data load; you just start writing the explorative function definition immediately (I will describe how in the last post of this series).

Although you write a function that compares/ranks two places, you are guaranteed it will be run for every possible pair of places in the city (or its subareas). That changes the way you think about the input data, because you may search for one particular best commute or analyze all possible occurrences for marketing or urban-planning activities. It encourages you to think about the data's potential from different angles.

Let's give an example.

Imagine we want to describe (evaluate) any two points in a city with regard to their commute value. In the Christmas spirit, I will evaluate all available Christmas markets and their connections to any other place. Berlin has a dataset of 45 such markets:

45 Christmas Markets in Berlin

The marked home can be any one of the 300k addresses. That is a lot of data to compare, but don't worry about it for now; just take for granted that the dataset gives us info about each and every combination of two locations we want to compare (distance between them, the n nearest stations, available transportation modes, route details) and the result of their matching (common transit routes).

Depending on the scenario we can use that data to implement almost any ranking.

The function below will rank all target places (Christmas Markets) based on their distance to your home (or any other important place):
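The original snippet was an embedded image; a minimal sketch under an assumed record shape could look like this:

```fsharp
// Hypothetical shape of one entry in the inspirational dataset.
type CommutePair = { Market: string; DistanceKm: float }

// Rank all target places (markets) by distance to home, closest first.
let rankByDistance (dataset: CommutePair list) : CommutePair list =
    dataset |> List.sortBy (fun p -> p.DistanceKm)
```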

If we live near a specific route, we can filter only the markets with that connection:
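Again, the embedded code is not reproduced here; a sketch of such a filter, with the record shape and the "U8" route name purely illustrative:

```fsharp
// Hypothetical shape: CommonRoutes lists the transit routes shared
// between home and the market.
type CommutePair = { Market: string; CommonRoutes: string list }

// Keep only the markets reachable via a given transit route.
let marketsOnRoute (routeName: string) (dataset: CommutePair list) =
    dataset
    |> List.filter (fun p -> p.CommonRoutes |> List.contains routeName)
```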

or rank markets based on the count of all nearby routes, without specifying their names:
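A sketch of that ranking, with the same assumed record shape as above:

```fsharp
// Hypothetical shape: CommonRoutes lists the shared transit routes.
type CommutePair = { Market: string; CommonRoutes: string list }

// Rank markets by how many transit routes connect them to home,
// without caring which routes they are.
let rankByRouteCount (dataset: CommutePair list) =
    dataset |> List.sortByDescending (fun p -> p.CommonRoutes.Length)
```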

We can go pretty far with the calculations and even use real-time data like weather conditions:
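One possible sketch: penalize routes with a long walking segment when it rains. The record fields and the rain flag are assumptions; in a real version the flag would come from a live weather API (not shown):

```fsharp
// Hypothetical shape: WalkMinutes is the walking part of the commute.
type CommutePair = { Market: string; DistanceKm: float; WalkMinutes: float }

// When it rains, add the walking time as a penalty to the score;
// otherwise rank by distance alone.
let rankWithWeather (isRaining: bool) (dataset: CommutePair list) =
    let score p =
        if isRaining then p.DistanceKm + p.WalkMinutes else p.DistanceKm
    dataset |> List.sortBy score
```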

As you can see, there is nothing specific to Christmas markets (yet). The engine can feed 'dataset with schools, companies, or playgrounds, and the code will remain the same.

Naturally, finding the best commute options is nothing special; the tricky part is that we run the function exhaustively over every pair. That lets us use it for marketing purposes or even urban planning.

However, first, we need to know how to make those exhaustive comparisons with no effort. I will show it in the next post.

Paweł Stadnicki
