MAPCAT blog

POI data cleaning

Everything you need to know about POI classification

 

You may have read in our earlier blog post how we handle POIs on MAPCAT. Our goal is to make the user interface easy to use, so we created layers from the most searched place categories. This way, you can see every POI of a given type with a single click.

This week we were lucky to sit down with our Quality Manager (QM), who kindly talked us through the difficulties behind POI data cleaning.

The author

Emese Jutasi-Diamant

Quality manager at MAPCAT

What kind of difficulties do we experience in POI management?

The first and biggest problem is that POI names in the OpenStreetMap database are free text. This means the same POI can appear under several name variants. For example, users can enter a bank’s name in various ways, such as OTP, OTP Bank, OTP PLC, etc.

The other problem arises when only some of a POI’s data fields are filled in and the rest are left blank. We take these issues seriously, as our aim is to help users find what they are looking for as quickly as possible.

The issue currently giving us a headache is the cleaning of bicycle POIs. We have 3 bicycle-related POI layers (bicycle, bicycle-parking, bicycle-rental). If users don’t use exact POI tags such as electric bicycle, bicycle rental or bicycle parking, we can’t put the information into the appropriate layer, so users don’t get all the information on the subject 🙁

This much variety in the data is unmanageable, so we had to come up with a solution to make the data pool more compact.

 

How do we deal with this data complexity?

We apply a POI classification process: we define filters that select POIs, and then load the data from OSM through these classification rules.
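To make the idea of classification rules concrete, here is a minimal Python sketch built around the three bicycle layers mentioned earlier. The key/value pairs are standard OpenStreetMap tags, but the rule table, the function name `classify` and the choice to map bike shops to the "bicycle" layer are illustrative assumptions, not MAPCAT's production configuration.

```python
# Illustrative rule table (an assumption, not MAPCAT's real configuration):
# each layer is selected by one exact OSM key/value pair.
LAYER_RULES = [
    ("bicycle-parking", "amenity", "bicycle_parking"),
    ("bicycle-rental", "amenity", "bicycle_rental"),
    ("bicycle", "shop", "bicycle"),
]


def classify(tags):
    """Return the layer name for a POI's tag dict, or None if no rule matches."""
    for layer, key, value in LAYER_RULES:
        if tags.get(key) == value:
            return layer
    return None


# A correctly tagged rental station lands in its layer...
print(classify({"amenity": "bicycle_rental", "name": "City Bikes"}))
# ...while a POI that only mentions bikes in its free-text name matches nothing.
print(classify({"name": "Bike rental at the station"}))
```

The second call shows why exact tags matter: a rule can only match what is in the tags, so a POI described only in its name never reaches a layer.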

First of all, we selected the most searched POI categories based on taginfo, and we visualize all POIs of each category on a separate layer. Of course, not all POI types are covered, but only the rarely used ones are missing.

Then, again based on taginfo, we created sub-categories inside the categories. Take restaurants, for example: we classified them into different layers according to their cuisine, so there are Italian restaurants, fast-food restaurants, vegan restaurants, etc.
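The sub-category step could look roughly like the sketch below. It relies on the usual OSM tags `amenity`, `cuisine` (which may hold several values separated by semicolons) and `diet:vegan`; the layer names and the function `restaurant_layers` are made up for illustration and say nothing about how MAPCAT actually implements this.

```python
def restaurant_layers(tags):
    """Return the set of (hypothetical) restaurant sub-layers a POI belongs to."""
    layers = set()
    if tags.get("amenity") not in ("restaurant", "fast_food"):
        return layers  # not a place to eat, so no sub-layer applies

    if tags["amenity"] == "fast_food":
        layers.add("fast-food")

    # OSM allows several cuisines in one value, e.g. "italian;pizza".
    cuisines = {c.strip().lower() for c in tags.get("cuisine", "").split(";")}
    if "italian" in cuisines:
        layers.add("italian")

    # "only" marks a fully vegan place, "yes" one with vegan options.
    if tags.get("diet:vegan") in ("yes", "only"):
        layers.add("vegan")
    return layers
```

Returning a set rather than a single layer lets one restaurant appear in several sub-layers, for example an Italian place with vegan options.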

The situation is the same with car showrooms and services, which we categorize by brand. If there is a brand name in the label, we visualize only those POIs that carry the brand’s official name.

We also have to filter by POI name. This way we are sure to lose some items, but we are not in a position to review all POI names worldwide. For example, we only use the long official name or the legal short version of a bank’s name.
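A name filter of this kind can be thought of as an allow-list. The sketch below is only an illustration: the two accepted spellings are placeholder entries chosen for the example, and `keep_bank` is a hypothetical helper, not part of MAPCAT.

```python
# Hypothetical allow-list: a short name and a long official name.
# The entries are placeholders for illustration only.
ACCEPTED_BANK_NAMES = {
    "OTP Bank",
    "OTP Bank Nyrt.",
}


def keep_bank(tags):
    """Keep a POI only if it is a bank whose name is on the allow-list."""
    name = tags.get("name", "").strip()
    return tags.get("amenity") == "bank" and name in ACCEPTED_BANK_NAMES
```

With this approach a branch entered simply as "OTP" is dropped, which is exactly the loss accepted in exchange for not reviewing every name by hand.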

These are our most important POI layers:

[Image: POI layers]

 

Which data types do we keep?

We analysed the POI categories and the most used data types per category. We keep only the most in-demand and most used ones, and only the top 10. These can differ from category to category: for gas stations, in addition to the basic data, it matters what kind of fuel they sell, whether there is a car wash, etc., whereas for a restaurant the cuisine, or whether it is non-smoking, is more relevant.
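One simple way to find the most used data types for a category is to count how often each tag key is filled in. The sketch below does this on a list of tag dictionaries; the function name `top_keys`, its parameters and the idea of counting from a local sample (rather than from taginfo statistics) are assumptions made for the example.

```python
from collections import Counter


def top_keys(pois, limit=10, ignore=("amenity",)):
    """Return the `limit` most frequently filled-in tag keys across `pois`."""
    counts = Counter()
    for tags in pois:
        # Count each key once per POI, skipping empty values and the
        # key that defines the category itself.
        counts.update(k for k, v in tags.items() if v and k not in ignore)
    return [key for key, _ in counts.most_common(limit)]


gas_stations = [
    {"amenity": "fuel", "fuel:diesel": "yes", "car_wash": "yes"},
    {"amenity": "fuel", "fuel:diesel": "yes"},
    {"amenity": "fuel", "fuel:diesel": "yes", "car_wash": "no", "opening_hours": ""},
]
print(top_keys(gas_stations, limit=2))
```

Run per category, the same function gives a different top list for gas stations than for restaurants, which matches the point that the kept data types vary by category.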

Right now we are working on classification 2.0. We are using a huge database to define which attributes of an amenity we will let through during classification. Naturally, we will take growing trends into account and visualize electric car charging stations, etc. We also take into account the user feedback we receive on MAPCAT.

 

What can a user see about a POI on MAPCAT?

You will find 5 basic data fields for every POI:

  • name
  • phone number (we create a unified format, so in some browsers you can call with a click)
  • e-mail
  • website
  • wheelchair access
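As an illustration of what a unified phone format might involve, the sketch below turns an internationally written number into a compact `tel:` link that a browser can dial. It is a simplified assumption of how such a step could work: `to_tel_link` is a hypothetical helper, it treats a leading `00` as the international prefix (common in Europe, not universal), and it deliberately rejects national-format numbers and does not handle trunk prefixes written in brackets.

```python
import re


def to_tel_link(raw):
    """Turn an international phone number into a compact tel: link, or None."""
    cleaned = re.sub(r"[^\d+]", "", raw)  # drop spaces, dashes, brackets
    if cleaned.startswith("+"):
        digits = cleaned[1:]
    elif cleaned.startswith("00"):
        digits = cleaned[2:]
    else:
        return None  # national format: the country is unknown

    # E.164 allows at most 15 digits; the lower bound of 8 is an
    # arbitrary plausibility check chosen for this sketch.
    if not digits.isdigit() or not 8 <= len(digits) <= 15:
        return None
    return "tel:+" + digits
```

For example, both `+36 1 234 5678` and `0036 (1) 234-5678` come out as the same link, so differently typed versions of one number end up in one format.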

When available, you will also see the additional data types according to the category. Furthermore, if you enter editing mode, you can see a much more detailed version, where we add Wikidata info, and you can even contribute to the info pool with a few clicks. (read more about editing)

 

How can you help us and yourself?

Use correct and relevant tags when editing in OSM! For example, you could add restaurants that cater to diabetics, or tag whether a gas station has only diesel, gasoline or an electric charging station. Always take care to write official names and brand names correctly. Unfortunately, we don’t have the capacity to look through every POI on the world map and correct every mistake.

In addition, your suggestions for the new classification are very welcome 🙂

 

 
