As one of Europe’s fastest-growing economies, Berlin has quickly grown into a tourism and residential real estate magnet. Imagine a hospitality chain or real estate developer aiming to enter the Berlin market to capitalize on growth. How could we help them develop an effective neighborhood-level entry strategy?
A great place to turn for data is Airbnb, which is a strong proxy for residential real estate and hospitality business development in Berlin. To structure our analysis, get an overview, clarify our assumptions, and understand what’s needed, let’s break the problem down into business questions, data questions, and data needs:
| Business Question | Data Question | Data Need |
| --- | --- | --- |
| Which of Berlin’s boroughs / neighborhoods should we enter first? | Which boroughs and neighborhoods have the highest mean / median monthly occupancy and revenue? | Airbnb listing price and occupancy data by neighborhood, Berlin geocoding and geojson data (for mapping) |
| What is the optimal price point? | What is the mean / median price by borough and neighborhood? | Airbnb listing price per night by neighborhood, Berlin geocoding and geojson data (for mapping) |
| What host status should a business / chain / developer aim for? | What is the occupancy and revenue premium for an Airbnb superhost? | Airbnb listing superhost status along with above data |
| What unit capacity should we focus on? | What are the differences in mean / median monthly price, occupancy, and revenue by capacity? | Airbnb listing capacity data along with above data |
| What is the estimated revenue of a brand new listing? | What factors can be used to predict monthly revenue? | Airbnb listing variables that can be used as predictors, monthly revenue as labeled target |
| Which neighborhoods are most promising for fit and local business partnerships? | What are Berlin’s venues and venue categories by neighborhood, and how do they cluster? | Berlin neighborhoods data, venue info and geolocation data from the Foursquare API, Berlin geocoding and geojson data |
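To make the "mean / median price by neighborhood" data question concrete, here is a minimal standard-library sketch. The rows, neighborhood names, and prices are made up for illustration and are not taken from the data set; on the real listings, a pandas group-by would likely do the same job.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample rows: (neighborhood, price per night).
# Names and numbers are illustrative only, not real listing data.
listings = [
    ("Prenzlauer Berg", 60.0),
    ("Prenzlauer Berg", 80.0),
    ("Prenzlauer Berg", 130.0),
    ("Kreuzberg", 50.0),
    ("Kreuzberg", 70.0),
]


def price_summary(rows):
    """Group prices by neighborhood and return mean, median, and count."""
    prices = defaultdict(list)
    for neighborhood, price in rows:
        prices[neighborhood].append(price)
    return {
        name: {"mean": mean(vals), "median": median(vals), "count": len(vals)}
        for name, vals in prices.items()
    }


summary = price_summary(listings)

# Rank neighborhoods by median price, highest first.
ranked = sorted(summary.items(), key=lambda kv: kv[1]["median"], reverse=True)
for name, stats in ranked:
    print(f"{name}: mean={stats['mean']:.2f}, "
          f"median={stats['median']:.2f}, n={stats['count']}")
```

Reporting the median next to the mean matters here because a few very expensive listings can pull the mean up sharply, while the median stays close to what a typical listing charges.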
In this analysis, we’ll work with a data set of more than 22,500 Airbnb listings in Berlin with 96 variables, downloaded from Kaggle. You can obtain the original Airbnb source data for many cities around the world from Inside Airbnb. Berlin geojson data was downloaded from GitHub and was originally sourced from Technologiestiftung Berlin. Berlin’s neighborhoods, specifically for Pankow, were scraped from Wikipedia using BeautifulSoup. Venue data was obtained using the Foursquare API.
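Because the listing data and the geojson file come from different sources, it can help to check that their borough names line up before mapping anything. The sketch below uses a tiny inline FeatureCollection as a stand-in for the real file. The `name` property key, the coordinates, and the revenue figures are all assumptions for illustration, so inspect the actual file's properties before relying on them.

```python
import json

# Tiny stand-in for the Berlin borough geojson file.
# The "name" property key and the coordinates are assumptions.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Pankow"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[13.40, 52.53], [13.45, 52.53],
                                   [13.45, 52.57], [13.40, 52.53]]]}},
    {"type": "Feature",
     "properties": {"name": "Mitte"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[13.36, 52.50], [13.42, 52.50],
                                   [13.42, 52.54], [13.36, 52.50]]]}}
  ]
}
"""


def feature_names(collection, key="name"):
    """Return the value of `key` from each feature's properties, in file order."""
    return [feature["properties"].get(key) for feature in collection["features"]]


boroughs = json.loads(geojson_text)
names = feature_names(boroughs)

# Hypothetical per-borough figures, used only to demonstrate the name check.
revenue_by_borough = {"Pankow": 1000.0, "Mitte": 1200.0, "Spandau": 800.0}

# Boroughs present in the listing data but missing from the map file.
unmatched = sorted(set(revenue_by_borough) - set(names))
print("Map features:", names)
print("No matching map feature for:", unmatched)
```

Any name that shows up in `unmatched` would have no polygon to attach its value to, so it is worth resolving spelling or naming differences before plotting.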
Our analysis, results, discussion, and future areas of research are useful for any hospitality, real estate, or tourism-adjacent business interested in entering or expanding in Berlin, as well as for public planning / zoning.
Our code, explanatory notes, observations, visuals, results, and recommendations are below.
- Introduction
- Import Packages & Read in Data
- Pre-processing
- Exploratory Data Analysis
- Feature Engineering & Exploration: Occupancy, Revenue per Month
- Linear Regression Model
- Choropleth Map of Revenue by Borough
- Map, Segment, and Cluster Prenzlauer Berg/Pankow by Venues
- Summary Results & Recommendations
- Future Areas for Analysis

If you’d like to explore, model, or visualize data or trends in your neighborhood, give us a shout at info@crawstat.com!
We love hearing from you,
Rish