Datamining a Flat in Munich
When you’re in your mid-twenties, have just started a job and don’t have much cash, finding an affordable flat in Munich is very difficult. Munich ranks 13th among the 20 most expensive cities on the planet.
Honestly, I didn’t need that article to figure it out. Basically, there is too much demand and there are very few reasonable offers. Reasonable means that the rent doesn’t exceed 800€ for a 15m² flat.
It took me more than a year to find a flat. Here is my story about how I datamined the flat search in Munich, and how I conducted some psychological experiments at the same time.
I started my search 4 months before I had to move to Munich. There are three options to find a flat here:
a couple of websites (including Facebook groups) where you can apply for accommodation, apartment rental agencies, and some shady forums where you can occasionally find adverts.
Apartment rental agencies charge roughly 3 times the monthly rent, which comes to around 2000€. Needless to say, this was a no-go for me, as I prefer keeping that money and making something good out of it.
Anyways, I kept going with the two other options.
First impression: when applying online to any of those adverts (websites and forums), I almost never got an answer. I got like 1 negative answer for every 100 e-mails sent out.
Pretty shitty conversion rate, right?
4 months passed and there I was in Munich, couchsurfing at a friend’s place. I really needed to find my own place, where I could feel at home. I kept thinking how much easier it was to find a place in Paris…
Anyways, tired of clicking daily on the same links and the same adverts just to apply for something, I decided to automate the whole process.
The first thing was to get all the data from all the websites and analyze it as if I were browsing them myself. A Python utility called Scrapy turned out to be very handy.
I basically used this tool to crawl each and every website that had adverts for accommodation. I started by making a spider that goes through every website, gets me the data and stores it locally as a JSON file.
I then had to parse the results from the websites and get clean data. This was done easily in Python with the json library.
The data I crawled was simple: some images, the price of the flat, the size of the flat, the email and phone number of the people renting out the flat, and some additional information.
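A minimal sketch of what that cleaning step could look like, using only the standard library. The field names, the sample records and the German-style number formats ("1.200 €", "28,5 m²") are assumptions made for illustration, not the actual crawl output.

```python
import json
import re

# Hypothetical raw crawler output; field names and values are made up.
RAW_JSON = """
[
  {"title": " Zimmer in Schwabing ", "price": "650 €", "size": "15 m²",
   "email": "owner@example.com", "phone": "+49 89 000000", "images": ["a.jpg"]},
  {"title": "Studio Maxvorstadt", "price": "1.200 €", "size": "28,5 m²",
   "email": "", "phone": "", "images": []}
]
"""


def parse_number(text):
    """Pull the first number out of a string, reading '.' as a thousands
    separator and ',' as the decimal mark. Returns None if there is none."""
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(".", "").replace(",", "."))


def clean(record):
    """Turn one raw advert into a record with typed, comparable fields."""
    return {
        "title": record["title"].strip(),
        "price_eur": parse_number(record["price"]),
        "size_m2": parse_number(record["size"]),
        "email": record["email"].strip() or None,  # empty string becomes None
        "phone": record["phone"].strip() or None,
        "images": record["images"],
    }


flats = [clean(raw) for raw in json.loads(RAW_JSON)]
```

Once price and size are real numbers instead of strings, the adverts can be compared and sorted.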
I then automated the script to run every day and curate for me the coolest potential flats, the ones worth spending my time calling and sending e-mails to. I got everything daily by e-mail at 12:00.
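The selection criteria are not spelled out here, so the following is only a sketch under assumptions: it keeps flats within the "reasonable" limits mentioned earlier (at most 800€, at least 15m²), sorts them by price per square metre, and formats a plain-text digest. The record fields and sample data are hypothetical, and the actual scheduling and mail delivery are left out.

```python
# Hypothetical cleaned adverts; the values are made up for illustration.
flats = [
    {"title": "A", "price_eur": 650, "size_m2": 15, "email": "a@example.com"},
    {"title": "B", "price_eur": 1200, "size_m2": 28, "email": "b@example.com"},
    {"title": "C", "price_eur": 780, "size_m2": 22, "email": "c@example.com"},
    {"title": "D", "price_eur": 500, "size_m2": 9, "email": "d@example.com"},
]


def price_per_m2(flat):
    return flat["price_eur"] / flat["size_m2"]


def pick_candidates(flats, max_rent=800, min_size=15, limit=10):
    """Keep affordable, big-enough flats, cheapest per square metre first."""
    affordable = [
        f for f in flats
        if f["price_eur"] <= max_rent and f["size_m2"] >= min_size
    ]
    return sorted(affordable, key=price_per_m2)[:limit]


def format_digest(flats):
    """Render one line per flat, ready to be used as an e-mail body."""
    return "\n".join(
        f"{f['title']}: {f['price_eur']:.0f} EUR, {f['size_m2']:.0f} m2, {f['email']}"
        for f in flats
    )


picked = pick_candidates(flats)
digest = format_digest(picked)
```

Running something like this once a day, for example from a scheduler, would produce the daily shortlist.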
This helped me improve my conversion rate: I got almost 2 flat visits a month and I spent less time searching. But this was not enough. Not enough for me. I wanted the tool to do all the work for me while I was dancing and enjoying the music at Rote Sonne.
So I iterated again on this idea. This time I wanted my tool to send emails to people automatically.
I decided to use Mechanize to automatically reply to the adverts selected by my super crawler. This was pretty easy to set up, as none of those websites used any captcha system at the time. By the next day, it was replying to adverts automatically, and I got all the answers back, when there were answers.
Then one question started to bug me: why were people not answering me?
There may be lots of answers to this question, but I focused on the most important one: how many people reply to a single advert?
This one was easy to test. I set up a fake advert and I got 97 answers in 3 hours, and 459 answers in 3 days.
So basically, the competition is insanely tough. Each time I answered an advert that was older than 3 days, I had like a 0.2% chance of getting an answer. Haha.
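How the 0.2% figure was worked out is not stated. One simple model that lands close to it, and it is only an assumption, is that the advertiser replies to a single applicant picked at random from everyone who wrote in. A quick check with the numbers from the fake advert:

```python
answers_after_3_hours = 97
answers_after_3_days = 459


def chance_of_reply(applicants, replies_sent=1):
    """Chance of being picked if the advertiser replies to `replies_sent`
    applicants chosen uniformly at random (an assumed model)."""
    return replies_sent / applicants


early = chance_of_reply(answers_after_3_hours)
late = chance_of_reply(answers_after_3_days)

print(f"within 3 hours: {early:.2%}")  # within 3 hours: 1.03%
print(f"after 3 days:   {late:.2%}")   # after 3 days:   0.22%
```

Under that model, replying within the first hours instead of after three days makes a reply roughly five times more likely, although the odds stay low either way.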
Conclusion: I had to be the first one to answer any advert. That was easy to set up, so I did it. Still, I only got about 4 answers per week. Basically, while I was visiting a flat, my minions were working for me and answering other adverts. I then wanted to keep testing this tool and find my perfect flat.
I wanted to know why I didn’t get an 80% answer rate, so I started experimenting with the variables I had to fill out in the forms:
- A message
- A name
- A number
- An email
I A/B tested messages, numbers, names and emails. For this, I set up 80 email addresses that were all routed to one single master email. I chose this number in order to maximize the answer rate. In other words, if someone answers any of those 80 emails, the same person gets the answer in the end: me.
So, everything was set up, and I had fun with the names. I used a fake name generator to get the names and the details. The messages came from a txt file which contained predefined message templates.
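The setup described above could be sketched roughly like this. Everything concrete in it is hypothetical: the alias addresses, the master address, the two templates and the short name list stand in for the real mailboxes, the txt file and the fake name generator.

```python
import itertools
import random

MASTER = "me@example.com"  # hypothetical master inbox
ALIASES = [f"flat{i:02d}@example.com" for i in range(80)]  # hypothetical aliases
ROUTES = {alias: MASTER for alias in ALIASES}  # every alias forwards to MASTER

# Stand-ins for the txt file of templates and the fake name generator.
TEMPLATES = [
    "Hello, my name is {name} and I am very interested in your flat.",
    "Hi! {name} here. Is the flat still available? I could visit any day.",
]
NAMES = ["Name One", "Name Two", "Name Three"]


def build_applications(advert_ids, seed=0):
    """Give each advert a random name and template plus the next alias,
    recording which variant was used so answers can be compared later."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    alias_cycle = itertools.cycle(ALIASES)
    applications = []
    for advert_id in advert_ids:
        name = rng.choice(NAMES)
        template_id = rng.randrange(len(TEMPLATES))
        applications.append({
            "advert_id": advert_id,
            "name": name,
            "reply_to": next(alias_cycle),
            "template_id": template_id,
            "message": TEMPLATES[template_id].format(name=name),
        })
    return applications


applications = build_applications(range(100))
```

Because each application records its name and template, any answer that arrives on an alias can be traced back to the variant that produced it.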
The conclusion from this little study is that a girl with an Italian name gets a 90% answer rate, while a guy with an Arab name who is younger than 25 gets a 1% answer rate. The master of all is the young Munich guy, around 25, called Hanz, who gets an answer almost all the time.
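Rates like these boil down to counting, per variant, how many applications were sent and how many got a reply. A minimal sketch, where the log format is an assumption and the entries are toy data with neutral labels, not the results above:

```python
from collections import defaultdict

# Toy log of (variant, got_answer) pairs; not real results.
log = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]


def answer_rates(log):
    """Return the share of applications that got a reply, per variant."""
    sent = defaultdict(int)
    answered = defaultdict(int)
    for variant, got_answer in log:
        sent[variant] += 1
        answered[variant] += int(got_answer)
    return {variant: answered[variant] / sent[variant] for variant in sent}


rates = answer_rates(log)
```

With only a handful of applications per variant the rates are very noisy, so a comparison like this only means something once each variant has been sent many times.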
By that time, I had found my dream flat, and I am happy with it. The funny thing is that I found the flat through a friend. I then shut my programs down.
Thanks.