3 Introduction to Your Project
3.1 Purpose of the Project Guide
Welcome to the project guide for your TechAcademy Data Science project! This document will guide you through the different steps of your project and provide you with valuable hints along the way. However, the guide is not a detailed step-by-step manual. We think you should develop the skill of coming up with your own way of solving different tasks. Such an approach is an excellent opportunity to apply the knowledge and tools you acquired through DataCamp.
Questions might come up, or you might not know how to solve a task right away - but do not worry - that is just a part of coding. In those cases, you can find helpful links in the introductory chapters, where someone might have answered your question(s). If not – and in the unlikely case that even Google cannot provide the answer – our TechAcademy mentors will help you via Slack or directly during the coding meetups. You are strongly encouraged to chat with your group’s mentors!
At the end of the project guide, you will find an overview of the tasks you must complete, depending on your track (beginner vs. advanced). You can use that list to check which tasks you still need to complete or which assignments are relevant for your track.
3.2 What is this Project About?
Having worked with Spotify data last semester, we will focus on a different topic this semester: Airbnb. More precisely, you will analyze data about Airbnb offers in a city of your choice! You can choose from the following list of cities according to your preferences, but each group member should work on a different city:
Amsterdam, Boston, Edinburgh, Madrid, Munich, and Rome.
Your data was scraped and is hosted by the Inside Airbnb project. You will find all kinds of information in that data set – some of it valuable for your analysis, some of it not. Following the typical data science workflow, we have split this project into two parts. First, you are going to learn how to perform an Exploratory Data Analysis (EDA). You will have a closer look at the data, transform it, and then get to know the different variables and how they look in various visualizations.
Beginners will complete the project after this first part. Still, it would be beneficial for beginners to try and work on the second part too. In the second part of the project, advanced programmers will develop a model that predicts Airbnb prices in their city as accurately as possible. They will start with a linear regression algorithm, which they can modify as they please. Additionally, they will explore other possibilities of modeling and data prediction.
But first things first: What exactly is EDA, and what can you achieve with it?
3.3 Exploratory Data Analysis – Getting to Know the Data Set
As a first step, you will get to know the data set. This step means you will describe the data and answer questions like: What variables does the data set contain? And how are they related? For this purpose, you often use graphical tools like box plots or histograms.
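To make this more concrete, here is a minimal sketch of the numbers behind a box plot and a histogram. It uses only the Python standard library and a handful of made-up prices, not real Inside Airbnb data; the helper names `five_number_summary` and `text_histogram` are hypothetical and chosen only for this illustration. In the project itself you would more likely rely on a data analysis library and real plots.

```python
import statistics

# Made-up nightly prices in EUR, purely for illustration (not real Airbnb data).
prices = [45, 60, 52, 80, 75, 120, 95, 60, 300, 55]


def five_number_summary(values):
    """Return min, lower quartile, median, upper quartile, and max.

    These are the five values a box plot is built from.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def text_histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Keys of the returned dict are the lower edge of each non-empty bin.
    """
    counts = {}
    for value in values:
        lower_edge = (value // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


summary = five_number_summary(prices)
histogram = text_histogram(prices, bin_width=50)

print("min, Q1, median, Q3, max:", summary)
for lower_edge, count in histogram.items():
    # One '#' per listing in the bin gives a rough text histogram.
    print(f"{lower_edge:>4} to {lower_edge + 49:<4} {'#' * count}")
```

Even on this toy sample, the summary and the bin counts show the kind of thing EDA is meant to reveal: most prices cluster in one range, while a single expensive listing sits far away from the rest.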
We structured the first part of the project so that you get to know the data thoroughly by completing the given tasks one after the other. As a beginner, you can stop after this part because you will have fulfilled the necessary coding requirements for your TechAcademy certificate. However, if this first part inspires you to step up your “apartment-search game,” we encourage you to work on the second part!
Since data science concepts are independent of specific programming languages, we will describe the general approach in this part of the text. Once you have understood the bigger picture and started with the tasks, you will find language-specific tips and tricks in visually separated boxes.
If you participate in our R track, you will need to look at the boxes with the blue border.
Likewise, look at the yellow-bordered boxes if you code in Python.
From time to time, it might be interesting to check out the other language – though you can code the same solutions in both, they sometimes take a different approach to an identical problem.
Before you start, it makes sense to complete the first few beginner chapters mentioned in Section 2.
We recommend that you finish the courses at least until – and including – Exploratory Data Analysis in [your programming language of choice].
3.4 Prediction – Apply Statistical Methods
This part of the project is mainly for the advanced TechAcademy participants. If you are a beginner and have completed the first part and have some leftover motivation, we would love to see you complete the second part too! Statistical models are a significant part of data science, and this gives you a chance to get acquainted with such frameworks.
As you studied the data in part one, you should be familiar with its features, and you can start creating predictions about Airbnb prices – based on the information you learned about the apartments. After completing the second part, you will send us your predictions, and we will check how accurate your model was. The most precise model wins the race!
For this part of the project, we recommend the advanced courses mentioned in Section 2.
Please note that even more DataCamp classes are available, so if you want to extend your skills further, feel free to complete more courses on the topics that interest you.
We recommend that you finish the lessons at least until – and including – Unsupervised Learning in Python for the Python track and Machine Learning Toolbox for the R track.
Ready to get your hands dirty? After getting a first impression of what this project is about, let’s get you started!