
Putting Airbnb Under the Microscope with SASPy and SAS

Started ‎03-06-2020 by
Modified ‎08-03-2021 by

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.


 

One of the great commercial success stories of recent years has been Airbnb. The company acts as an online marketplace allowing hosts to offer short or long-term stays at their properties, from which Airbnb gains a commission. It is often thought of as being aimed at people who have a spare room they can rent out for a little extra cash, but it isn’t restricted to that type of host. Some hosts have many properties listed and could be considered “professional” hosts rather than the “amateurs” with just a single spare bedroom.

 

In this edition of Free Data Friday we will be looking at Airbnb listings data from the web site insideairbnb.com to see if we can find any differences between the type and price of listings of hosts with just a single listing and hosts with multiple listings. To do this we will be using the JupyterLab interface from SAS University Edition along with SASPy, which allows SAS to be called from Python code. I should point out at this stage that I am by no means a Python expert, so if you see anything in the code you think could be improved, please leave a comment below.

 

Get the Data

 

The data is available as a CSV from the insideairbnb web site, which offers listings for many different cities. I decided to use listings data for New York given its reasonable quantity and allocation to the easily recognisable five boroughs as neighbourhood groups.

 

Importing the data into SAS proved more problematic than anticipated. The problem arose with the “name” field, which appears to be a description of the property from the listing advertisement. Some of the rows contain line breaks, which caused errors importing the file into SAS. This left me with the prospect of either pre-editing the file or finding another way of creating the SAS data set. I had been wanting to try SASPy for a while and this seemed the perfect opportunity to see if Python and SASPy could facilitate the data load.

 


 

 

Getting the Data Ready

 

To use SASPy you start SAS in the usual way but choose the JupyterLab environment instead of SAS Studio. Here’s the first cell in the notebook, importing SASPy and the Python pandas module:

 

import pandas as pd
import saspy

 

To read the CSV file I then used the pandas read_csv function with the usecols parameter. This tells pandas to keep only the named fields from the CSV, ignoring all others. This avoids the field with the undesirable line breaks, and the data is read into an in-memory data structure called a dataframe.

 

 

df = pd.read_csv("/folders/myshortcuts/Dropbox/listings.csv",
                 usecols=["host_id", "host_name", "neighbourhood_group", "neighbourhood",
                          "latitude", "longitude", "room_type", "price", "minimum_nights",
                          "calculated_host_listings_count", "availability_365"])

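As a minimal sketch of why usecols sidesteps the problem, the snippet below builds a tiny in-memory CSV and reads only the wanted columns. The sample rows are hypothetical, not taken from the real listings file, and the sketch assumes the line breaks sit inside quoted fields, which the pandas parser can handle.

```python
import io
import pandas as pd

# Hypothetical two-listing sample; the first "name" value spans two lines.
sample_csv = (
    'id,name,host_id,room_type,price\n'
    '1,"Sunny loft\nnear the park",101,Entire home/apt,150\n'
    '2,"Cosy room",102,Private room,60\n'
)

# Keep only the columns of interest; the multi-line "name" field is dropped.
sample_df = pd.read_csv(io.StringIO(sample_csv),
                        usecols=["host_id", "room_type", "price"])

print(sample_df.shape)
print(list(sample_df.columns))
```

The embedded line break never reaches the dataframe, so it cannot cause trouble when the data is later handed to SAS.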
 

Next I use SASPy to instantiate a SAS session:

 

 

sas = saspy.SASsession()

 

I then declare my SAS library where I am going to save the resulting data set.

 

 

sas.saslib("dbox","base","/folders/myshortcuts/Dropbox")

 

Now I can save the pandas dataframe to a SAS data set for further processing in SAS:

 

 

airbnb=sas.df2sd(df,"airbnb","dbox")
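Before handing a dataframe to SAS it can be worth checking that the column names are usable as SAS variable names. The helper below is a hypothetical sketch, not part of SASPy. It assumes the usual SAS naming rules: at most 32 characters, starting with a letter or underscore, followed by letters, digits or underscores.

```python
import re

# Assumed SAS variable name rules: at most 32 characters, starting with a
# letter or underscore, then letters, digits or underscores only.
SAS_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,31}$")

def invalid_sas_names(columns):
    """Return the column names that would not be valid SAS variable names."""
    return [c for c in columns if not SAS_NAME.match(c)]

# The columns selected with usecols earlier in the article.
columns = ["host_id", "host_name", "neighbourhood_group", "neighbourhood",
           "latitude", "longitude", "room_type", "price", "minimum_nights",
           "calculated_host_listings_count", "availability_365"]

print(invalid_sas_names(columns))
```

All eleven column names used here pass the check, the longest being calculated_host_listings_count at 30 characters.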

 

From this point on all my processing could be done using SAS code. There are a number of ways of calling SAS from the JupyterLab Python kernel. The easiest is probably the %%SAS magic command, which forces all code in the cell to be run by the SAS session. Here's the cell which creates a column in the data set to distinguish between single and multiple listing hosts, runs Proc Means and prints the output:

 

 

%%SAS

libname dbox "/folders/myshortcuts/Dropbox";

data all_data;
    set dbox.airbnb;
    if calculated_host_listings_count=1 then single=1;
    else if calculated_host_listings_count>1 then single=0;
run;

proc means data=all_data noprint;
    class single room_type neighbourhood_group;
    var price;
    output out=all_stats mean=avg_price;
run;

proc print data=all_stats(obs=20);
run;
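One of the other ways of calling SAS from Python is the submit method of the SASPy session, which takes SAS code as a string and returns a dictionary holding the log ('LOG') and the listing output ('LST'). The sketch below wraps that call in a small hypothetical helper, run_sas, so the session is passed in explicitly. It is an illustration only and was not part of the original notebook.

```python
def run_sas(session, code):
    """Submit SAS code through a SASPy session and return the SAS log text."""
    result = session.submit(code)  # dict with 'LOG' and 'LST' keys
    return result["LOG"]

# The same data step as in the %%SAS cell, held as a Python string.
sas_code = """
data all_data;
    set dbox.airbnb;
    if calculated_host_listings_count=1 then single=1;
    else if calculated_host_listings_count>1 then single=0;
run;
"""

# With the session created earlier this would be called as:
#     print(run_sas(sas, sas_code))
```

Holding the SAS code in a string makes it possible to build it from Python variables, at the cost of losing the convenience of a plain %%SAS cell.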

 

Here's the output from the Proc Print:

 

Proc Means Output.png
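As a rough cross-check on the Python side, the same flag and averages can be sketched with pandas. The rows below are hypothetical stand-ins for the real listings, and the flag assumes every host has a listings count of at least 1, so it is not an exact mirror of the SAS data step.

```python
import pandas as pd

# Hypothetical rows standing in for the real listings dataframe.
sample = pd.DataFrame({
    "room_type": ["Private room", "Private room",
                  "Entire home/apt", "Entire home/apt"],
    "calculated_host_listings_count": [1, 3, 1, 5],
    "price": [50, 70, 200, 100],
})

# 1 for single listing hosts, 0 otherwise (assumes counts are never below 1).
sample["single"] = (sample["calculated_host_listings_count"] == 1).astype(int)

# Comparable to the single by room_type rows of the Proc Means output.
stats = (sample.groupby(["single", "room_type"])["price"]
               .agg(_freq_="size", avg_price="mean")
               .reset_index())
print(stats)
```

Each row of stats holds one single/room_type combination with its listing count and mean price, which is the shape of data the charts later in the article are drawn from.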

 

Now I can run Proc SGPie to generate some Pie Charts showing the percentage of listings by room type for all hosts, single listing hosts and multiple listing hosts. I can use the _type_ variable to distinguish between combinations of class variables.

 

 

%%SAS

title1 "Airbnb Listings by Room Type";
title2 "New York City";
title3 "All Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(_type_=2));
    format _freq_ comma10.;
    pie room_type / response=_freq_ maxslices=4
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;

title1 "Airbnb Listings by Room Type";
title2 "New York City";
title3 "Single Listing Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(single=1 and _type_=6));
    format _freq_ comma10.;
    pie room_type / response=_freq_ maxslices=4
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;

title1 "Airbnb Listings by Room Type";
title2 "New York City";
title3 "Multiple Listing Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(single=0 and _type_=6));
    format _freq_ comma10.;
    pie room_type / response=_freq_ maxslices=4
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;
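The _type_ values used in the where clauses in this article follow a binary pattern: the rightmost variable on the CLASS statement counts as 1, the next one to the left as 2, the next as 4, and so on, and _type_ is the sum for the variables that define a row. The function below is a hypothetical illustration of that arithmetic, not SAS code.

```python
def type_value(class_vars, active):
    """Return the Proc Means _type_ number for a combination of class variables.

    class_vars: names in the order given on the CLASS statement.
    active: the subset of names that define the combination.
    """
    value = 0
    # The rightmost CLASS variable maps to 1, the next to 2, then 4, and so on.
    for position, name in enumerate(reversed(class_vars)):
        if name in active:
            value += 2 ** position
    return value

class_vars = ["single", "room_type", "neighbourhood_group"]

print(type_value(class_vars, {"room_type"}))                      # 2
print(type_value(class_vars, {"single", "room_type"}))            # 6
print(type_value(class_vars, {"neighbourhood_group"}))            # 1
print(type_value(class_vars, {"single", "neighbourhood_group"}))  # 5
```

So _type_=2 selects rows summarised by room type alone, while _type_=6 selects rows broken down by both host type and room type.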

 

 

Pie Chart1.png

 

Pie Chart2.png

 

Pie Chart3.png

 

We can see that for single listing hosts entire home/apartments comprise more than half the listings with private rooms accounting for about 39% of listings. For multiple listing hosts these numbers are almost exactly reversed with private rooms being more prevalent. Hotel rooms are almost entirely the province of multiple listing hosts although the percentages are very small.

 

Now we move on to location:

 

 

%%SAS

title1 "Airbnb Listings by Borough";
title2 "New York City";
title3 "All Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(_type_=1));
    format _freq_ comma10.;
    pie neighbourhood_group / response=_freq_ maxslices=5
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;

title1 "Airbnb Listings by Borough";
title2 "New York City";
title3 "Single Listing Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(single=1 and _type_=5));
    format _freq_ comma10.;
    pie neighbourhood_group / response=_freq_ maxslices=5
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;

title1 "Airbnb Listings by Borough";
title2 "New York City";
title3 "Multiple Listing Hosts";
footnote1 j=r "Data From: http://insideairbnb.com";
proc sgpie data=all_stats(where=(single=0 and _type_=5));
    format _freq_ comma10.;
    pie neighbourhood_group / response=_freq_ maxslices=5
        dataskin=gloss datalabeldisplay=(category percent)
        datalabelloc=callout;
run;

 

 

Pie Chart4.png

 

Pie Chart5.png

 

Pie Chart6.png

 

There's not a huge difference here except for Queens, which accounts for 16% of the listings from multiple listing hosts but only 10% of the listings from single listing hosts.

 

Finally we can chart prices by room type for the different types of host:

 

 

%%SAS

title1 "Airbnb Average Prices by Room Type";
title2 "New York City";
title3 "All Hosts";

proc sgplot data=all_stats(where=(_type_=2));
    vbar room_type / response=avg_price
        dataskin=gloss datalabel=avg_price
        categoryorder=respdesc;
    xaxis label="Room Type";
    yaxis label="Average Price US$";
run;

title1 "Airbnb Average Prices by Room Type";
title2 "New York City";
title3 "Single Listing Hosts";

proc sgplot data=all_stats(where=(single=1 and _type_=6));
    vbar room_type / response=avg_price
        dataskin=gloss datalabel=avg_price
        categoryorder=respdesc;
    xaxis label="Room Type";
    yaxis label="Average Price US$";
run;

title1 "Airbnb Average Prices by Room Type";
title2 "New York City";
title3 "Multiple Listing Hosts";

proc sgplot data=all_stats(where=(single=0 and _type_=6));
    vbar room_type / response=avg_price
        dataskin=gloss datalabel=avg_price
        categoryorder=respdesc;
    xaxis label="Room Type";
    yaxis label="Average Price US$";
run;

 

 

Bar Chart1.png

 

Bar Chart2.png

 

Bar Chart3.png

 

We can see here that there are marked differences in prices for hotel rooms and shared rooms, where single listing hosts are much more expensive than multiple listing hosts. In contrast, private rooms from multiple listing hosts are more expensive on average.

 

It's hard to know what to make of these results without further analysis but it may be that single listing hosts charge more as they are more likely to offer premium rooms than multiple listing hosts, who may concentrate on volume over quality.

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit [[this link]] to see all the Free Data Friday articles.
