proc sgplot data=hotel.Hotel_bookings;
   scatter x=arrival_date_year y=country / group=hotel;
   title 'guest arrived from countries in three years';
run;
Hi all,
I want to find out how many guests arrived from various countries, in three different years, at two hotels. I am not getting the correct output. Looking for help, please!
hotel is my libname
hotel.Hotel_bookings is the dataset
arrival_date_year - has the values 2015, 2016, and 2017
country - my dataset contains various countries
hotel - Resort Hotel and City Hotel
That explains the names used in the code. Could anyone please tell me where I am going wrong?
You should show us some examples of what your hotel.hotel_bookings data set actually looks like.
And then describe exactly what you expect the output to look like.
If you mean to do this for individuals then likely this is going to be a very busy chart.
You may want to summarize before plotting.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
hotel  is_canceled  lead_time  arrival_date_year  arrival_date_month  arrival_date_week_number  arrival_date_day_of_month  stays_in_weekend_nights  stays_in_week_nights  adults  children  babies  meal  country
Resort Hotel  0  342  2015  July  27  1  0  0  2  0  0  BB  PRT
Resort Hotel  0  737  2015  July  27  1  0  0  2  0  0  BB  PRT
Resort Hotel  0  7  2015  July  27  1  0  1  1  0  0  BB  GBR
Resort Hotel  0  13  2015  July  27  1  0  1  1  0  0  BB  GBR
Resort Hotel  0  14  2015  July  27  1  0  2  2  0  0  BB  GBR
Resort Hotel  0  14  2015  July  27  1  0  2  2  0  0  BB  GBR
Resort Hotel  0  0  2015  July  27  1  0  2  2  0  0  BB  PRT
Resort Hotel  0  9  2015  July  27  1  0  2  2  0  0  FB  PRT
Resort Hotel  1  85  2015  July  27  1  0  3  2  0  0  BB  PRT
Resort Hotel  1  75  2015  July  27  1  0  3  2  0  0  HB  PRT
Resort Hotel  1  23  2015  July  27  1  0  4  2  0  0  BB  PRT
Resort Hotel  0  35  2015  July  27  1  0  4  2  0  0  HB  PRT
Resort Hotel  0  68  2015  July  27  1  0  4  2  0  0  BB  USA
Resort Hotel  0  18  2015  July  27  1  0  4  2  1  0  HB  ESP
Resort Hotel  0  37  2015  July  27  1  0  4  2  0  0  BB  PRT
Resort Hotel  0  68  2015  July  27  1  0  4  2  0  0  BB  IRL
Resort Hotel  0  37  2015  July  27  1  0  4  2  0  0  BB  PRT
Resort Hotel  0  12  2015  July  27  1  0  1  2  0  0  BB  IRL
Resort Hotel  0  0  2015  July  27  1  0  1  2  0  0  BB  FRA
Resort Hotel  0  7  2015  July  27  1  0  4  2  0  0  BB  GBR
Resort Hotel  0  37  2015  July  27  1  1  4  1  0  0  BB  GBR
Resort Hotel  0  72  2015  July  27  1  2  4  2  0  0  BB  PRT
Resort Hotel  0  72  2015  July  27  1  2  4  2  0  0  BB  PRT
Resort Hotel  0  72  2015  July  27  1  2  4  2  0  0  BB  PRT
These are a few of the columns in my dataset. From these, how can I plot the people who arrived at both hotels in all three years from the various countries in the dataset?
Please help me to solve this!
Thanks
Now describe what you mean by "for people arrived to both hotels in all three years".
There is no way that I can see from that data to identify whether any particular person arrived at any hotel in any given year.
So do you mean totals of some sort? That will require some sort of summary and filter likely.
I might guess that you want to display the total by hotel by year. Scatter plots will not summarize data. You would have to do that prior to plotting the data. And if you mean "people" to be a total of adults, children and babies you will need to sum those prior to plotting as well.
Maybe something like this (untested, since no data step was provided, and incomplete, since you show only one "hotel" value):
data temp;
   set hotel.hotel_bookings;
   people = sum(adults,children,babies);
run;

proc summary data=temp nway;
   class hotel country arrival_date_year;
   var people;
   output out =work.plot (drop=_type_ _freq_) sum=;
run;

proc sgplot data=work.plot;
   scatter x=arrival_date_year y=people / group=hotel datalabel=country;
   title 'guest arrived from countries in three years';
run;
And does the is_canceled variable have any role in this process?
Thanks for making that clear! Sorry I didn't explain my variables properly. The variable is_canceled contains 0 and 1, where 0 means the booking was not canceled and 1 means it was canceled. The hotel variable contains the values City Hotel and Resort Hotel. My dataset has more than 1 lakh (100,000) observations, which is why I couldn't post it here.
My question is: is it possible to show in a plot the number of people who arrived in the three years at both hotels?
Hi
When I tried your code I got output like this.
May I ask for an explanation of this line in the code, please:
output out =work.plot (drop=_type_ _freq_) sum=;
Can I use the library I created instead of the WORK library in the code? I am also not clear about this part: (drop=_type_ _freq_) sum=;
Could you please explain?
Thanks in advance!
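For anyone following along, here is a minimal sketch of what that OUTPUT statement does. The data set names work.toy, work.plot_all, and work.plot_clean and the toy values are made up for illustration; they are not from the real booking data. By default PROC SUMMARY adds two automatic variables to the output data set, _TYPE_ and _FREQ_ (the number of observations in each group), and the DROP= data set option simply leaves them out. SUM= with no name after the equals sign asks for the sum of the VAR variable and stores it under the same name (people). The OUT= data set can go into any assigned library, so out=hotel.plot should work as long as the hotel libref is assigned and writable.

```sas
/* Toy data: hypothetical values, one row per booking */
data work.toy;
   length hotel $6 country $3;
   input hotel $ country $ arrival_date_year people;
   datalines;
Resort PRT 2015 2
Resort PRT 2015 2
Resort GBR 2015 1
City PRT 2016 3
;
run;

/* Without DROP=: the output keeps the automatic variables _TYPE_ and _FREQ_ */
proc summary data=work.toy nway;
   class hotel country arrival_date_year;
   var people;
   output out=work.plot_all sum=;
run;

/* With DROP=: same sums, but _TYPE_ and _FREQ_ are not written */
proc summary data=work.toy nway;
   class hotel country arrival_date_year;
   var people;
   output out=work.plot_clean (drop=_type_ _freq_) sum=;
run;

/* Compare the two results side by side in the output window */
proc print data=work.plot_all;
run;
proc print data=work.plot_clean;
run;
```

Printing both data sets shows the same group totals; the only difference is the two extra columns in work.plot_all.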
Because you have discrete variables, I suggest a bar chart instead of a scatter plot. You can either use a stacked bar chart or a cluster bar chart. The stacked bars are probably better if you have many countries. For more information, see "Bar Charts with Stacked and Cluster Groups."
You don't say how you want the data displayed, so I chose two charts (one for each type of hotel) that show the number of visitors from each country for each year. If you want the data displayed in some other way, the code can be modified:
/* Create sample data. I use a frequency variable (FREQ), but
   the bar chart will aggregate if the data set contains
   one observation per guest. */
data bookings;
   call streaminit(1);
   length country $15 hotel $6;
   do arrival_date_year = 2015 to 2017;
      do country = "US", "UK", "China", "Japan";
         do hotel = "City", "Resort";
            Freq = rand("Poisson", 100);
            output;
         end;
      end;
   end;
run;

proc sort data=bookings;
   by hotel;
run;

title 'Guest arrived from countries in three years';

/* Stacked bars: one segment per country within each year */
proc sgplot data=bookings;
   by hotel;
   vbar arrival_date_year / response=Freq group=country
                            groupdisplay=stack seglabel;
   xaxis display=(nolabel);
   yaxis grid;
run;

/* Clustered bars: one bar per country, side by side within each year */
proc sgplot data=bookings;
   by hotel;
   vbar arrival_date_year / response=Freq group=country
                            groupdisplay=cluster;
   xaxis display=(nolabel);
   yaxis grid;
run;
Thanks for the solution, but I don't understand the data step you provided. Since the dataset is already imported into SAS, why create it in this step?
data bookings;
call streaminit(1);
length country $15 hotel $6;
do arrival_date_year = 2015 to 2017;
do country = "US", "UK", "China", "Japan";
do hotel = "City", "Resort";
Freq = rand("Poisson", 100);
output;
end;
Because I don't have access to your data and you didn't provide data in a format that I could use. You can ignore the DATA step. It is for me and others who do not have access to your data.
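As a rough sketch of how the same chart might be pointed at the real data instead of the simulated data: the code below assumes the data set is hotel.hotel_bookings as in the original post, that is_canceled is numeric, and that each row is one booking. The name work.bookings_sorted is made up for this example. With no RESPONSE= option, VBAR counts rows, so the bar heights are numbers of bookings, not numbers of people.

```sas
/* Sort by hotel so that BY-group processing works.
   Assumption: is_canceled is numeric; the WHERE keeps non-canceled bookings only. */
proc sort data=hotel.hotel_bookings out=work.bookings_sorted;
   where is_canceled = 0;
   by hotel;
run;

title 'Bookings by country and arrival year';

/* No RESPONSE= option: each bar segment is a count of rows (bookings) */
proc sgplot data=work.bookings_sorted;
   by hotel;
   vbar arrival_date_year / group=country groupdisplay=stack;
   xaxis display=(nolabel);
   yaxis grid;
run;
title;
```

If the data contains many countries the legend will be long, so it may help to restrict the plot to a handful of countries with an additional WHERE condition before plotting.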