Amali6
Quartz | Level 8

Hi all,

I have tried PROC CORR for the first time to find the relationship between two variables in my dataset. I used the following code and got the output below. I can't tell whether it shows an increasing or a decreasing trend, and I mainly want to check that the output is right.

ods graphics on;
title 'To find relation between variables';
proc corr data=hotel.Hotel_bookings plots(MAXPOINTS=NONE)=all;
var lead_time booking_changes;
run;
ods graphics off;

Is this the right output, please?

Amali6_0-1590172504686.png

Amali6_1-1590172566402.png

Can someone please explain whether these are the right plots for these variables, and whether this is how they should look?

 

 

Thank you

15 REPLIES
PGStats
Opal | Level 21

What this statistic is telling you is that there is a very weak, and not statistically significant, tendency for booking changes to increase with lead time.

 

The scatter graph might be more visually informative if you added some jitter to the discrete booking change values and made the aspect ratio of the graph closer to a square.
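One way to try this suggestion is sketched below. It is only an illustration: the data set name work.bookings_jitter, the variable booking_changes_j, the seed, and the jitter width of 0.6 are all assumptions made for the example, not anything taken from the original output.

```sas
/* Sketch: add small uniform noise to the discrete booking_changes values. */
/* work.bookings_jitter and booking_changes_j are hypothetical names.      */
data work.bookings_jitter;
   set hotel.Hotel_bookings;
   if _n_ = 1 then call streaminit(12345);   /* arbitrary seed for reproducibility */
   /* shift each value by a random amount in (-0.3, 0.3); width is an assumption */
   booking_changes_j = booking_changes + (rand('uniform') - 0.5) * 0.6;
run;

/* Request a square image so the plot is closer to a 1:1 aspect ratio */
ods graphics / width=5in height=5in;

proc sgplot data=work.bookings_jitter;
   scatter x=lead_time y=booking_changes_j /
           markerattrs=(symbol=circlefilled size=3pt)
           transparency=0.9;
   yaxis label='booking_changes (jittered)';
run;

ods graphics / reset;   /* restore default graph size */
```

The jitter is kept below half a unit so that neighbouring integer values of booking_changes do not blend into each other.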

PG
Reeza
Super User
I suspect it's only statistically significant because you have a large number of observations, 119330 obs.

However, there is no clear linear relationship present between lead time and booking changes given the graph shown.
Amali6
Quartz | Level 8

Thank you very much for the response. I am trying to understand correlation. In this scatter graph the p-value is 0.9590, so how does that make the relationship very weak? Sorry, I can't work out what to look at to say whether a correlation plot shows a strong or a weak relationship. I tried some other variables in my dataset, but the output looks like a line running between the two variables. Also, could you please explain why the minimum value is 0?

 

Thank you

lvm
Rhodochrosite | Level 12

There is no observable relationship between the two variables, and the correlation is not significant.

But be careful: with so many observations, it is easy to find significant correlations even when there is no meaningful or predictable relationship.
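To put a rough number on this warning, the sketch below computes the smallest absolute Pearson correlation that would reach p < 0.05 for a sample of this size. It assumes the observation count of 119330 quoted earlier in the thread and the usual t test for a correlation coefficient; treat it as a back-of-the-envelope check rather than part of the PROC CORR output.

```sas
/* Sketch: smallest |r| that is significant at the 5% level for a given n */
data _null_;
   n      = 119330;                       /* observation count quoted in the thread */
   t_crit = tinv(0.975, n - 2);           /* two-sided 5% critical value of t       */
   /* invert t = r*sqrt((n-2)/(1-r**2)) to get the critical correlation */
   r_crit = t_crit / sqrt(n - 2 + t_crit**2);
   put 'Critical |r| at alpha=0.05: ' r_crit= 8.5;
run;
```

With a sample this large the threshold is well under 0.01, so a correlation can pass the significance test while still being far too small to matter in practice.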

Amali6
Quartz | Level 8
Thank you. Since mine is a big dataset, the points in the plots are very close together and I can't read them clearly. Could you please explain how to tell whether there is a relationship or not?

Thanks.
Reeza
Super User

There is no linear relationship between your two variables.

Reeza
Super User

Your correlation is 0.00015 so nearly zero and your p-value is 0.9590. 

This means you have no linear correlation and it's not statistically significant. 
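For anyone who wants to see where a p-value like this comes from, here is a minimal sketch that recomputes it from the correlation using the standard t test. It assumes n = 119330 (the count mentioned earlier in the thread) and uses the rounded r = 0.00015, so the result should land near the reported 0.9590 but may differ slightly in the last digits.

```sas
/* Sketch: p-value for a Pearson correlation via the t distribution */
data _null_;
   r = 0.00015;                            /* correlation reported by PROC CORR (rounded) */
   n = 119330;                             /* assumed number of observations              */
   t = r * sqrt((n - 2) / (1 - r**2));     /* test statistic with n-2 degrees of freedom  */
   p = 2 * (1 - probt(abs(t), n - 2));     /* two-sided p-value                           */
   put t= 8.4 p= 8.4;
run;
```

A p-value this close to 1 says the data are entirely consistent with a true correlation of zero.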

Amali6
Quartz | Level 8
Thank you very much for explaining. I tried other variables and got a line running from the left side of the graph to the right. Can you help me with what kind of relationship that indicates, please?

Thanks
Reeza
Super User

@Amali6 wrote:
Thank you very much for explaining. I tried other variables and got a line running from the left side of the graph to the right. Can you help me with what kind of relationship that indicates, please?

Thanks

Not without seeing the graphs and numbers.

Amali6
Quartz | Level 8

Sorry, this is what I got.

Amali6_0-1590183712363.png

Thanks 

ballardw
Super User

You might try SPEARMAN correlation which is more concerned with direction of change than magnitude.

A Spearman correlation close to 1 means that as one variable increases in value so does the other, while a value close to -1 means that when one variable increases the other decreases.

 

You can see if this is interesting by adding the option SPEARMAN to the Proc Corr statement.
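As a minimal sketch of that change, using the data set and variables from the original question:

```sas
/* Sketch: request Spearman rank correlations alongside the Pearson ones */
proc corr data=hotel.Hotel_bookings pearson spearman;
   var lead_time booking_changes;
run;
```

PEARSON is listed explicitly here because asking for SPEARMAN on its own replaces the default Pearson table instead of adding to it.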

 

Many times the plots the statistics procs generate are not the clearest.

Try this with your data

proc sgplot data=hotel.hotel_bookings;
   scatter x=lead_time y=booking_changes / 
                      markerattrs=(symbol=circlefilled size=3pt) 
                      transparency=.9 
   ;
run;

The transparency setting close to 1 means that the markers will be very faint. But when multiples are drawn in the same location the color density gets stronger. So sometimes you can see underlying patterns inside the data.

Amali6
Quartz | Level 8
Thank you. Can you please explain what condition on the correlation tells you whether two variables are correlated or not?
Amali6
Quartz | Level 8
Thank you very much this helps a lot!!
