tester_777
Calcite | Level 5

Firstly, I just started using SAS two weeks ago. Apologies if this question is basic. Secondly, I am using Enterprise Guide 7.1. Thirdly, I am not getting errors.

 

I am trying to correlate four independent variables (IV) against a dependent variable (DV). I know that the independent variables contain zeroes, and I don't want to include those values when the Pearson correlation coefficients are calculated.

 

The dependent variable contains only non-negative rational numbers, and so do the independent variables (these include zeroes, though).

 

This is what I am running:

 

ODS GRAPHICS ON;

PROC SORT DATA=WORK.mf15126(KEEP= DV IV1 IV2 IV3 IV4 ProductCode)
	OUT=WORK.SORTTempTableSorted
	;
	BY ProductCode;
RUN;

PROC CORR DATA=WORK.SORTTempTableSorted
	PLOTS=SCATTER
	PEARSON
	EXCLNPWGT
	VARDEF=DF
	;
	BY ProductCode;
	WHERE ProductCode eq "blah";
	VAR DV;
	WITH IV1 IV2 IV3 IV4;
RUN;

That produces some values. But when I run the same code without EXCLNPWGT, the correlation coefficients are the same. Also, the scatterplot shows that SAS is including the zeros for the IVs in the plots.

 

I then went to R, separated all my DV, IV pairs into different data frames, dropped the rows with zeroes, and ran the correlation function; the results were different.

 

Can someone please tell me if I am doing something wrong in the code (or if I am not using the proc corr the way I am supposed to)?

 

Thank you!

Accepted Solutions
ballardw
Super User

To exclude values from calculations, you would need to assign a missing value instead of 0. Alternatively, you could exclude rows with a zero in the offending variable. If you use the second approach, you would want to do one variable at a time:

PROC CORR DATA=WORK.SORTTempTableSorted
	PLOTS=SCATTER
	PEARSON
	EXCLNPWGT
	VARDEF=DF
	;
	WHERE ProductCode eq "blah" and IV1>0;
	VAR DV;
	WITH IV1 ;
RUN;

If you exclude on two variables at once, you would likely remove rows that hold valid values for one variable or the other from the Corr calculations.
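The first option, recoding zeros to missing, can be sketched as follows. This is a minimal sketch that has not been run against the poster's data; the output name WORK.mf15126_nz and the loop counter i are made up for illustration. It relies on PROC CORR's default handling of missing values, which computes each correlation from the rows where both variables of that pair are nonmissing (unless NOMISS is specified), so all four IVs can go in one call. As far as the documentation describes it, EXCLNPWGT only concerns observations with nonpositive WEIGHT values, which would explain why it made no difference without a WEIGHT statement.

```sas
/* Copy the data, recoding zero IV values to missing.          */
/* WORK.mf15126_nz is a hypothetical output data set name.     */
DATA WORK.mf15126_nz;
    SET WORK.mf15126(KEEP=DV IV1 IV2 IV3 IV4 ProductCode);
    WHERE ProductCode eq "blah";
    ARRAY ivs{*} IV1 IV2 IV3 IV4;
    DO i = 1 TO DIM(ivs);
        IF ivs{i} = 0 THEN ivs{i} = .;   /* zero becomes missing */
    END;
    DROP i;
RUN;

/* Missing values are dropped pair by pair by default, so each */
/* DV/IV correlation uses only rows where that IV is nonzero.  */
PROC CORR DATA=WORK.mf15126_nz PLOTS=SCATTER PEARSON;
    VAR DV;
    WITH IV1 IV2 IV3 IV4;
RUN;
```

The N reported for each coefficient should then differ between IVs whenever they have different numbers of zeros, which is a quick way to confirm the recoding took effect.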

 

Note that if you have a Where ProductCode = 'blah' then the BY ProductCode is meaningless. BY is really more useful with two or more levels of the BY variable (especially in procs).
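For contrast, here is a sketch of BY doing useful work: with no WHERE restriction on ProductCode, PROC CORR should report a separate correlation for each product code found in the sorted data set. The IV1 > 0 filter is carried over from the example above, and the sketch assumes WORK.SORTTempTableSorted is still sorted by ProductCode.

```sas
/* One set of correlations per ProductCode value.              */
/* The input must already be sorted by the BY variable.        */
PROC CORR DATA=WORK.SORTTempTableSorted PEARSON;
    BY ProductCode;
    WHERE IV1 > 0;
    VAR DV;
    WITH IV1;
RUN;
```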

 

 

Without seeing actual data, actual output and desired output it is hard to say what else may be going on.

 

To work an example by hand, you likely don't want many rows of data, and you may only want one of the IV variables.
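A hand-checkable case might look like the sketch below. The data set WORK.toy and its five rows are invented purely for illustration: two rows have IV1 = 0, and the three remaining rows lie exactly on a straight line.

```sas
/* Hypothetical toy data: five rows, one IV, two of them zero. */
DATA WORK.toy;
    INPUT DV IV1;
    DATALINES;
1 0
2 1
4 2
5 0
6 3
;
RUN;

/* With the zero rows filtered out, only (2,1), (4,2), (6,3)   */
/* remain, so the Pearson coefficient should be exactly 1.     */
PROC CORR DATA=WORK.toy PEARSON;
    WHERE IV1 > 0;
    VAR DV;
    WITH IV1;
RUN;
```

Removing the WHERE statement and rerunning should give a coefficient below 1, which makes it easy to see whether the zeros are being excluded.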

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.


tester_777
Calcite | Level 5

 

Thanks for that explanation. That's exactly what I did in R. I was hoping SAS had a more automatic way of doing that but I guess not. 

 

I was trying to graph different variables as part of my EDA to verify some sort of correlation before inputting the dataframe into the algorithm. 

 

Also, thanks for the tip with the By statement!

 

Thank you!
