Firstly, I just started using SAS two weeks ago. Apologies if this question is basic. Secondly, I am using Enterprise Guide 7.1. Thirdly, I am not getting errors.
I am trying to correlate four independent variables (IV) agaisnt a depedent variable (DV). I know that the independent variables have zeroes and I don't want to include those values when the Pearson correlation coefficients are calculated.
The dependent variable contains only non negative rational numbers, and so do the independent variables (include zeroes though).
This is what I am running:
ODS GRAPHICS ON;
PROC SORT DATA=WORK.mf15126(KEEP= DV IV1 IV2 IV3 IV4 ProductCode) OUT=WORK.SORTTempTableSorted ; BY ProductCode; RUN; PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; BY ProductCode; WHERE ProductCode eq "blah"; VAR DV; WITH IV1 IV2 IV3 IV4; RUN;
That spits some values. But when I try the same code without EXCLNPWGT, the correlation coefficients are the same. Also, the scatterplot shows me that SAS is considering zeros for the IVs for the plotting.
I the went to R, separated all my DV, IV pairs into different dfs, then drop rows with zeroes, then run the corr function, and the results were different.
Can someone please tell me if I am doing something wrong in the code (or if I am not using the proc corr the way I am supposed to)?
Thank you!
To exclude values from calculations you would need to assign a value of missing instead of 0. Or you could exclude rows with zero for the offending variable. If you use this approach you would want to do one variable at a time:
PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; WHERE ProductCode eq "blah" and IV1>0; VAR DV; WITH IV1 ; RUN;
If you exclude on two variables you would likely remove valid values for one or the other for Corr calculations.
Note that if you have a Where ProductCode = 'blah' then the BY ProductCode is meaningless. By really is more useful with 2 or more levels fo the BY variable (especially in procs).
Without seeing actual data, actual output and desired output it is hard to say what else may be going on.
Likely to work an example by hand you don't want to work with may rows of data and may only want one of the IV variables.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
To exclude values from calculations you would need to assign a value of missing instead of 0. Or you could exclude rows with zero for the offending variable. If you use this approach you would want to do one variable at a time:
PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; WHERE ProductCode eq "blah" and IV1>0; VAR DV; WITH IV1 ; RUN;
If you exclude on two variables you would likely remove valid values for one or the other for Corr calculations.
Note that if you have a Where ProductCode = 'blah' then the BY ProductCode is meaningless. By really is more useful with 2 or more levels fo the BY variable (especially in procs).
Without seeing actual data, actual output and desired output it is hard to say what else may be going on.
Likely to work an example by hand you don't want to work with may rows of data and may only want one of the IV variables.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Thanks for that explanation. That's exactly what I did in R. I was hoping SAS had a more automatic way of doing that but I guess not.
I was trying to graph different variables as part of my EDA to verify some sort of correlation before inputting the dataframe into the algorithm.
Also, thanks for the tip with the By statement!
Thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.