turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Proc Corr not excluding rows with zeroes

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

4 weeks ago

Firstly, I just started using SAS two weeks ago. Apologies if this question is basic. Secondly, I am using Enterprise Guide 7.1. Thirdly, I am not getting errors.

I am trying to correlate four independent variables (IV) agaisnt a depedent variable (DV). I know that the independent variables have zeroes and I don't want to include those values when the Pearson correlation coefficients are calculated.

The dependent variable contains only non negative rational numbers, and so do the independent variables (include zeroes though).

This is what I am running:

ODS GRAPHICS ON;

PROC SORT DATA=WORK.mf15126(KEEP= DV IV1 IV2 IV3 IV4 ProductCode) OUT=WORK.SORTTempTableSorted ; BY ProductCode; RUN; PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; BY ProductCode; WHERE ProductCode eq "blah"; VAR DV; WITH IV1 IV2 IV3 IV4; RUN;

That spits some values. But when I try the same code without EXCLNPWGT, the correlation coefficients are the same. Also, the scatterplot shows me that SAS is considering zeros for the IVs for the plotting.

I the went to R, separated all my DV, IV pairs into different dfs, then drop rows with zeroes, then run the corr function, and the results were different.

Can someone please tell me if I am doing something wrong in the code (or if I am not using the proc corr the way I am supposed to)?

Thank you!

Accepted Solutions

Solution

3 weeks ago

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

4 weeks ago

To exclude values from calculations you would need to assign a value of missing instead of 0. Or you could exclude rows with zero for the offending variable. If you use this approach you would want to do one variable at a time:

PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; WHERE ProductCode eq "blah" and IV1>0; VAR DV; WITH IV1 ; RUN;

If you exclude on two variables you would likely remove valid values for one or the other for Corr calculations.

Note that if you have a Where ProductCode = 'blah' then the BY ProductCode is meaningless. By really is more useful with 2 or more levels fo the BY variable (especially in procs).

Without seeing actual data, actual output and desired output it is hard to say what else may be going on.

Likely to work an example by hand you don't want to work with may rows of data and may only want one of the IV variables.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

All Replies

Solution

3 weeks ago

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

4 weeks ago

To exclude values from calculations you would need to assign a value of missing instead of 0. Or you could exclude rows with zero for the offending variable. If you use this approach you would want to do one variable at a time:

PROC CORR DATA=WORK.SORTTempTableSorted PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF ; WHERE ProductCode eq "blah" and IV1>0; VAR DV; WITH IV1 ; RUN;

If you exclude on two variables you would likely remove valid values for one or the other for Corr calculations.

Note that if you have a Where ProductCode = 'blah' then the BY ProductCode is meaningless. By really is more useful with 2 or more levels fo the BY variable (especially in procs).

Without seeing actual data, actual output and desired output it is hard to say what else may be going on.

Likely to work an example by hand you don't want to work with may rows of data and may only want one of the IV variables.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

3 weeks ago

Thanks for that explanation. That's exactly what I did in R. I was hoping SAS had a more automatic way of doing that but I guess not.

I was trying to graph different variables as part of my EDA to verify some sort of correlation before inputting the dataframe into the algorithm.

Also, thanks for the tip with the By statement!

Thank you!