Proc Corr not excluding rows with zeroes

Posted 07-28-2017 03:29 PM
(962 views)

Firstly, I just started using SAS two weeks ago. Apologies if this question is basic. Secondly, I am using Enterprise Guide 7.1. Thirdly, I am not getting errors.

I am trying to correlate four independent variables (IV) against a dependent variable (DV). I know that the independent variables contain zeroes, and I don't want to include those values when the Pearson correlation coefficients are calculated.

The dependent variable contains only non-negative rational numbers, and so do the independent variables (which do include zeroes, though).

This is what I am running:

    ODS GRAPHICS ON;

    PROC SORT DATA=WORK.mf15126(KEEP= DV IV1 IV2 IV3 IV4 ProductCode)
              OUT=WORK.SORTTempTableSorted;
        BY ProductCode;
    RUN;

    PROC CORR DATA=WORK.SORTTempTableSorted
              PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF;
        BY ProductCode;
        WHERE ProductCode eq "blah";
        VAR DV;
        WITH IV1 IV2 IV3 IV4;
    RUN;

That produces some values. But when I run the same code without EXCLNPWGT, the correlation coefficients are the same. Also, the scatterplot shows that SAS is including the zero IV values in the plot.

I then went to R, separated all my DV, IV pairs into different data frames, dropped the rows with zeroes, and ran the cor function; the results were different.

Can someone please tell me if I am doing something wrong in the code (or if I am not using the proc corr the way I am supposed to)?

Thank you!

1 ACCEPTED SOLUTION


To exclude values from the calculations, you would need to assign a missing value instead of 0. Alternatively, you could exclude rows with zero for the offending variable. If you use this second approach, you would want to do one variable at a time:

    PROC CORR DATA=WORK.SORTTempTableSorted
              PLOTS=SCATTER PEARSON EXCLNPWGT VARDEF=DF;
        WHERE ProductCode eq "blah" and IV1>0;
        VAR DV;
        WITH IV1;
    RUN;

If you exclude on two variables at once, you would likely remove rows that hold valid values for one or the other variable from the Corr calculations.
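The first option above (recoding zeroes to missing) can be sketched as follows. This is a minimal sketch, not a tested solution: the output data set name WORK.corr_input is made up for illustration, and the variable names are taken from the question. It relies on PROC CORR's default behavior of computing each coefficient from all nonmissing pairs of values (as long as NOMISS is not specified), so a zero in IV1 should not cost you a valid row for IV2. As far as I know, EXCLNPWGT only affects observations with nonpositive values of a WEIGHT statement variable, which would explain why adding or removing it changed nothing here.

```sas
/* WORK.corr_input is a hypothetical name; DV, IV1-IV4 and ProductCode
   come from the question. */
data work.corr_input;
    set work.mf15126(keep=DV IV1 IV2 IV3 IV4 ProductCode);
    where ProductCode = "blah";

    /* Recode zeroes in each IV to missing so PROC CORR skips them */
    array ivs {*} IV1 IV2 IV3 IV4;
    do i = 1 to dim(ivs);
        if ivs{i} = 0 then call missing(ivs{i});
    end;
    drop i;
run;

ods graphics on;

/* No NOMISS option: each DV-IV coefficient uses every row where
   both values of that pair are nonmissing */
proc corr data=work.corr_input pearson plots=scatter;
    var DV;
    with IV1 IV2 IV3 IV4;
run;
```

The N reported for each coefficient may differ between IVs, which is a quick way to check that the zeroes were actually dropped pair by pair.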

Note that if you have WHERE ProductCode = 'blah', then BY ProductCode is meaningless. BY is really more useful with two or more levels of the BY variable (especially in procs).
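For comparison, here is a rough sketch of BY doing useful work: the product code filter is dropped so that PROC CORR reports one set of coefficients per ProductCode. The output data set name WORK.by_product is hypothetical, and only IV1 is shown to keep the zero filter to a single variable.

```sas
/* WORK.by_product is a made-up name for illustration */
proc sort data=work.mf15126(keep=DV IV1 ProductCode)
          out=work.by_product;
    by ProductCode;
run;

/* One correlation table per ProductCode value; rows with IV1 = 0
   (or below) are filtered out before the calculation */
proc corr data=work.by_product pearson;
    by ProductCode;
    where IV1 > 0;
    var DV;
    with IV1;
run;
```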

Without seeing actual data, actual output and desired output it is hard to say what else may be going on.

To work an example by hand, you likely don't want many rows of data and may only want one of the IV variables.
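A small hand-checkable case might look like the sketch below. The values are invented purely for illustration and are not the poster's data; WORK.tiny is a hypothetical data set name. With only three rows surviving the filter, the Pearson coefficient can be verified with a calculator or against R.

```sas
/* Invented values for illustration only */
data work.tiny;
    input DV IV1;
    datalines;
1.0 0
2.0 1.5
3.5 0
4.0 3.0
5.5 4.5
;
run;

/* Only the rows with IV1 > 0 enter the calculation */
proc corr data=work.tiny pearson;
    where IV1 > 0;
    var DV;
    with IV1;
run;
```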

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

2 REPLIES 2


Thanks for that explanation. That's exactly what I did in R. I was hoping SAS had a more automatic way of doing that but I guess not.

I was trying to graph different variables as part of my EDA to verify some sort of correlation before inputting the dataframe into the algorithm.

Also, thanks for the tip with the By statement!

Thank you!
