
Why does this program shrink the sample size and what is the correct alternative?

Frequent Contributor
Posts: 75


After running this program:

data temp;
  set temp;
  where VAR1 ne . and VAR2 ne .;
  VAR3 = VAR1 - VAR2;
run;

the dataset temp only consists of observations where VAR1 and VAR2 have non-missing values. But my intention is just to generate VAR3 with valid values (i.e., if we include missing VAR1 or VAR2 in the calculation, the generated VAR3 will be missing as well).

So as the title says, why does this program shrink the sample size and what is the correct alternative?

Contributor
Posts: 39


Hi,

You can use WHERE ALSO to add a second condition to an existing WHERE statement (a plain second WHERE statement would replace the first one rather than add to it). Your program would then be:

DATA TEMP;
  SET TEMP;
  WHERE VAR1 NE .;
  WHERE ALSO VAR2 NE .;
  VAR3 = VAR1 - VAR2;
RUN;

Note that this is equivalent to WHERE VAR1 NE . AND VAR2 NE ., so observations with a missing VAR1 or a missing VAR2 are still excluded from the output.

Where also - sasCommunity

Frequent Contributor
Posts: 75


I ran it with WHERE ALSO, but for some reason the result is the same.

Super User
Posts: 6,962


Your where condition excludes certain records where values are missing, so the dataset size will shrink if such missing values are present. Works as designed.

If you don't want to shrink the dataset, omit the where clause.
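
A minimal sketch of that suggestion, assuming VAR1 and VAR2 are numeric as in the original post. Without the WHERE statement every observation is kept, and the subtraction simply returns a missing VAR3 whenever either operand is missing:

```sas
data temp;
  set temp;
  /* No WHERE statement, so no observations are filtered out. */
  /* Arithmetic with a missing operand yields a missing result. */
  VAR3 = VAR1 - VAR2;
run;
```

SAS writes a note to the log that missing values were generated as a result of performing an operation on missing values; that note is expected here and is not an error.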

Valued Guide
Posts: 858


I would use:

DATA TEMP;
  SET TEMP;
  WHERE VAR1 NE . and
        VAR2 NE .;
  VAR3 = VAR1 - VAR2;
RUN;

If you have any missing values, this will most definitely shrink your output compared to the input.

Depending on your needs, you may be able to use the following instead, which keeps every observation and treats a missing VAR1 or VAR2 as 0:

data temp2;
  set temp;
  var3 = coalesce(var1,0)-coalesce(var2,0);
run;
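
To see how plain subtraction and the COALESCE version differ, here is a small sketch on made-up data. The dataset names demo and demo2, the variable names diff_plain and diff_coalesce, and the values are all hypothetical and only serve as illustration:

```sas
/* Hypothetical test data: rows 2 and 3 each contain one missing value. */
data demo;
  input var1 var2;
  datalines;
10 4
. 3
7 .
;
run;

data demo2;
  set demo;
  /* Missing if either input is missing. */
  diff_plain    = var1 - var2;
  /* Missing inputs are replaced by 0 before subtracting. */
  diff_coalesce = coalesce(var1, 0) - coalesce(var2, 0);
run;

proc print data=demo2;
run;
```

All three observations are kept. diff_plain is 6 for the first row and missing for the other two, while diff_coalesce is 6, -3 and 7. Whether treating a missing value as 0 is appropriate depends on what the variables mean, so the COALESCE version is not a drop-in replacement for the plain difference.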
