After running this program:
data temp; set temp; where VAR1 ne . and VAR2 ne .;
VAR3 = VAR1 - VAR2; run;
the dataset temp only consists of observations where VAR1 and VAR2 have non-missing values. But my intention is just to generate VAR3 with valid values (i.e., if we include missing VAR1 or VAR2 in the calculation, the generated VAR3 will be missing as well).
So as the title says, why does this program shrink the sample size and what is the correct alternative?
Hi,
You can also use WHERE ALSO, which adds a further condition to an existing WHERE statement (a single WHERE can hold a compound condition as well). Your program could be written as:
DATA TEMP;
SET TEMP;
WHERE VAR1 NE .;
WHERE ALSO VAR2 NE .;
VAR3 = VAR1 - VAR2;
RUN;
Note that WHERE ALSO is joined to the first WHERE with AND, so this keeps only observations where both VAR1 and VAR2 are non-missing, the same as your original code. If the WHERE statements are dropped, a missing value in VAR1 or VAR2 results in a missing VAR3 as well.
I ran WHERE ALSO, but for some reason the result is the same.
Your where condition excludes certain records where values are missing, so the dataset size will shrink if such missing values are present. Works as designed.
If you don't want to shrink the dataset, omit the where clause.
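A minimal sketch of that suggestion, using the variable names from the question. Without a WHERE statement every observation is kept, and VAR3 is computed only when both inputs are present. The NMISS check is an optional addition, not something from the original post: a plain VAR3 = VAR1 - VAR2; gives the same values but writes a note to the log about missing values being generated.

```sas
data temp;
  set temp;
  /* No WHERE statement, so no observations are dropped */
  if nmiss(VAR1, VAR2) = 0 then VAR3 = VAR1 - VAR2;
  else VAR3 = .;  /* at least one input is missing, so VAR3 is missing */
run;
```

The output dataset has the same number of observations as the input; only the rows with both VAR1 and VAR2 present get a non-missing VAR3.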
I would use:
DATA TEMP;
SET TEMP;
WHERE VAR1 NE . and
VAR2 NE .;
VAR3 = VAR1 - VAR2;
RUN;
If you have any missing values, this will most definitely shrink your output compared to the input.
Depending on your needs, you may be able to use:
data temp2;
set temp;
var3 = coalesce(var1, 0) - coalesce(var2, 0); /* a missing value is treated as 0 */
run;
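To see how the COALESCE version differs from plain subtraction, here is a small sketch with made-up data. The dataset names demo and demo2, the variable names diff_plain and diff_zero, and the three sample rows are all hypothetical and only serve as illustration.

```sas
/* Hypothetical sample data: '.' marks a missing numeric value */
data demo;
  input var1 var2;
  datalines;
5 2
. 3
7 .
;
run;

data demo2;
  set demo;
  /* Missing if either input is missing */
  diff_plain = var1 - var2;
  /* Missing inputs are replaced by 0 before subtracting */
  diff_zero = coalesce(var1, 0) - coalesce(var2, 0);
run;

proc print data=demo2;
run;
```

All three rows are kept in both cases. For the second and third rows diff_plain is missing, while diff_zero is -3 and 7, so the COALESCE approach is only appropriate if treating a missing value as zero makes sense for your data.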