I have a list of 50 variables.
I did a proc freq to do counts of missing/not missing values for each variable individually.
I want to do counts of missing/not missing values for multiple variables simultaneously.
I think I need to use a macro.
What I want is something like this:
proc format; value missfmt .,0 ='Missing' other='Not Missing';
proc freq data=have; format var1 missfmt.; tables var1 / missing missprint nocum;
then I want to add var2 and do a proc freq but the not missing would be that both var1 and var2 are not missing while missing would be that either var1 or var2 is missing.
then I want to add var3, and so on until var50.
My idea is to see how the number of observations decreases as one extra variable is added.
Any suggestions on how to do this?
First point is that your test is not whether the values is missing or not. Instead you are testing if the value is TRUE or not. SAS will treat missing or zero as FALSE and any other value as TRUE.
So you want to count the number of observations that have TRUE values for V1 and then TRUE values for V1 and V2. Looks like a simple candidate for a DO loop.
Let's make up some data and try it.
data have ;
input v1-v4 ;
cards;
1 2 3 4
1 0 6 7
. 8 9 10
1 2 0 5
6 7 8 9
;
data want;
set have end=eof;
array vars v1-v4;
array nonmiss[4] ;
do index=1 to dim(vars) while (vars[index]);
nonmiss[index]+1;
end;
if eof;
keep nonmiss: ;
run;
Results:
OBS nonmiss1 nonmiss2 nonmiss3 nonmiss4 1 4 3 2 2
If you have to use PROC FREQ, then I agree that you will need a macro for this. However, PROC FREQ isn't needed, you can do most of this in a data step with ARRAYs. If you just add the values of var1+var2, the value will be missing if either var1 or var2 will is missing. So this gives you the consecutive ability to count missing across all columns. This code is UNTESTED. You will likely get a lot of notes or warnings in the log about missing values.
data temporary;
set have;
array v var1-var50;
do i=2 to dim(v);
do j=1 to i-1;
v(i)=v(i)+v(j);
end;
end;
drop i j;
run;
proc summary data=temporary;
var v1-v50;
output out=missings nmiss=/autoname;
run;
If you need to count zero as missing, you would add this as the first line in the DO J loop
if v(j)=0 then call missing(v(j));
You can get the result you want something like this;
data want;
array counts(*) count1-count50;
retain count1-count50 0;
set have end=done;
array vars(*) var1-var50;
do i=1 to dim(vars) while(vars(i) not in(.,0));
counts(i)+1;
end;
if done;
total=_N_;
keep total count1-count50;
run;
There will be just one row in the output. The variable TOTAL will contain the total number of observations, the variables COUNT1-COUNT50 will contain the number of observations with no missing values for 1,2... up to all 50 variables. If you want the number of observations with missing values as well, subtract the counts from the total.
First point is that your test is not whether the values is missing or not. Instead you are testing if the value is TRUE or not. SAS will treat missing or zero as FALSE and any other value as TRUE.
So you want to count the number of observations that have TRUE values for V1 and then TRUE values for V1 and V2. Looks like a simple candidate for a DO loop.
Let's make up some data and try it.
data have ;
input v1-v4 ;
cards;
1 2 3 4
1 0 6 7
. 8 9 10
1 2 0 5
6 7 8 9
;
data want;
set have end=eof;
array vars v1-v4;
array nonmiss[4] ;
do index=1 to dim(vars) while (vars[index]);
nonmiss[index]+1;
end;
if eof;
keep nonmiss: ;
run;
Results:
OBS nonmiss1 nonmiss2 nonmiss3 nonmiss4 1 4 3 2 2
How can I do this if I want to mix both numeric and character variables?
FYI - this post may have some interesting visualization and analysis options for you to explore your patterns of missing data.
https://blogs.sas.com/content/iml/2017/11/29/visualize-patterns-missing-values.html
@Satori wrote:
I have a list of 50 variables.
I did a proc freq to do counts of missing/not missing values for each variable individually.
I want to do counts of missing/not missing values for multiple variables simultaneously.
I think I need to use a macro.
What I want is something like this:
proc format; value missfmt .,0 ='Missing' other='Not Missing';
proc freq data=have; format var1 missfmt.; tables var1 / missing missprint nocum;
then I want to add var2 and do a proc freq but the not missing would be that both var1 and var2 are not missing while missing would be that either var1 or var2 is missing.
then I want to add var3, and so on until var50.
My idea is to see how the number of observations decreases as one extra variable is added.
Any suggestions on how to do this?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.