How can I count the number of non-missing values in a variable by group using proc sql?
Here is the code that count non-missing values for one variable:
proc sql;
create table WANT_1 as
select count(VARIABLE) as NON_MISSING
from HAVE;
quit;
And here is the code that counts the number of observations by group:
proc sql;
create table WANT_2 as
select count(case when (GROUP=1) then COUNT_ID end) as GROUP
from HAVE;
quit;
But I can manage to combine the two.
So I would like to count the number of distinct ids with non-missing var for group 1.
proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;
Of if the data is sorted by ID then skip the SQL and just use normal SAS code.
data _null_;
if eof then put count= ;
set have end=eof;
by id;
where group=1 and not missing(var);
count + first.id;
run;
Like below?
data have;
call streaminit(5);
do i= 1 to 10;
do group_var=1 to 2;
variable=rand('integer',1,10);
if variable in (1,10) then call missing(variable);
output;
end;
end;
stop;
run;
proc sql;
/* create table WANT_1 as */
select
group_var
,count(variable) as NON_MISSING
,count(distinct variable) as NON_MISSING_distinct_vals
from HAVE
group by group_var;
quit;
What does GROUP=1 have to do with the original question?
To count BY a group use the GROUP BY clause in your query to tell it how to define the groups.
To count distinct values use the DISTICT keyword inside the COUNT() function.
So if your grouping is by GROUPVAR and you want to count how many non-missing value and how many distinct values there are of VAR then the code could look like this:
proc sql;
create table want as
select groupvar
, count(*) as nobs
, count(var) as n_non_missing_var
, count(distinct var) as n_distinct_var
from have
group by groupvar
;
quit;
@trevand wrote:
@Tom @Patrick I would like to count the number of distinct non-missing observations for the first group. So let's say you have three groups, group would equal 1, 2, or 3. So then I would like to count only when group=1.
What is the MEANING of only counting when GROUP=1?
What about the observations that have GROUP=2 or GROUP=3?
If you only want to count the observations that have GROUP=1 then add a WHERE condition to the query to eliminate the other observations from being used.
And what are you counting? What do you mean by "distinct non-missing observations"? Do you mean if you have two observations that have the same values for all variables they should count only once?
So something like:
data have;
input group var1-var4 ;
cards;
1 1 2 3 4
1 1 2 3 4
1 1 2 3 4
;
Would count as only 1.
But something like
data have;
input group var1-var4 ;
cards;
1 1 2 3 5
1 1 2 6 4
1 1 2 6 5
;
would count as 3 since since there are three distinct observations, even though no individual variable has more than 2 distinct values.
@Tom sorry I should have been more clear. I have three variables: group, id, and var. So I would like to count the number of distinct ids with non-missing var for group 1. Here are two examples:
Here group 1 would 3 observations that have a non-missing ids.
group | id | var |
1 | 1 | 10 |
1 | 2 | 12 |
1 | 3 | 13 |
1 | 4 | . |
Here group 1 would have only one non missing observations:
group | id | var |
1 | 1 | 10 |
1 | 1 | 12 |
1 | 1 | 13 |
1 | 1 | . |
So I would like to count the number of distinct ids with non-missing var for group 1.
proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;
Of if the data is sorted by ID then skip the SQL and just use normal SAS code.
data _null_;
if eof then put count= ;
set have end=eof;
by id;
where group=1 and not missing(var);
count + first.id;
run;
Here group 1 would 3 observations that have a non-missing ids.
group id var 1 1 10 1 2 12 1 3 13 1 4 .
No, all of the values of ID are not missing in both of your examples.
In this first one you have 4 DISTINCT values of ID. If you eliminate the observations that have a missing value of VAR then that count is reduced to 3 distinct values of ID.
Here group 1 would have only one non missing observations:
group id var 1 1 10 1 1 12 1 1 13 1 1 .
In this example you have only one DISTINCT value of ID. Eliminating the observations with missing values of VAR will not change the count in this case.
Not sure about your last comment but this code:
proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;
gets exactly what I need: 3 observations from this data set
group | id | var |
1 | 1 | 10 |
1 | 2 | 12 |
1 | 3 | 13 |
1 | 4 | . |
and 2 observations from this data set
group | id | var |
1 | 1 | 10 |
1 | 4 | 12 |
1 | 4 | 13 |
1 | 4 | . |
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.