Solved: proc sql count non missing observations in variable by group

trevand · Posted 02-03-2025 06:35 PM

How can I count the number of non-missing values in a variable by group using proc sql?

Here is the code that count non-missing values for one variable:

proc sql;
create table WANT_1 as
select count(VARIABLE) as NON_MISSING
from HAVE;
quit;

And here is the code that counts the number of observations by group:

proc sql;
create table WANT_2 as
select count(case when (GROUP=1) then COUNT_ID end) as GROUP
from HAVE;
quit;

But I can manage to combine the two.

Tom · Posted 02-03-2025 11:29 PM

So I would like to count the number of distinct ids with non-missing var for group 1.

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

Of if the data is sorted by ID then skip the SQL and just use normal SAS code.

data _null_;
  if eof then put count= ;
  set have end=eof;
  by id;
  where group=1 and not missing(var);
  count + first.id;
run;

View solution in original post

Patrick · Posted 02-03-2025 07:24 PM

Like below?

data have;
	call streaminit(5);
	do i= 1 to 10;
		do group_var=1 to 2;
			variable=rand('integer',1,10);
			if variable in (1,10) then call missing(variable);
			output;
		end;
	end;
	stop;
run;

proc sql;
/* create table WANT_1 as */
	select 
		group_var
		,count(variable) as NON_MISSING
		,count(distinct variable) as NON_MISSING_distinct_vals
	from HAVE
	group by group_var;
quit;

Tom · Posted 02-03-2025 08:32 PM

What does GROUP=1 have to do with the original question?

To count BY a group use the GROUP BY clause in your query to tell it how to define the groups.

To count distinct values use the DISTICT keyword inside the COUNT() function.

So if your grouping is by GROUPVAR and you want to count how many non-missing value and how many distinct values there are of VAR then the code could look like this:

proc sql;
create table want as
select groupvar
     , count(*) as nobs
     , count(var) as n_non_missing_var
     , count(distinct var) as n_distinct_var
from have
group by groupvar
;
quit;

trevand · Posted 02-03-2025 10:29 PM

@Tom @Patrick I would like to count the number of distinct non-missing observations for the first group. So let's say you have three groups, group would equal 1, 2, or 3. So then I would like to count only when group=1.

Tom · Posted 02-03-2025 11:09 PM

@trevand wrote:

@Tom @Patrick I would like to count the number of distinct non-missing observations for the first group. So let's say you have three groups, group would equal 1, 2, or 3. So then I would like to count only when group=1.

What is the MEANING of only counting when GROUP=1?

What about the observations that have GROUP=2 or GROUP=3?

If you only want to count the observations that have GROUP=1 then add a WHERE condition to the query to eliminate the other observations from being used.

And what are you counting? What do you mean by "distinct non-missing observations"? Do you mean if you have two observations that have the same values for all variables they should count only once?

So something like:

data have;
  input group var1-var4 ;
cards;
1 1 2 3 4
1 1 2 3 4
1 1 2 3 4
;

Would count as only 1.

But something like

data have;
  input group var1-var4 ;
cards;
1 1 2 3 5
1 1 2 6 4
1 1 2 6 5
;

would count as 3 since since there are three distinct observations, even though no individual variable has more than 2 distinct values.

trevand · Posted 02-03-2025 11:18 PM

@Tom sorry I should have been more clear. I have three variables: group, id, and var. So I would like to count the number of distinct ids with non-missing var for group 1. Here are two examples:

Here group 1 would 3 observations that have a non-missing ids.

group	id	var
1	1	10
1	2	12
1	3	13
1	4	.

Here group 1 would have only one non missing observations:

group	id	var
1	1	10
1	1	12
1	1	13
1	1	.

Tom · Posted 02-03-2025 11:29 PM

So I would like to count the number of distinct ids with non-missing var for group 1.

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

Of if the data is sorted by ID then skip the SQL and just use normal SAS code.

data _null_;
  if eof then put count= ;
  set have end=eof;
  by id;
  where group=1 and not missing(var);
  count + first.id;
run;

Tom · Posted 02-03-2025 11:46 PM

Here group 1 would 3 observations that have a non-missing ids.

group id var

1 1 10

1 2 12

1 3 13

1 4 .

No, all of the values of ID are not missing in both of your examples.

In this first one you have 4 DISTINCT values of ID. If you eliminate the observations that have a missing value of VAR then that count is reduced to 3 distinct values of ID.

Here group 1 would have only one non missing observations:

group id var

1 1 10

1 1 12

1 1 13

1 1 .

In this example you have only one DISTINCT value of ID. Eliminating the observations with missing values of VAR will not change the count in this case.

trevand · Posted 02-04-2025 09:58 AM

Not sure about your last comment but this code:

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

gets exactly what I need: 3 observations from this data set

group	id	var
1	1	10
1	2	12
1	3	13
1	4	.

and 2 observations from this data set

group	id	var
1	1	10
1	4	12
1	4	13
1	4	.

proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Re: proc sql count non missing observations in variable by group

Registration is open