BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
trevand
Obsidian | Level 7

How can I count the number of non-missing values in a variable by group using proc sql?

 

Here is the code that count non-missing values for one variable:

 

proc sql;
create table WANT_1 as
select count(VARIABLE) as NON_MISSING
from HAVE;
quit;

 And here is the code that counts the number of observations by group:

 

proc sql;
create table WANT_2 as
select count(case when (GROUP=1) then COUNT_ID end) as GROUP
from HAVE;
quit;

But I can manage to combine the two.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

 So I would like to count the number of distinct ids with non-missing var for group 1.

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

Of if the data is sorted by ID then skip the SQL and just use normal SAS code.

data _null_;
  if eof then put count= ;
  set have end=eof;
  by id;
  where group=1 and not missing(var);
  count + first.id;
run;

View solution in original post

8 REPLIES 8
Patrick
Opal | Level 21

Like below?

data have;
	call streaminit(5);
	do i= 1 to 10;
		do group_var=1 to 2;
			variable=rand('integer',1,10);
			if variable in (1,10) then call missing(variable);
			output;
		end;
	end;
	stop;
run;

proc sql;
/* create table WANT_1 as */
	select 
		group_var
		,count(variable) as NON_MISSING
		,count(distinct variable) as NON_MISSING_distinct_vals
	from HAVE
	group by group_var;
quit;
Tom
Super User Tom
Super User

What does GROUP=1 have to do with the original question?

 

To count BY a  group use the GROUP BY clause in your query to tell it how to define the groups.

To count distinct values use the DISTICT keyword inside the COUNT() function.

So if your grouping is by GROUPVAR and you want to count how many non-missing value and how many distinct values there are of VAR then the code could look like this:

proc sql;
create table want as
select groupvar
     , count(*) as nobs
     , count(var) as n_non_missing_var
     , count(distinct var) as n_distinct_var
from have
group by groupvar
;
quit;
trevand
Obsidian | Level 7

@Tom @Patrick I would like to count the number of distinct non-missing observations for the first group. So let's say you have three groups, group would equal 1, 2, or 3. So then I would like to count only when group=1.

 

Tom
Super User Tom
Super User

@trevand wrote:

@Tom @Patrick I would like to count the number of distinct non-missing observations for the first group. So let's say you have three groups, group would equal 1, 2, or 3. So then I would like to count only when group=1.

 


What is the MEANING of only counting when GROUP=1?

What about the observations that have GROUP=2 or GROUP=3?

If you only want to count the observations that have GROUP=1 then add a WHERE condition to the query to eliminate the other observations from being used.

 

And what are you counting?  What do you mean by "distinct non-missing observations"?  Do you mean if you have two observations that have the same values for all variables they should count only once?

So something like:

data have;
  input group var1-var4 ;
cards;
1 1 2 3 4
1 1 2 3 4
1 1 2 3 4
;

Would count as only 1.

But something like

data have;
  input group var1-var4 ;
cards;
1 1 2 3 5
1 1 2 6 4
1 1 2 6 5
;

would count as 3 since since there are three distinct observations, even though no individual variable has more than 2 distinct values.

trevand
Obsidian | Level 7

@Tom sorry I should have been more clear. I have three variables: group, id, and var. So I would like to count the number of distinct ids with non-missing var for group 1. Here are two examples: 

 

Here group 1 would 3 observations that have a non-missing ids.

 

group id var
1 1 10
1 2 12
1 3 13
1 4

 

Here group 1 would have only one non missing observations:

 

group id var
1 1 10
1 1 12
1 1 13
1 1

 

Tom
Super User Tom
Super User

 So I would like to count the number of distinct ids with non-missing var for group 1.

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

Of if the data is sorted by ID then skip the SQL and just use normal SAS code.

data _null_;
  if eof then put count= ;
  set have end=eof;
  by id;
  where group=1 and not missing(var);
  count + first.id;
run;
Tom
Super User Tom
Super User

Here group 1 would 3 observations that have a non-missing ids.

 

group id var
1 1 10
1 2 12
1 3 13
1 4

 

 

No, all of the values of ID are not missing in both of your examples.

In this first one you have 4 DISTINCT values of ID.  If you eliminate the observations that have a missing value of VAR then that count is reduced to 3 distinct values of ID.

 

Here group 1 would have only one non missing observations:

 

group id var
1 1 10
1 1 12
1 1 13
1 1

 

In this example you have only one DISTINCT value of ID.  Eliminating the observations with missing values of VAR will not change the count in this case.

 

trevand
Obsidian | Level 7

Not sure about your last comment but this code:

 

proc sql;
select count(distinct ID)
from have
where group=1 and not missing(var)
;
quit;

gets exactly what I need:  3 observations from this data set

 

group id var
1 1 10
1 2 12
1 3 13
1 4

 

and 2 observations from this data set

 

group id var
1 1 10
1 4 12
1 4 13
1 4

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 519 views
  • 0 likes
  • 3 in conversation