DATA Step, Macro, Functions and more

count number of observations with certain values

Accepted Solution Solved
Reply
Super Contributor
Posts: 312
Accepted Solution

count number of observations with certain values

proc sort data=mydata;
by age;
run;

data count;
set mydata;
by age;
zero=0;
hundred=0;
if age <0 then zero+1;
else if age >100 then hundred+1;
run;

with above code, I can get zero and hundred as two new variables added in the data set. I actually only want to have a simple number of how many cases with age <0 and how many with age>100. How to do that?

Thanks!


Accepted Solutions
Solution
‎01-08-2016 04:26 PM
Super Contributor
Posts: 312

Re: count number of observations with certain values

proc sql;
select count(*) as N_obs
from mydata
where age >100;
quit;

this works

View solution in original post


All Replies
Solution
‎01-08-2016 04:26 PM
Super Contributor
Posts: 312

Re: count number of observations with certain values

proc sql;
select count(*) as N_obs
from mydata
where age >100;
quit;

this works
Trusted Advisor
Posts: 1,115

Re: count number of observations with certain values

Great that you found a good solution using PROC SQL.

 

Please note that your data step approach could be simplified considerably:

  • You don't actually perform BY group processing. Hence, the BY statement in the data step and, most importantly, the (possibly resource-intensive) PROC SORT step could be omitted.
  • The initialization of ZERO and HUNDRED to 0 is redundant. It is implicitly done if you use the "variable+increment" syntax (sum statement).
  • If you like your code very short, you could make use of the fact that logical expressions evaluate to 0 (if they are false) or 1 (if they are true).

With these suggestions your initial code could have looked like this:

data count;
set mydata;
zero+(age<0);
hundred+(age>100);
run;

You should be aware that not only negative values, but also missing values of AGE would satisfy the condition AGE<0. If you want to restrict your count to negative values, the third line should read zero+(.<age<0) or, to exclude special missing values as well, zero+(.z<age<0).

 

Of course, you don't need to create a new (possibly large) dataset COUNT to count those observations. You could avoid this with a data step as the following:

data _null_;
set mydata end=last;
zero+(age<0);
hundred+(age>100);
if last then put zero= hundred=;
run;

By default, the PUT statement would write the desired numbers to the log (not to the output window like PROC SQL).

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 491 views
  • 1 like
  • 2 in conversation