turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- count number of observations with certain values

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-08-2016 04:21 PM

```
proc sort data=mydata;
by age;
run;
data count;
set mydata;
by age;
zero=0;
hundred=0;
if age <0 then zero+1;
else if age >100 then hundred+1;
run;
```

with above code, I can get zero and hundred as two new variables added in the data set. I actually only want to have a simple number of how many cases with age <0 and how many with age>100. How to do that?

Thanks!

Accepted Solutions

Solution

01-08-2016
04:26 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to fengyuwuzu

01-08-2016 04:23 PM

proc sql;

select count(*) as N_obs

from mydata

where age >100;

quit;

this works

select count(*) as N_obs

from mydata

where age >100;

quit;

this works

All Replies

Solution

01-08-2016
04:26 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to fengyuwuzu

01-08-2016 04:23 PM

proc sql;

select count(*) as N_obs

from mydata

where age >100;

quit;

this works

select count(*) as N_obs

from mydata

where age >100;

quit;

this works

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to fengyuwuzu

01-09-2016 10:07 AM

Great that you found a good solution using PROC SQL.

Please note that your data step approach could be simplified considerably:

- You don't actually perform BY group processing. Hence, the BY statement in the data step and, most importantly, the (possibly resource-intensive) PROC SORT step could be omitted.
- The initialization of ZERO and HUNDRED to 0 is redundant. It is implicitly done if you use the "
*variable*+*increment*" syntax (sum statement). - If you like your code very short, you could make use of the fact that logical expressions evaluate to 0 (if they are false) or 1 (if they are true).

With these suggestions your initial code could have looked like this:

```
data count;
set mydata;
zero+(age<0);
hundred+(age>100);
run;
```

You should be aware that not only negative values, but also missing values of AGE would satisfy the condition AGE<0. If you want to restrict your count to negative values, the third line should read zero+(.<age<0) or, to exclude special missing values as well, zero+(.z<age<0).

Of course, you don't need to create a new (possibly large) dataset COUNT to count those observations. You could avoid this with a data step as the following:

```
data _null_;
set mydata end=last;
zero+(age<0);
hundred+(age>100);
if last then put zero= hundred=;
run;
```

By default, the PUT statement would write the desired numbers to the log (not to the output window like PROC SQL).