- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I have a dataset which has a variable that by default is null, but is changed to one of 6 or 7 category names should an incident occur.
I want to count the number of records in each category, including the default null category. Here's my code.
proc sql;
create table no_in_cats as
select
var1,
count(var1) as no_in_cat
from dataset
group by var1;
quit;
I get output like this:
var1 no_in_cat
0
A 65
B 46596
C 3229
D 26
E 104
F 102317
I know that the number of nulls should be the largest number by far, so these results are concerning!
Any idea about where I'm going wrong?
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The * you have in your count function is not in my code, I changed that and as a result my code works! Why is this?
Thanks.
P.S. I agree with your point about datasets, I changed the names in my code for the post 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, I believe it is because nulls are not counted:
https://www.techonthenet.com/sql/count.php
I always tend to use * - or check all variables - to do a count of rows in case blanks are present.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
COUNT(varname) will count the number of non-missing values for a variable. To just count observations use COUNT(*). I have seen some people put a constant there instead of the *. So COUNT(1). Although COUNT(5) would work just the same as COUNT(1).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I believe that when you put a number in like that it noramlly reflects the logical position of the variable in the dataset, so:
group by 1,3,4;
Groups by variable which is found in position 1 of the data, and then by position 3 variable, then by position 4 variable. I would avoid such notation as a change to the data can drastically alter code output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use position numbers in GROUP BY or ORDER BY clauses. Not in aggregate function calls.
Test code
data test;
set sashelp.class ;
if _n_ in (1,3) then call missing(age);
if _n_ in (2,3,4) then call missing(height);
run;
proc sql;
select
count(*) as count_star
, count(1) as count_1
, count(1000) as count_1000
, count(.) as count_missing
, count(' ') as count_blank
, count(name) as count_name
, count(age) as count_age
, count(height) as count_height
from test
;
quit;