Hi all,
I have a dataset which has a variable that by default is null, but is changed to one of 6 or 7 category names should an incident occur.
I want to count the number of records in each category, including the default null category. Here's my code.
proc sql;
create table no_in_cats as
select
var1,
count(var1) as no_in_cat
from dataset
group by var1;
quit;
I get output like this:
var1 no_in_cat
0
A 65
B 46596
C 3229
D 26
E 104
F 102317
I know that the number of nulls should be the largest number by far, so these results are concerning!
Any idea about where I'm going wrong?
Thanks
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
Well, I believe it is because nulls are not counted:
https://www.techonthenet.com/sql/count.php
I always tend to use * - or check all variables - to do a count of rows in case blanks are present.
COUNT(varname) will count the number of non-missing values for a variable. To just count observations use COUNT(*). I have seen some people put a constant there instead of the *. So COUNT(1). Although COUNT(5) would work just the same as COUNT(1).
I believe that when you put a number in like that it noramlly reflects the logical position of the variable in the dataset, so:
group by 1,3,4;
Groups by variable which is found in position 1 of the data, and then by position 3 variable, then by position 4 variable. I would avoid such notation as a change to the data can drastically alter code output.
You can use position numbers in GROUP BY or ORDER BY clauses. Not in aggregate function calls.
Test code
data test;
set sashelp.class ;
if _n_ in (1,3) then call missing(age);
if _n_ in (2,3,4) then call missing(height);
run;
proc sql;
select
count(*) as count_star
, count(1) as count_1
, count(1000) as count_1000
, count(.) as count_missing
, count(' ') as count_blank
, count(name) as count_name
, count(age) as count_age
, count(height) as count_height
from test
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.