Hi all,
I have a dataset which has a variable that by default is null, but is changed to one of 6 or 7 category names should an incident occur.
I want to count the number of records in each category, including the default null category. Here's my code.
proc sql;
create table no_in_cats as
select
var1,
count(var1) as no_in_cat
from dataset
group by var1;
quit;
I get output like this:
var1 no_in_cat
0
A 65
B 46596
C 3229
D 26
E 104
F 102317
I know that the number of nulls should be the largest number by far, so these results are concerning!
Any idea about where I'm going wrong?
Thanks
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
Its hard to say, can't see any data. Your sql:
proc sql; create table no_in_cats as select var1,count(*) as no_in_cat from dataset group by var1; quit;
Seems fine (though wouldn't call a dataset "dataset"). Perhaps post some test data in the form of a datastep so we have something to see.
Well, I believe it is because nulls are not counted:
https://www.techonthenet.com/sql/count.php
I always tend to use * - or check all variables - to do a count of rows in case blanks are present.
COUNT(varname) will count the number of non-missing values for a variable. To just count observations use COUNT(*). I have seen some people put a constant there instead of the *. So COUNT(1). Although COUNT(5) would work just the same as COUNT(1).
I believe that when you put a number in like that it noramlly reflects the logical position of the variable in the dataset, so:
group by 1,3,4;
Groups by variable which is found in position 1 of the data, and then by position 3 variable, then by position 4 variable. I would avoid such notation as a change to the data can drastically alter code output.
You can use position numbers in GROUP BY or ORDER BY clauses. Not in aggregate function calls.
Test code
data test;
set sashelp.class ;
if _n_ in (1,3) then call missing(age);
if _n_ in (2,3,4) then call missing(height);
run;
proc sql;
select
count(*) as count_star
, count(1) as count_1
, count(1000) as count_1000
, count(.) as count_missing
, count(' ') as count_blank
, count(name) as count_name
, count(age) as count_age
, count(height) as count_height
from test
;
quit;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.