How do i aggregate and count a variable where X < 2 using SAS Datastep?
I have the following data:
data driver;
MAD_ID='AAA';
MAD_CLS='BLUE';
MAD_YEAR='2024';
MAD_AGE = 26;
MAD_EXPYR = 7;
MAD_DRVEXPYR1=1;
output;
MAD_ID='AAA';
MAD_CLS='BLUE';
MAD_YEAR='2024';
MAD_AGE = 21;
MAD_EXPYR = 2;
MAD_DRVEXPYR1=1;
output;
MAD_ID='AAA';
MAD_CLS='BLUE';
MAD_YEAR='2024';
MAD_AGE = 56;
MAD_EXPYR = 15;
MAD_DRVEXPYR1=5;
output;
MAD_ID='BBB';
MAD_CLS='GREY';
MAD_YEAR='2025';
MAD_AGE = 18;
MAD_EXPYR = 1;
MAD_DRVEXPYR1=1;
output;
run;
However, the result I am getting with the following SQL command is not what I want to achieve.
proc sql;
create table want as
select *
, count(1) as MAD_DRVEXPYR1_cnt
from driver
group by MAD_ID, MAD_CLS, MAD_YEAR
having MAD_DRVEXPYR1 <2;
quit;
It shows MAD_DRVEXPYR1 count as 3 for the first ID + CLS + YEAR but as we can see in the sample data, there is only 2 row of record for the first ID + CLS + YEAR.
Which part of my logic is wrong?
Also, can i achieve the same thing using SAS Datastep?
Use where instead of having. Having is used on the aggregated data.
proc sql;
create table want as
select *
, count(*) as MAD_DRVEXPYR1_cnt
from driver
where MAD_DRVEXPYR1 <2
group by MAD_ID, MAD_CLS, MAD_YEAR
;
quit;
@StickyRoll wrote:
How do i aggregate and count a variable where X < 2 using SAS Datastep?
Also, can i achieve the same thing using SAS Datastep?
Probably you could do this in a DATA step, but it is a lot easier using PROC FREQ to do counting or PROC SUMMARY to compute the N in each group.
proc summary data=driver nway;
where mad_drvexpyr1<2;
class mad_id mad_cls mad_year;
var mad_age;
output out=counts(drop=_type_ _freq_) n=mad_drvexpyr1_cnt;
run;
Hi @PaigeMiller Thanks for the Proc Freq suggestion. It works. However, may i find out on the "var" part? mad_age is from the source but is not used for the computation. The variable that I am looking at is actually "MAD_DRVEXPYR1" instead. Do i just substitute it with "MAD_DRVEXPYR1" for VAR?
Yes. But what is it you want to count? Your original SQL code was counting the number of observations, whether they were missing or not. Is that what you want? If so the _FREQ_ automatic variable will have that count. The N statistic of PROC SUMMARY (also known as PROC MEANS) will count the number of non missing values, like using count(varname) in SQL code will do. If you don't have any missing values then the automatic variable _FREQ_ will be the same as the N statistic.
But that variable seems to already be a type of count, if I am guessing the meaning properly. Do you instead want to take the SUM of the years of experience?
Or perhaps use that variable as another CLASS variable to get counts by different levels of experience? Perhaps using a format bin the years into a small number of categories, like low and high.
@StickyRoll wrote:
Hi @PaigeMiller Thanks for the Proc Freq suggestion. It works. However, may i find out on the "var" part? mad_age is from the source but is not used for the computation. The variable that I am looking at is actually "MAD_DRVEXPYR1" instead. Do i just substitute it with "MAD_DRVEXPYR1" for VAR?
I am just counting rows, so it doesn't matter which numeric variable I choose to do the counting of rows. MAD_AGE would give the same answer for this data set as any other numeric variable. (Assuming there are no missing values)
You don't have to work so hard to make sample data.
After all you are using SAS now. Take advantage of that fact.
data driver;
input MAD_ID :$3. MAD_CLS :$4. MAD_YEAR :$4. MAD_AGE MAD_EXPYR MADDRVEXPYR1 ;
cards;
AAA BLUE 2024 26 7 1
AAA BLUE 2024 21 2 1
AAA BLUE 2024 56 15 5
BBB GREY 2025 18 1 1
;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.