BookmarkSubscribeRSS Feed
StickyRoll
Fluorite | Level 6

How do i aggregate and count a variable where X < 2 using SAS Datastep?

 

I have the following data:

 

data driver;
	MAD_ID='AAA';
	MAD_CLS='BLUE';
	MAD_YEAR='2024';
	MAD_AGE = 26;
	MAD_EXPYR = 7;
	MAD_DRVEXPYR1=1;
	output;

	MAD_ID='AAA';
	MAD_CLS='BLUE';
	MAD_YEAR='2024';
	MAD_AGE = 21;
	MAD_EXPYR = 2;
	MAD_DRVEXPYR1=1;
	output;

	MAD_ID='AAA';
	MAD_CLS='BLUE';
	MAD_YEAR='2024';
	MAD_AGE = 56;
	MAD_EXPYR = 15;
	MAD_DRVEXPYR1=5;
	output;

	MAD_ID='BBB';
	MAD_CLS='GREY';
	MAD_YEAR='2025';
	MAD_AGE = 18;
	MAD_EXPYR = 1;
	MAD_DRVEXPYR1=1;
	output;
run;

 

 

However, the result I am getting with the following SQL command is not what I want to achieve.

 

proc sql;
	create table want as
	select *
	, count(1) as MAD_DRVEXPYR1_cnt
	from driver
	group by MAD_ID, MAD_CLS, MAD_YEAR
	having MAD_DRVEXPYR1 <2;

quit;

 

 

 

It shows MAD_DRVEXPYR1 count as 3 for the first ID + CLS + YEAR but as we can see in the sample data, there is only 2 row of record for the first ID + CLS + YEAR.

 

Which part of my logic is wrong?

 

Also, can i achieve the same thing using SAS Datastep?

6 REPLIES 6
rudfaden
Lapis Lazuli | Level 10

Use where instead of having. Having is used on the aggregated data.


proc sql;
	create table want as
	select *
	, count(*) as MAD_DRVEXPYR1_cnt
	from driver
    	where MAD_DRVEXPYR1 <2
	group by MAD_ID, MAD_CLS, MAD_YEAR
;

quit;
PaigeMiller
Diamond | Level 26

@StickyRoll wrote:

How do i aggregate and count a variable where X < 2 using SAS Datastep?

 

Also, can i achieve the same thing using SAS Datastep?


Probably you could do this in a DATA step, but it is a lot easier using PROC FREQ to do counting or PROC SUMMARY to compute the N in each group.

 

proc summary data=driver nway;
    where mad_drvexpyr1<2;
    class mad_id mad_cls mad_year;
    var mad_age;
    output out=counts(drop=_type_ _freq_) n=mad_drvexpyr1_cnt;
run;

 

--
Paige Miller
StickyRoll
Fluorite | Level 6

Hi @PaigeMiller  Thanks for the Proc Freq suggestion. It works. However, may i find out on the "var" part? mad_age is from the source but is not used for the computation. The variable that I am looking at is actually "MAD_DRVEXPYR1" instead. Do i just substitute it with "MAD_DRVEXPYR1" for VAR?

Tom
Super User Tom
Super User

Yes.  But what is it you want to count?  Your original SQL code was counting the number of observations, whether they were missing or not.  Is that what you want? If so the _FREQ_ automatic variable will have that count.  The N statistic of PROC SUMMARY (also known as PROC MEANS) will count the number of non missing values, like using count(varname) in SQL code will do.  If you don't have any missing values then the automatic variable _FREQ_ will be the same as the N statistic.  

 

But that variable seems to already be a type of count, if I am guessing the meaning properly.  Do you instead want to take the SUM of the years of experience? 

 

Or perhaps use that variable as another CLASS variable to get counts by different levels of experience? Perhaps using a format bin the years into a small number of categories, like low and high.

PaigeMiller
Diamond | Level 26

@StickyRoll wrote:

Hi @PaigeMiller  Thanks for the Proc Freq suggestion. It works. However, may i find out on the "var" part? mad_age is from the source but is not used for the computation. The variable that I am looking at is actually "MAD_DRVEXPYR1" instead. Do i just substitute it with "MAD_DRVEXPYR1" for VAR?


I am just counting rows, so it doesn't matter which numeric variable I choose to do the counting of rows. MAD_AGE would give the same answer for this data set as any other numeric variable. (Assuming there are no missing values)

--
Paige Miller
Tom
Super User Tom
Super User

You don't have to work so hard to make sample data.  

After all you are using SAS now.  Take advantage of that fact.

data driver;
  input MAD_ID :$3. MAD_CLS :$4. MAD_YEAR :$4. MAD_AGE MAD_EXPYR MADDRVEXPYR1 ;
cards;
AAA BLUE 2024 26  7 1
AAA BLUE 2024 21  2 1
AAA BLUE 2024 56 15 5
BBB GREY 2025 18  1 1
;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1751 views
  • 1 like
  • 4 in conversation