I have aggregated my data to get averages by Month, where I take a date value and use the put function to format the date in a monthyear format:
put(Date, monyy7.) as Month
I want the data set to be grouped by that, AND ordered by that in sequential order. However, since this is a character format it is ordering it in alphabetical order. Is there a numeric equivalent for this, or another way to have my aggregated data in the proper order? I need to have the months spelled out or at least abbreviated as this format does.
I've tried using ORDER BY Date (and leaving it out of the select to get the aggregation), but this gives me duplicates even if I use DISTINCT. I've also tried using 2 steps by creating the month variable in an unaggregated set and ordering by Date, but that order does not carry over when I create an aggregated set from it.
Don't use PUT, just use the formatted numeric value. For example:
data have;
format date mmddyy10. value dollar10.;
do Date='01JAN2024'd to '10JUL2024'd by 7;
value+month(date);
output;
end;
run;
proc means data=have sum maxdec=2;
class date;
format date monyy7.;
var value;
run;
Result:
Analysis Variable : value | ||
---|---|---|
date | N Obs | Sum |
JAN24 | 5 | 15.00 |
FEB24 | 4 | 40.00 |
MAR24 | 4 | 82.00 |
APR24 | 5 | 185.00 |
MAY24 | 4 | 230.00 |
JUN24 | 4 | 320.00 |
JUL24 | 2 | 199.00 |
I found a solution, but I am open to alternatives.
Select distinct
Month */the mmyy7 format*/,
avg(count) as AVG_CNT
from my_data
group by month
ORDER BY year(Date), Month(Date);
quit;
Don't use PUT, just use the formatted numeric value. For example:
data have;
format date mmddyy10. value dollar10.;
do Date='01JAN2024'd to '10JUL2024'd by 7;
value+month(date);
output;
end;
run;
proc means data=have sum maxdec=2;
class date;
format date monyy7.;
var value;
run;
Result:
Analysis Variable : value | ||
---|---|---|
date | N Obs | Sum |
JAN24 | 5 | 15.00 |
FEB24 | 4 | 40.00 |
MAR24 | 4 | 82.00 |
APR24 | 5 | 185.00 |
MAY24 | 4 | 230.00 |
JUN24 | 4 | 320.00 |
JUL24 | 2 | 199.00 |
You could use the yymm7 format instead of monyy7. With yymm7, the alphabetical sort of the character values is in chronological order.
proc sql ;
select put(date,yymm7.) as month, mean(y) as mean
from have
group by month
order by month
;
quit ;
But as @SASJedi , the PROC MEANS approach is even simpler.
I agree, use PROC MEANS here, and not PROC SQL. Another benefit is that learning PROC MEANS is a very valuable thing to do, and in my opinion, SQL should not be your first choice for computing statistics.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.