I have two queries in series. In the first query I pull date in, but format it as year quarter ( DATE FORMAT=YYQ. AS Date_Year_Qtr).
The next query I'm trying to sum up the sales and was hoping to group by my new Date_Year_Qtr record. Apparently since it is just formatted as the year qtr it is still splitting my rows by the actual date itself. Any way around this?
PROC SQL;
CREATE TABLE WORK.Final_Sum AS
SELECT
Date_Year_Qtr,
SUM(sales) AS sales
FROM WORK.Table_2
GROUP BY Fill_Date_Year_Qtr
;
QUIT;
To use the formatted YYQ values, you could compute the sums using PROC SUMMARY or PROC MEANS. It won't work in PROC SQL, which will not use the formatted values of the GROUP BY variable to create sums.
proc summary data=table_2 nway;
class fill_date_year_qtr;
var sales;
output out=final_sum sum=;
run;
In general, I think it is a mistake to resort to PROC SQL for even simple statistics, and learning PROC SUMMARY is very valuable for many reasons. There are occasions when SQL does just as good a job as PROC SUMMARY, but these are few; this is just one example of where PROC SUMMARY works better.
You use Date_Year_Qtr in the SELECT, but Fill_Date_Year_Qtr in the GROUP BY. Such will always lead to an automatic remerge, where all observations from the dataset will be kept.
To get a sum for a formatted categorical variable, use PROC MEANS/SUMMARY.
In SQL, you would need to use the PUT function to create a new variable with the format, and use that new variable in the GROUP BY; use no other variable in the SELECT except those that are results of SQL summary functions.
Hi Kurt, that's my fault I was trying to simplify my example and meant to have the select and group by statement the same. They truly are the same in the real life situation. Could you give more details around creating a new variable with the put function?
proc sql;
create table final_sum as
select
put(date_year_qtr,yyq.) as date_year_qtr,
sum(sales) as sales
from table_2
group by calculated date_year_qtr
;
quit;
Convert the values to a single day in that quarter. For example by using the INTNX() function.
create table final_sum as
select intnx('qtr',Date_Year_Qtr,0) as Date_Year_Qtr format=YYQ6.
, sum(sales) as sales
from table_2
group by 1
;
Note it would be much simpler to use PROC SUMMARY
proc summary nway data=table_2;
class date_year_qtr;
var sales;
output out=final_sum sum=;
run;
But the actual value of DATE_YEAR_QTR is not guaranteed to be the first day of the quarter.
Thanks all, I'll look around and see if I can find some good tutorials on PROC SUMMARY.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.