I want to calculate a month over month percentage change for volumes using the below data where I would only be looking at two months each time. Is there a Proc SQL statement that can help? I am struggling on grouping by the month of the date, which I need in order to calculate MoM percent change.
Input:
Date Record_Number
20230101 10
20230102 245
20230121 39
20230201 11
20230202 202
Output:
Percent_Change
(27.55%)
*Both Date and Record_Number are of numeric format.
data have1;
set have;
date1=input(put(date,8.),yymmdd8.);
format date1 yymmn6.;
run;
proc summary data=have1 nway;
class date1;
var record_number;
output out=_sums_(drop=_:) sum=;
run;
data want;
merge _sums_(firstobs=2 rename=(record_number=record_number2) drop=date1) _sums_;
mom_pct_chg=(record_number2-record_number)/record_number;
format mom_pct_chg percent10.2;
run;
The variable DATE is not a date variable, according to SAS. It looks like a date to humans, but SAS thinks it is some integer. Dates in SAS are the number of days since 01JAN1960, and representing dates as number of days since 01JAN1960 allows you to use all the hard work SAS has done to figure out dates and months and leap years and whatever, so you don't have to code it yourself.
Thus I turn the variable DATE into DATE1 which is now a valid SAS date, the number of days since 01JAN1960, and assign it the format yymmn6. so it still looks like 202301 (or whatever) and now PROC SUMMARY can group by this month value. From there, you compute the sums, and in the next step do the division.
I'm sure it can be done in SQL, but you would also be wise to learn other methods, such as the above. I suspect using DATA steps and PROC SUMMARY will run faster than SQL on your large data set but of course, I haven't tried to confirm that. The "I must use SQL for everything" approach is not one I recommend, see Maxim 10.
Please explain the logic and calculations used here. I don't see how you get (27.55%) from this data.
In words, explain how to do the grouping by months that you mention.
I want to calculate a month over month percentage change for volumes using the below data
What variable is the volume?
You can try like that:
data have;
infile CARDS4;
input date yymmdd8. Record_Number;
format date yymmdd10.;
cards4;
20230101 10
20230102 245
20230121 39
20230201 11
20230202 202
;;;;
run;
proc print;
run;
proc sql;
create table aggregate as
select distinct
year(date) as y
, month(date) as m
, year(intnx("Month",date,-1)) as py
, month(intnx("Month",date,-1)) as pm
, sum(Record_Number) as sum_of_Record_Number
from
have
group by
year(date)
, month(date)
;
quit;
proc print data=aggregate;
run;
proc sql;
create table want as
select distinct
cur.y
,cur.m
,cur.sum_of_Record_Number as sum_cur
,prev.sum_of_Record_Number as sum_prev
,(cur.sum_of_Record_Number - prev.sum_of_Record_Number)/cur.sum_of_Record_Number as percent_change format percentN10.2
from
aggregate as cur
join
aggregate as prev
on
cur.y=prev.py
and
cur.m=prev.pm
;
quit;
proc print data=want;
run;
but I would use the data step if I were you:
proc sql;
create table aggregate2 as
select distinct
year(date) as y
, month(date) as m
, sum(Record_Number) as sum_of_Record_Number
from
have
group by
year(date)
, month(date)
order by
1,2
;
quit;
proc print data=aggregate2;
run;
data want2;
set aggregate2;
percent_change = (lag(sum_of_Record_Number)-sum_of_Record_Number)/lag(sum_of_Record_Number);
format percent_change percentN10.2;
if percent_change;
run;
proc print data=want2;
run;
Bart
@RedUser77 wrote:
The data is stored on tables that I am querying from so ideally I would like to set up a proc sql statement querying the above.
If the source tables are stored in SAS then there is no reason not to use a SAS datastep.
If these are tables in a database then also consider explicit passthrough SQL as many databases got SQL extensions like windowing functions that allow for such calculations.
Take the sum of Record_Number by the month of Date and then calculate the percent change.
For January (01): Sum of Record_Number = 294
For February (02): Sum of Record_Number = 213
Percent_Change: 100*((CURRENT - PRIOR)/(PRIOR))
Percent_Change = ((213 - 294)/(294)) = -27.55
The Date variable is YYYYMMDD.
data have1;
set have;
date1=input(put(date,8.),yymmdd8.);
format date1 yymmn6.;
run;
proc summary data=have1 nway;
class date1;
var record_number;
output out=_sums_(drop=_:) sum=;
run;
data want;
merge _sums_(firstobs=2 rename=(record_number=record_number2) drop=date1) _sums_;
mom_pct_chg=(record_number2-record_number)/record_number;
format mom_pct_chg percent10.2;
run;
The variable DATE is not a date variable, according to SAS. It looks like a date to humans, but SAS thinks it is some integer. Dates in SAS are the number of days since 01JAN1960, and representing dates as number of days since 01JAN1960 allows you to use all the hard work SAS has done to figure out dates and months and leap years and whatever, so you don't have to code it yourself.
Thus I turn the variable DATE into DATE1 which is now a valid SAS date, the number of days since 01JAN1960, and assign it the format yymmn6. so it still looks like 202301 (or whatever) and now PROC SUMMARY can group by this month value. From there, you compute the sums, and in the next step do the division.
I'm sure it can be done in SQL, but you would also be wise to learn other methods, such as the above. I suspect using DATA steps and PROC SUMMARY will run faster than SQL on your large data set but of course, I haven't tried to confirm that. The "I must use SQL for everything" approach is not one I recommend, see Maxim 10.
It's not clear why SQL is advantageous for this task. If the data are sorted by date, then a single data step does what you need:
data have;
input date yymmdd8. Record_Number;
format date yymmddn8.;
datalines;
20230101 10
20230102 245
20230121 39
20230201 11
20230202 202
run;
data want (keep=date prior_sum current_sum pct_change);
merge have
have (firstobs=2 keep=date rename=(date=nxt_date))
end=end_of_merge;
date=intnx('month',date,0,'end'); ** Convert each date to end_of_month;
if date^=lag(date) then current_sum=0;** New month? Reset sum to zero ;
current_sum+record_number;
if nxt_date>date or end_of_merge=1; **Subsetting if: of end-of-month or end_of_merge;
prior_sum=lag(current_sum);
if prior_sum ^=. then pct_change=(current_sum-prior_sum)/prior_sum;
format pct_change percent7.1;
run;
Personally, I would advise people not to write their own DATA step code to perform calculations (like sums) across groups that can easily be obtained from PROC MEANS or PROC SUMMARY. For beginners, writing DATA step code to do this can be very difficult, harder than using PROC MEANS or PROC SUMMARY. SAS has done the hard work to get these calculations correct; there's no need for a user to replicate these calculations in a DATA step.
Furthermore, the DATA step written by users, if not done properly (but not the code by @mkeintz), could fail if there are missing values (we have seen this happen). This is another reason to use PROC MEANS or PROC SUMMARY, these PROCs handle missings properly.
Hi @RedUser77
I know you have multiple answers to go through, and here is yet another one
data have;
infile CARDS4;
input date yymmdd8. Record_Number;
format date yymmdd10.;
cards4;
20230101 10
20230102 245
20230121 39
20230201 11
20230202 202
20230301 41
20230321 59
20230302 92
;;;;
run;
/* Create a view to add the month variable */
DATA HAVE_V/VIEW=HAVE_V;
set have;
month = put(date,yymon8.);
RUN;
/* Use the DOW loop to calculate monthly totals and Percent changes */
DATA WANT(KEEP=MONTH MON_TOT PCT_CHANGE);
mon_tot = 0;
prev_mon_tot = 0;
/* calculate monthly totals */
DO UNTIL (LAST.MONTH);
SET HAVE_V;
BY MONTH NOTSORTED;
mon_tot + Record_Number;
END;
/* calculate monthly Percent changes */
prev_mon_tot = lag(mon_tot);
pct_change = ifn((prev_mon_tot > 0),100*((mon_tot - prev_mon_tot)/(prev_mon_tot)),0);
OUTPUT;
FORMAT pct_change 8.2;
RUN;
Note: this solution assumes the original input table/dataset is ordered by date.
Hope this helps,
Ahmed
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.