Hi,
I have a large dataset (20 year period) in the following format:
month | numerator | denominator |
Jan-16 | 10 | 200 |
Feb-16 | 15 | 300 |
Mar-16 | 20 | 300 |
Apr-16 | 10 | 150 |
May-16 | 5 | 100 |
Jun-16 | 30 | 350 |
Jul-16 | 20 | 400 |
Aug-16 | 25 | 350 |
Sep-16 | 15 | 200 |
Oct-16 | 40 | 450 |
Nov-16 | 10 | 200 |
Dec-16 | 40 | 400 |
I need to transform to a quarterly view in the following way.
Quarter_end | Rate |
Mar-16 | 22.5% |
Jun-16 | 30.0% |
Sep-16 | 15.0% |
Dec-16 | 20.0% |
Mar-16 (quarter end) = sum of numerator in Jan-16, Feb-16 and Mar-16 divided by denominator in Jan-16 (quarter start)
= (10+15+20) / 200
The logic for the other quarters follow in the same way.
I am currently doing this in Excel but it can get messy.
Is there a more efficient way of doing this in SAS?
Thanks in advance
Same update, to take into account the year:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data denominator (keep= year quarter denominator) last_month (keep=year quarter month);
set have_q;
by year quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by year quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by year quarter;
Rate = sum_numerator / denominator;
run;
Hi @PetePatel
Here is one approach to achieve this, using a data step:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
;
run;
data have_q;
set have;
Quarter = qtr(month);
run;
data want (keep= Quarter_end Rate);
set have_q;
by Quarter;
format month monyy. Rate percent8.1;
retain numerator_sum;
retain denominator_sum;
if first.Quarter then do;
numerator_sum=0;
denominator_sum=0;
end;
numerator_sum + numerator;
if first.Quarter then denominator_sum + denominator;
Rate = numerator_sum / denominator_sum;
if last.Quarter then output;
rename month=Quarter_end;
run;
Thanks Ed, this is really close.
It works for the one year period I provided but as soon as you add in more quarters it fails.
So if I add on the below to datalines
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
Is there a way around this?
Hi @PetePatel
I have added two statements in the program to take into account the year:
Best,
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data want (keep= Quarter_end Rate);
set have_q;
by Year Quarter;
format month monyy. Rate percent8.1;
retain numerator_sum;
retain denominator_sum;
if first.Quarter then do;
numerator_sum=0;
denominator_sum=0;
end;
numerator_sum + numerator;
if first.Quarter then denominator_sum + denominator;
Rate = numerator_sum / denominator_sum;
if last.Quarter then output;
rename month=Quarter_end;
run;
Another option could be to use a proc means:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
;
run;
data have_q;
set have;
quarter = qtr(month);
run;
data denominator (keep=quarter denominator) last_month (keep=quarter month);
set have_q;
by quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by quarter;
Rate = sum_numerator / denominator;
run;
Same update, to take into account the year:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data denominator (keep= year quarter denominator) last_month (keep=year quarter month);
set have_q;
by year quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by year quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by year quarter;
Rate = sum_numerator / denominator;
run;
Works perfectly, thank you.
Is this for reporting purposes or do you need a SAS data set?
For reporting but required in SAS to merge onto other datasets.
Hi @PetePatel Nice to see the question has been answered in very legible terms. Good morning, Assuming your sample represents your sample and is neatly sorted by calendar month, this presents a case for nice application of groupformats akin to what we are used to in procedures. One would think, why not use a procedure, well datastep is more of a work horse than readymeals, the options plethora are staggering
data have;
infile datalines ;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data want;
call missing(Quarter_end);
do until(last._m);
set have(rename=month=_m);
by _m groupformat;
format _m yyq6.;
Quarter_end=_m;
if first._m then _d=denominator;
_n=sum(numerator,_n);
end;
Rate=_n/_d;
format rate percent8.2 Quarter_end monyy7.;
keep Quarter_end rate;
run;
All the best!
Thank you novinosrin, I didn't know you could use a groupformat in such a way.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.