- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Team,
I have some credit card approved data, along when card was approved Decision Year and decision month, I have 1M aprox credit card data , and I am trying to calculate few metrics like-
YTD of Current Year,
YTD of Previous Year,
Month over Year Current,
Month over Year Prev
Current Month total,
Prev Month Total.
Data Sample;
input card_no Dec_yr Dec_month$ Approved;
datalines;
123 2019 JAN 1
124 2019 Jan 1
125 2019 Feb 1
126 2019 Feb 1
127 2019 Mar 1
128 2019 Mar 1
100 2020 JAN 1
200 2020 Jan 1
300 2020 Feb 1
400 2020 Feb 1
500 2020 Mar 1
600 2020 Mar 1
700 2020 Mar 1
;
run;
I wanted to summarize data on month basis, for example if Select MAR month- output should be like this
YTD_2020 YDT_2019 MOY_2020 MOY_2019 Prev_month_2020 Current_month_2020
7 6 3 2 2 3
another example if Select Feb month- output should be like this
YTD_2020 YDT_2019 MOY_2020 MOY_2019 Prev_month_2020 Current_month_2020
4 4 2 2 2 2
Hope , I have made myself clear about my requirement, please comment if need any clarification.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the difference between "month over year current" and "current month total"?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not sure how you are getting your MOY results, but for the other results you need to get cumulative sums within both month and year, but you don't have a numeric date or month. So, first compute the numeric date and ensure your data is ordered by year and month. Next, compute the cumulative sums and store only the records with the relevant sums.
This is where things get dicey, because you say you want to select a month, like MAR or FEB in your examples. That makes it sound like you only want to produce summary information on a particular point in time. There is probably an easier way to do this, but given your particular structure - I used SQL to pull the values individually from the year/month combinations and assemble it into a single record. (You can update this to include your MOY calculations too.) However, if what you want is a summary that produces this for every combination of year and month in your data set, this would need to be modified since its a quick and dirty method for getting a single result This could easily be placed into a macro to make it easier for a user to call and the SQL could be extended to grab different views, but this should get you started.
data have;
input card_no Dec_yr Dec_month $ Approved;
monthn = month(input(cats('01',Dec_month,Dec_yr),date9.)); *compute your numeric month;
cards;
123 2019 JAN 1
124 2019 Jan 1
125 2019 Feb 1
126 2019 Feb 1
127 2019 Mar 1
128 2019 Mar 1
100 2020 JAN 1
200 2020 Jan 1
300 2020 Feb 1
400 2020 Feb 1
500 2020 Mar 1
600 2020 Mar 1
700 2020 Mar 1
;
run;
proc sort data= have;
by dec_yr monthn;
run;
data want;
set have;
by dec_yr monthn;
if first.monthn then mtd = 0;*reset monthly counts each time you see a new month;
mtd+approved;*increment monthly counts;
if first.dec_yr then ytd = 0;
ytd+approved;
if last.monthn; *output a record only if it contains the sums you're looking for;
run;
%let month = 3;
%let pmonth = %sysevalf(&month - 1);
%let year = 2020;
%let pyear = %sysevalf(&year - 1);
proc sql;
create table summary as
select c.ytd as YTD_current, c.mtd as Month_current,
py.ytd as ytd_previous, pm.mtd as Month_previous
from (select ytd, mtd
from want
where monthn eq &month and dec_yr eq &year) as c,
(select ytd
from want
where monthn eq &month and dec_yr eq &pyear) as py,
(select mtd
from want
where monthn eq &pmonth and dec_yr eq &year) as pm
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have a license for SAS/ETS? If so, PROC TIMESERIES or EXPAND can calculate most of these measures quite conveniently.
You can check your license with the following code:
*what is installed;
proc product_status;run;
*what is licensed;
proc setinit; run;
@LittlesasMaster wrote:
Hi Team,
I have some credit card approved data, along when card was approved Decision Year and decision month, I have 1M aprox credit card data , and I am trying to calculate few metrics like-
YTD of Current Year,
YTD of Previous Year,
Month over Year Current,
Month over Year Prev
Current Month total,
Prev Month Total.
Data Sample;
input card_no Dec_yr Dec_month$ Approved;
datalines;
123 2019 JAN 1
124 2019 Jan 1
125 2019 Feb 1
126 2019 Feb 1
127 2019 Mar 1
128 2019 Mar 1
100 2020 JAN 1
200 2020 Jan 1
300 2020 Feb 1
400 2020 Feb 1
500 2020 Mar 1
600 2020 Mar 1
700 2020 Mar 1
;
run;
I wanted to summarize data on month basis, for example if Select MAR month- output should be like this
YTD_2020 YDT_2019 MOY_2020 MOY_2019 Prev_month_2020 Current_month_2020
7 6 3 2 2 3
another example if Select Feb month- output should be like this
YTD_2020 YDT_2019 MOY_2020 MOY_2019 Prev_month_2020 Current_month_2020
4 4 2 2 2 2
Hope , I have made myself clear about my requirement, please comment if need any clarification.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can avoid sorting and accumulating by first using PROC SUMMARY. Then it's just a matter of reading the PROC SUMMARY output and using LAG functions, as below.
data have;
input card_no Dec_yr Dec_month $ Approved;
dec_month=upcase(dec_month);
monthn = findw('JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC',trim(dec_month),' ','e');
cards;
123 2019 JAN 1
124 2019 Jan 1
125 2019 Feb 1
126 2019 Feb 1
127 2019 Mar 1
128 2019 Mar 1
100 2020 JAN 1
200 2020 Jan 1
300 2020 Feb 1
400 2020 Feb 1
500 2020 Mar 1
600 2020 Mar 1
700 2020 Mar 1
;
proc summary data=have nway;
class dec_yr monthn dec_month ;
var approved;
output out=need (drop=_TYPE_ _FREQ_) sum=current_month;
run;
data want;
set need;
by dec_yr;
month_lastyear=lag3(current_month);
if first.dec_yr then ytd=0;
ytd+current_month;
ytd_lastyear=lag3(ytd);
if not missing(ytd_lastyear);
run;
The whole point of this is to take advantage of SAS's strengths - PROC SUMMARY is one of them. Just think of SAS as a tool first, programming language second.
- The proc summary has CLASS MONTHN DEC_MONTH statement, which not only specifies the classification scheme of the output statement, but also sorts the output by DEC_YR/MONTHN/DEC_MONTH.
- In the data want step, I use LAG3, since you only provide 3 months of data per year. But if you have data for every month, use lag12.
- the subsetting if statement "if not missing(yrd_last year)" avoids outputting data for the first year, which has no prior year.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------