Hi SAS Forum,
Could you please help me to convert this hard code to a macro.
I am a macro beginner.
/*Below are the hard coded statements*/
data Cycle_FEB_TO_MAR_2010;
set RAW_DATA_SET_3_sorted;
if '1FEB2010'd le current_date le '30MAR2010'd;
run;
/*This was how I tried to convert the above into a macro as I need to repeat the above same code 24 times*/
%let t_1='1FEB2010'd;
%let t_2='31MAR2010'd;
data Cycle_%substr(&t_1,2,4)_TO_%substr(&t_2,3,5)_%substr(&t_1,5,8);
set RAW_DATA_SET_3_sorted;
if &t_1 le current_date le &t_2;
run;
Problem:
I warning comes and get stuck.
WARNING: Argument 3 to macro function %SUBSTR is out of range.
Could any one help?
Thanks
Mirisage
I wasn't sure of how to calculate "Balance %" (percent of which total?) but else below code should do what you asked for. You might want to give the datasets different names now that everything is done in one go.
data RAW_DATA_SET_3;
inFormat Current_date date9.;
input Bank_number Account_number $ 4-12 Current_date Product $ 24-36 Balance Arrears_Band $43-49;
format Current_date date9.;
cards;
10 1111111 31MAR2010 Personal Loan 555 30 - 60
10 1111111 30APR2010 Personal Loan 200 Current
10 1111111 31MAY2010 Personal Loan 108 1 - 30
30 1111111 31DEC2010 Res. Mortgage 800 Current
30 1111111 31JAN2011 Res. Mortgage 600 90 +
30 1111111 28FEB2011 Res. Mortgage 600 NPNA
30 1111111 17MAR2011 Res. Mortgage 600 writoff
20 333333333 15MAR2010 Personal OD 700 NPNA
20 333333333 30APR2010 Personal OD 30 NPNA
20 333333333 31MAY2010 Personal OD 2 NPNA
20 333333333 30JUN2010 Personal OD 1 NPNA
10 444444444 28FEB2010 Personal Loan 900 1 - 30
10 444444444 31MAR2010 Personal Loan 800 60 - 90
10 444444444 31MAY2010 Personal Loan 20 NPNA
10 444444444 01JUL2010 Personal Loan 25 NPNA
10 444444444 30AUG2010 Personal Loan 32 NPNA
70 666666666 07FEB2010 Personal OD 999 Current
70 666666666 15MAR2010 Personal OD 666 Current
40 777777 08FEB2010 Res. Mortgage 1877 Current
40 777777 28MAR2010 Res. Mortgage 905 Current
80 777777 01FEB2010 Personal Loan 3000 90 +
;
run;
/* create format for date cycle */
proc sql;
create view v_cyclefmt as
select distinct
'Cycle2Month' as fmtname
,'n' as type
,current_date as start
,catx(' ',put(intnx('month',current_date,-1,'b'),monname.),put(intnx('month',current_date,-1,'b'),year4.),'to',put(current_date,monname.)) as label
from RAW_DATA_SET_3
;
quit;
proc format cntlin=v_cyclefmt;
run;
/* create format for sort order of Arrears_Band */
proc format ;
invalue ArrearsCode
'CURRENT' =1
'1-30' =2
'30-60' =3
'60-90' =4
'90+' =5
'NPNA' =6
'WRITOFF' =7
OTHER =99
;
value ArrearsText
1='Current'
2='1 - 30'
3='30 - 60'
4='60 - 90'
5='90 +'
6='NPNA'
7='writoff'
99='misscoded!'
;
run;
/*Sorting*/
proc sort data= RAW_DATA_SET_3 out=RAW_DATA_SET_3_sorted;
by bank_number account_number current_date;
run;
data Cycle_FEB_TO_MAR_2010_iso_;
set RAW_DATA_SET_3_sorted;
by bank_number account_number current_date;
Arrears_Band_CD=input(upcase(compress(Arrears_Band)),ArrearsCode.);
prior_Arrears_Band_CD = lag( Arrears_Band_CD );
if intck('month',lag(current_date),current_date) ne 1 then call missing(prior_Arrears_Band_CD);
if first.account_number or missing(prior_Arrears_Band_CD) then delete;
run;
options missing=' ';
proc tabulate data= Cycle_FEB_TO_MAR_2010_iso_ order=formatted;
format current_date Cycle2Month. Arrears_Band_CD prior_Arrears_Band_CD ArrearsText.;
label current_date='Cycle';
class current_date / order=unformatted ;
class prior_Arrears_Band_CD / order=unformatted;
class Arrears_Band_CD / order=unformatted;
var balance;
table current_date, prior_Arrears_Band_CD='', Arrears_Band_CD*(n balance*sum)/ box=prior_Arrears_Band_CD printmiss;
Title 'Succession of O/S Balance ($ and %) and Accounts (#)';
run;
The reason for this error is that you the 3rd argument in a substr() function defines the length of the substring you want to read and not the end position.
substr(<source string>,<start position>,<length>).
%substr(&t_1,5,8); instructs SAS to read string '1FEB2010'd starting from position 5 and then reading the next 8 characters. Your string is not long enough for this so you're getting an error.
I assume that's what you're actually after: Cycle_%substr(&t_1,3,3)_TO_%substr(&t_2,4,3)_%substr(&t_1,6,4)
In case you just want to split up a large data set into smaller datasets based on date ranges: I can think of more straightforward ways of doing this. Is this what you want to do and are your cycles just consecutive 2 month ranges?
I don't know why you need to split up your data into several smaller datasets. It is often better/easier to code if you leave everything in one big dataset and then subset the data later on as you need it or to use by group processing in procedures (instead of calling them x-times with different datasets).
You could for examle create a cycle date variable which you then use in the by group (or class statement) of procedures. Just to give you the general idea of how to create such a group variable below some sample code:
data RAW_DATA_SET_3_sorted;
format current_date date9.;
do current_date='01Jan2010'd to '3Sep2012'd by 27;
output;
end;
run;
data want;
set RAW_DATA_SET_3_sorted;
by current_date;
format CycleStartDate CycleEndDate date9.;
length CycleDSname $21;
CycleStartDate=intnx('month2.2',current_date,0,'b');
CycleEndDate=intnx('month2.2',current_date,0,'e');
CycleDSname=
cats('Cycle_',put(CycleStartDate,monname3.),'_To_'
,put(CycleEndDate,monname3.),'_',year(CycleStartDate)
);
run;
Mirisage,
Does the following do what you are trying to accomplish?:
data RAW_DATA_SET_3_sorted;
input current_date date9.;
cards;
1jan2010
10jan2010
15feb2010
31mar2010
1apr2010
;
%let t_1=1FEB2010;
%let t_2=31MAR2010;
data Cycle_%sysfunc(substr(&t_1,%sysfunc(anyalpha(&t_1)),3))_TO_%sysfunc(substr(&t_2,%sysfunc(anyalpha(&t_2)),3))_%sysfunc(substr(&t_1,%sysfunc(anyalpha(&t_1))+3,4));
set RAW_DATA_SET_3_sorted;
if "&t_1"d le current_date le "&t_2"d;
run;
Hi Patrick and Art,
Many thanks to both of you.
I think this is the good time to show how inefficiently I have achieved what I wanted to achieve.
Using the data set (shown at the very bottom, can be run readily), I wanted to create 23 Markov Matrices/Tables for consecutive months, commencing Feb 2010 to Mar 2010 pair, culminating Dec 2011 to Jan 2012 pair.
/*This is the first Table I generated using my novice code provided further down*/
Succession of O/S Balance ($ and %) and Accounts (#) from Feb 2010 to Mar 2010
prior_arrears_band | Areeras_Band | |||||
Current | 60 - 90 | |||||
N | Balance | Balance | N | Balance | Balance | |
Sum | % | Sum | % | |||
Current | 2 | 1571 | ? | . | . | ? |
1 - 30 | . | . | ? | 1 | 800 | ? |
/*This is the second Table I generated*/
Succession of O/S Balance ($ and %)) and Accounts (#) from Mar 2010 to Apr 2010
prior_arrears_band | Areeras_Band | |||||
Current | 60 - 90 | |||||
N | Balance | Balance | N | Balance | Balance | |
Sum | % | Sum | % | |||
30 - 60 | 1 | 200 | ? | . | . | ? |
NPNA | . | . | ? | 1 | 30 | ? |
Next, ..... Apr 2010 to May 2010 table
May 2010 to Jun 2010 table
Jun 2010 to Jul 2010 table
……………………………………………..
…………………………………………………
……………………………………………….
Dec 2011 to Jan 2012 table (this is the last table)
This is the inefficient (but hopefully correct) approach I have taken. (Data set is provided at the very bottom that can be readily run).
/*Sorting*/
proc sort data= RAW_DATA_SET_3 out=RAW_DATA_SET_3_sorted;
by bank_number account_number current_date;
run;
/*Isolating only accounts occur any day between 1Feb2010 and 31Mar2010 (consecutive pairs of 2 months). Note that some accounts may not occur in pairs*/
data Cycle_FEB_TO_MAR_2010;
set RAW_DATA_SET_3_sorted;
if '1FEB2010'd le current_date le '31MAR2010'd;
run;
/*sorting again*/
proc sort data=Cycle_FEB_TO_MAR_2010 out= Cycle_FEB_TO_MAR_2010_;
by bank_number account_number current_date;
run;
/*Getting rid of single records. In other words, isolating only consecutive pairs */
data Cycle_FEB_TO_MAR_2010_iso;
set Cycle_FEB_TO_MAR_2010_;
by bank_number account_number current_date;
if not (first.account_number and last.account_number);
run;
/*getting the preceeding month's arrears_band into the succeeding month's row*/
data Cycle_FEB_TO_MAR_2010_iso;
set Cycle_FEB_TO_MAR_2010_iso;
prior_arrears_band = lag( arrears_band );
run;
/*Removing Preceeding month rows, or in other words, retaining succeeding month's row*/
/*This retained row now has both preceeding month's arrers band and succeeding month's
arrears band*/
data Cycle_FEB_TO_MAR_2010_iso_;
set Cycle_FEB_TO_MAR_2010_iso;
if '1MAR2010'd le current_date le '31MAR2010'd;
run;
/*User defined format to control the column and row title orders*/
proc format ;
value $ fmt (multilabel notsorted)
'Current'='Current'
'1 - 30'='1 - 30'
'30 - 60'='30 - 60'
'60 - 90'='60 - 90'
'90 +' = '90 +'
'NPNA'='NPNA'
'writoff' = 'writoff'
;
run;
/*This proc tabulate works but row title order not work*/
proc tabulate data= Cycle_FEB_TO_MAR_2010_iso_ order=formatted;
class prior_arrears_band /PRELOADFMT order=data; /*row CLASS statement*/
class arrears_band /PRELOADFMT order=data; /*column CLASS statement*/
var balance;
table prior_arrears_band='', arrears_band*(n balance*sum)/ box=prior_arrears_band;
format arrears_band $fmt.; /*applying the earlier defined format*/
Title 'Succession of O/S Balance ($ and %) and Accounts (#) from Feb 2010 to Mar 2010';
run;
This is the final first table I generated (except balance % columns) using my novice code.
Succession of O/S Balance ($ and %) and Accounts (#) from Feb 2010 to Mar 2010
prior_arrears_band | Arreras_Band | |||||
Current | 60 - 90 | |||||
N | Balance | Balance | N | Balance | Balance | |
Sum | % | Sum | % | |||
Current | 2 | 1571 | ? | . | . | ? |
1 - 30 | . | . | ? | 1 | 800 | ? |
Above markov table says:
Next I have to repeat this exercise for the Mar 2010 to Apr 2010 cycle and so on, culminating Dec 2011 to Jan 2012 cycle.
I know no body can read all these things, but appreciate if any help to make my code efficent, if time permits.
This is dataset used.
data RAW_DATA_SET_3;
inFormat Current_date date9.;
input Bank_number Account_number $ 4-12 Current_date Product $ 24-36 Balance Arrears_Band $43-49;
format Current_date date9.;
cards;
10 1111111 31MAR2010 Personal Loan 555 30 - 60
10 1111111 30APR2010 Personal Loan 200 Current
10 1111111 31MAY2010 Personal Loan 108 1 - 30
30 1111111 31DEC2010 Res. Mortgage 800 Current
30 1111111 31JAN2011 Res. Mortgage 600 90 +
30 1111111 28FEB2011 Res. Mortgage 600 NPNA
30 1111111 17MAR2011 Res. Mortgage 600 writoff
20 333333333 15MAR2010 Personal OD 700 NPNA
20 333333333 30APR2010 Personal OD 30 NPNA
20 333333333 31MAY2010 Personal OD 2 NPNA
20 333333333 30JUN2010 Personal OD 1 NPNA
10 444444444 28FEB2010 Personal Loan 900 1 - 30
10 444444444 31MAR2010 Personal Loan 800 60 - 90
10 444444444 31MAY2010 Personal Loan 20 NPNA
10 444444444 01JUL2010 Personal Loan 25 NPNA
10 444444444 30AUG2010 Personal Loan 32 NPNA
70 666666666 07FEB2010 Personal OD 999 Current
70 666666666 15MAR2010 Personal OD 666 Current
40 777777 08FEB2010 Res. Mortgage 1877 Current
40 777777 28MAR2010 Res. Mortgage 905 Current
80 777777 01FEB2010 Personal Loan 3000 90 +
;
run;
Thanks
Mirisae
I wasn't sure of how to calculate "Balance %" (percent of which total?) but else below code should do what you asked for. You might want to give the datasets different names now that everything is done in one go.
data RAW_DATA_SET_3;
inFormat Current_date date9.;
input Bank_number Account_number $ 4-12 Current_date Product $ 24-36 Balance Arrears_Band $43-49;
format Current_date date9.;
cards;
10 1111111 31MAR2010 Personal Loan 555 30 - 60
10 1111111 30APR2010 Personal Loan 200 Current
10 1111111 31MAY2010 Personal Loan 108 1 - 30
30 1111111 31DEC2010 Res. Mortgage 800 Current
30 1111111 31JAN2011 Res. Mortgage 600 90 +
30 1111111 28FEB2011 Res. Mortgage 600 NPNA
30 1111111 17MAR2011 Res. Mortgage 600 writoff
20 333333333 15MAR2010 Personal OD 700 NPNA
20 333333333 30APR2010 Personal OD 30 NPNA
20 333333333 31MAY2010 Personal OD 2 NPNA
20 333333333 30JUN2010 Personal OD 1 NPNA
10 444444444 28FEB2010 Personal Loan 900 1 - 30
10 444444444 31MAR2010 Personal Loan 800 60 - 90
10 444444444 31MAY2010 Personal Loan 20 NPNA
10 444444444 01JUL2010 Personal Loan 25 NPNA
10 444444444 30AUG2010 Personal Loan 32 NPNA
70 666666666 07FEB2010 Personal OD 999 Current
70 666666666 15MAR2010 Personal OD 666 Current
40 777777 08FEB2010 Res. Mortgage 1877 Current
40 777777 28MAR2010 Res. Mortgage 905 Current
80 777777 01FEB2010 Personal Loan 3000 90 +
;
run;
/* create format for date cycle */
proc sql;
create view v_cyclefmt as
select distinct
'Cycle2Month' as fmtname
,'n' as type
,current_date as start
,catx(' ',put(intnx('month',current_date,-1,'b'),monname.),put(intnx('month',current_date,-1,'b'),year4.),'to',put(current_date,monname.)) as label
from RAW_DATA_SET_3
;
quit;
proc format cntlin=v_cyclefmt;
run;
/* create format for sort order of Arrears_Band */
proc format ;
invalue ArrearsCode
'CURRENT' =1
'1-30' =2
'30-60' =3
'60-90' =4
'90+' =5
'NPNA' =6
'WRITOFF' =7
OTHER =99
;
value ArrearsText
1='Current'
2='1 - 30'
3='30 - 60'
4='60 - 90'
5='90 +'
6='NPNA'
7='writoff'
99='misscoded!'
;
run;
/*Sorting*/
proc sort data= RAW_DATA_SET_3 out=RAW_DATA_SET_3_sorted;
by bank_number account_number current_date;
run;
data Cycle_FEB_TO_MAR_2010_iso_;
set RAW_DATA_SET_3_sorted;
by bank_number account_number current_date;
Arrears_Band_CD=input(upcase(compress(Arrears_Band)),ArrearsCode.);
prior_Arrears_Band_CD = lag( Arrears_Band_CD );
if intck('month',lag(current_date),current_date) ne 1 then call missing(prior_Arrears_Band_CD);
if first.account_number or missing(prior_Arrears_Band_CD) then delete;
run;
options missing=' ';
proc tabulate data= Cycle_FEB_TO_MAR_2010_iso_ order=formatted;
format current_date Cycle2Month. Arrears_Band_CD prior_Arrears_Band_CD ArrearsText.;
label current_date='Cycle';
class current_date / order=unformatted ;
class prior_Arrears_Band_CD / order=unformatted;
class Arrears_Band_CD / order=unformatted;
var balance;
table current_date, prior_Arrears_Band_CD='', Arrears_Band_CD*(n balance*sum)/ box=prior_Arrears_Band_CD printmiss;
Title 'Succession of O/S Balance ($ and %) and Accounts (#)';
run;
Hi Patrick,
This is a tremondous help as well as a very useful new knowledge for me.
Thank you very much again!.
Regards
Mirisage
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.