Hi,
I am trying to clean my data based on below items. Could you help me to write code for this?
quarter | account | status | DEFault | |
Q1 | 230 | AC | 0 | |
Q2 | 230 | CL | 0 | Keep |
Q3 | 230 | CL | 0 | Keep |
Q4 | 230 | CL | 0 | Remove |
Q1 | 230 | CL | 0 | Remove |
Q2 | 230 | CL | 0 | Remove |
Q3 | 230 | CL | 0 | Remove |
Q4 | 230 | CL | 0 | Remove |
quarter | account | status | DEFault | ||
Q1 | 230 | AC | 0 | ||
Q2 | 230 | CL | 0 | Keep | |
Q2 | 230 | CL | 0 | Keep | |
Q3 | 230 | CL | 0 | Keep | |
Q1 | 230 | CL | 0 | Keep | |
Q2 | 230 | CL | 1 | Keep | |
Q3 | 230 | CL | 0 | Remve | Already default |
Q4 | 230 | CL | 0 | Remove | Already default |
I suggest two data steps in succession, the first one looks for any default value, the second one creates the final result:
data have;
input year quarter $ account $ status $ default;
cards;
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q4 230 CL 0
2 Q1 230 CL 0
2 Q2 230 CL 0
2 Q3 230 CL 0
2 Q4 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
2 Q3 231 CL 0
2 Q4 231 CL 0
;
run;
data
intermediate
default (keep=account)
;
set have;
by account year quarter; * year and quarter only to make the step fail if dataset is not sorted correctly;
retain def_flag;
if first.account then def_flag = 0;
if default
then do;
output intermediate; * we want to keep that one as the last;
if not def_flag then output default;
def_flag = 1;
end;
if not def_flag then output intermediate;
drop def_flag;
run;
data want;
merge
intermediate (in=a)
default (in=b)
;
by account;
retain counter;
if a;
oldstat = lag(status);
if first.account then counter = .;
else if oldstat = 'AC' and status = 'CL'
then counter = 1;
else if status = 'CL'
then counter + 1;
if counter < 3 or b;
drop counter oldstat;
run;
proc print data=want noobs;
run;
Result:
year quarter account status default 1 Q1 230 AC 0 1 Q2 230 CL 0 1 Q3 230 CL 0 1 Q1 231 AC 0 1 Q2 231 CL 0 1 Q3 231 CL 0 1 Q4 231 CL 0 2 Q1 231 CL 0 2 Q2 231 CL 1
Take note:
Thank you very much, this is very helpful. I notice that I have this below conversion, so I have duplicate account, quarters now. How I can deal with you code for making data quarterly?
transform the data into quarterly end in month 2,5,8,11
12,1,2-Q1
3,4,5 -Q2
6,7,8 -Q3
9,10,11 -Q4
year quarter account status default
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
Please provide example data in a data step, like I showed you.
You have to divide your code to 2 parts.
data yor_data;
set yor_data;
if _n_=1 then flagship='Keep';
retain flagship;
by quarter account;
if first.account then flagship='Keep';
if flagship ne 'Remove' then do
if DEFault=1 then
flag= 'keep'
flagship='Remove'
end
else
flag= 'Remove'
flagship='Remove'
end
run;
Write a proc sql to group by account and get sum of Default and based on that output create a format. If Account have a default then it is on one category else the account in some default categoryThis is a tricky code. Put your mind for 1 hour, it will be fixed. I do not have the SAS Engine with me to compile the code that's why you are getting this code skeleton. Sorry for the issues. Hope it helps.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.