Hi,
I am trying to clean my data based on below items. Could you help me to write code for this?
quarter | account | status | DEFault | |
Q1 | 230 | AC | 0 | |
Q2 | 230 | CL | 0 | Keep |
Q3 | 230 | CL | 0 | Keep |
Q4 | 230 | CL | 0 | Remove |
Q1 | 230 | CL | 0 | Remove |
Q2 | 230 | CL | 0 | Remove |
Q3 | 230 | CL | 0 | Remove |
Q4 | 230 | CL | 0 | Remove |
quarter | account | status | DEFault | ||
Q1 | 230 | AC | 0 | ||
Q2 | 230 | CL | 0 | Keep | |
Q2 | 230 | CL | 0 | Keep | |
Q3 | 230 | CL | 0 | Keep | |
Q1 | 230 | CL | 0 | Keep | |
Q2 | 230 | CL | 1 | Keep | |
Q3 | 230 | CL | 0 | Remve | Already default |
Q4 | 230 | CL | 0 | Remove | Already default |
I suggest two data steps in succession, the first one looks for any default value, the second one creates the final result:
data have;
input year quarter $ account $ status $ default;
cards;
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q4 230 CL 0
2 Q1 230 CL 0
2 Q2 230 CL 0
2 Q3 230 CL 0
2 Q4 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
2 Q3 231 CL 0
2 Q4 231 CL 0
;
run;
data
intermediate
default (keep=account)
;
set have;
by account year quarter; * year and quarter only to make the step fail if dataset is not sorted correctly;
retain def_flag;
if first.account then def_flag = 0;
if default
then do;
output intermediate; * we want to keep that one as the last;
if not def_flag then output default;
def_flag = 1;
end;
if not def_flag then output intermediate;
drop def_flag;
run;
data want;
merge
intermediate (in=a)
default (in=b)
;
by account;
retain counter;
if a;
oldstat = lag(status);
if first.account then counter = .;
else if oldstat = 'AC' and status = 'CL'
then counter = 1;
else if status = 'CL'
then counter + 1;
if counter < 3 or b;
drop counter oldstat;
run;
proc print data=want noobs;
run;
Result:
year quarter account status default 1 Q1 230 AC 0 1 Q2 230 CL 0 1 Q3 230 CL 0 1 Q1 231 AC 0 1 Q2 231 CL 0 1 Q3 231 CL 0 1 Q4 231 CL 0 2 Q1 231 CL 0 2 Q2 231 CL 1
Take note:
Thank you very much, this is very helpful. I notice that I have this below conversion, so I have duplicate account, quarters now. How I can deal with you code for making data quarterly?
transform the data into quarterly end in month 2,5,8,11
12,1,2-Q1
3,4,5 -Q2
6,7,8 -Q3
9,10,11 -Q4
year quarter account status default
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
Please provide example data in a data step, like I showed you.
You have to divide your code to 2 parts.
data yor_data;
set yor_data;
if _n_=1 then flagship='Keep';
retain flagship;
by quarter account;
if first.account then flagship='Keep';
if flagship ne 'Remove' then do
if DEFault=1 then
flag= 'keep'
flagship='Remove'
end
else
flag= 'Remove'
flagship='Remove'
end
run;Write a proc sql to group by account and get sum of Default and based on that output create a format. If Account have a default then it is on one category else the account in some default categoryThis is a tricky code. Put your mind for 1 hour, it will be fixed. I do not have the SAS Engine with me to compile the code that's why you are getting this code skeleton. Sorry for the issues. Hope it helps.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.