Hi Colleagues,
In this data set, I need to separate the records where Arrears_Band at least once has “writoff” value.
data have;
informat Current_date date9.;
Input Current_date Account_number 11-13 Arrears_Band $ 15-25;
Format Current_date date9.;
cards;
31MAY2011 111 NPNA
30JUN2011 111 writoff
31JUL2011 111 NPNA
31AUG2011 111 NPNA
30SEP2011 111 NPNA
31OCT2011 111 NPNA
30NOV2011 111 NPNA
31DEC2011 111 NPNA
31JAN2012 111 NPNA
31DEC2011 222 NPNA
31JAN2012 222 NPNA
30NOV2010 333 NPNA
31DEC2010 333 NPNA
31JAN2011 333 NPNA
28FEB2011 333 NPNA
31MAR2011 333 writoff
30APR2011 333 NPNA
31MAY2011 333 NPNA
30JUN2011 333 NPNA
31JUL2011 333 NPNA
31AUG2011 333 NPNA
30SEP2011 333 NPNA
31OCT2011 333 NPNA
30NOV2011 333 NPNA
31DEC2011 333 NPNA
31JAN2012 333 NPNA
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
30JUN2010 444 Current
31JUL2010 444 Current
31AUG2010 444 Current
30SEP2010 444 Current
31OCT2010 444 Current
30NOV2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30APR2011 444 90 +
31MAY2011 444 90 +
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
30SEP2011 444 NPNA
31OCT2011 444 NPNA
30NOV2011 444 NPNA
31DEC2011 444 NPNA
31JAN2012 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31JAN2012 666 writoff
31JAN2012 777 NPNA
;
run;
/*Answer*/
The data set I want is this.
Current_date Account_number Arrears_Band
31MAY2011 111 NPNA
30JUN2011 111 writoff
31JUL2011 111 NPNA
31AUG2011 111 NPNA
30SEP2011 111 NPNA
31OCT2011 111 NPNA
30NOV2011 111 NPNA
31DEC2011 111 NPNA
31JAN2012 111 NPNA
30NOV2010 333 NPNA
31DEC2010 333 NPNA
31JAN2011 333 NPNA
28FEB2011 333 NPNA
31MAR2011 333 writoff
30APR2011 333 NPNA
31MAY2011 333 NPNA
30JUN2011 333 NPNA
31JUL2011 333 NPNA
31AUG2011 333 NPNA
30SEP2011 333 NPNA
31OCT2011 333 NPNA
30NOV2011 333 NPNA
31DEC2011 333 NPNA
31JAN2012 333 NPNA
31JAN2012 666 writoff
Question:
I did below but it obviously selects only the specific records where Arrears_Band =writoff which is not I want.
Data want;
Set have;
If Arrears_Band =’writoff’;
Run;
Could anybody help me?
Thanks
Neil
This is one of the classic topics on forum. In general, you would need two passes, no matter what kind of approach:
data have;
informat Current_date date9.;
Input Current_date Account_number Arrears_Band :$15.;
Format Current_date date9.;
cards;
31MAY2011 111 NPNA
30JUN2011 111 writoff
31JUL2011 111 NPNA
31AUG2011 111 NPNA
30SEP2011 111 NPNA
31OCT2011 111 NPNA
30NOV2011 111 NPNA
31DEC2011 111 NPNA
31JAN2012 111 NPNA
31DEC2011 222 NPNA
31JAN2012 222 NPNA
30NOV2010 333 NPNA
31DEC2010 333 NPNA
31JAN2011 333 NPNA
28FEB2011 333 NPNA
31MAR2011 333 writoff
30APR2011 333 NPNA
31MAY2011 333 NPNA
30JUN2011 333 NPNA
31JUL2011 333 NPNA
31AUG2011 333 NPNA
30SEP2011 333 NPNA
31OCT2011 333 NPNA
30NOV2011 333 NPNA
31DEC2011 333 NPNA
31JAN2012 333 NPNA
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
30JUN2010 444 Current
31JUL2010 444 Current
31AUG2010 444 Current
30SEP2010 444 Current
31OCT2010 444 Current
30NOV2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30APR2011 444 90 +
31MAY2011 444 90 +
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
30SEP2011 444 NPNA
31OCT2011 444 NPNA
30NOV2011 444 NPNA
31DEC2011 444 NPNA
31JAN2012 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31JAN2012 666 writoff
31JAN2012 777 NPNA
;
run;
/*SQL, different construct comparing to Linlin's suggetion*/
proc sql;
select * from have group by account_number having sum(Arrears_Band='writoff')>=1;quit;
/*Interlace stacking*/
data want;
set have (in=up) have;
by account_number;
if first.account_number then call missing(flag);
if up and Arrears_Band='writoff' then flag+1;
if not up and flag>=1 then output;
drop flag;
run;
/*Classic 2X DOW*/
data want;
do until (last.account_number);
set have;
by account_number;
if Arrears_Band='writoff' then flag+1;
end;
do until (last.account_number);
set have;
by account_number;
if flag >=1 then output;
end;
call missing(flag);
drop flag;
run;
/*Merge*/
data want;
merge have (where=(Arrears_Band='writoff') in=a) have (in=b);
by account_number;
if a and b;
run;
/*Hash()*/
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have', multidata:'y');
h.definekey('account_number');
h.definedata(all:'y');
h.definedone();
end;
do until (last.account_number);
set have;
by account_number;
if Arrears_Band='writoff' then flag+1;
end;
if flag>=1 then do;
rc=h.find();
do rc=0 by 0 while (rc=0);
output;
rc=h.find_next();
end;
end;
call missing(flag);
drop flag rc;
run;
Haikuo
proc sql;
create table want as
select * from have
where account_number in (select account_number from have where Arrears_Band='writoff');
quit;
This is one of the classic topics on forum. In general, you would need two passes, no matter what kind of approach:
data have;
informat Current_date date9.;
Input Current_date Account_number Arrears_Band :$15.;
Format Current_date date9.;
cards;
31MAY2011 111 NPNA
30JUN2011 111 writoff
31JUL2011 111 NPNA
31AUG2011 111 NPNA
30SEP2011 111 NPNA
31OCT2011 111 NPNA
30NOV2011 111 NPNA
31DEC2011 111 NPNA
31JAN2012 111 NPNA
31DEC2011 222 NPNA
31JAN2012 222 NPNA
30NOV2010 333 NPNA
31DEC2010 333 NPNA
31JAN2011 333 NPNA
28FEB2011 333 NPNA
31MAR2011 333 writoff
30APR2011 333 NPNA
31MAY2011 333 NPNA
30JUN2011 333 NPNA
31JUL2011 333 NPNA
31AUG2011 333 NPNA
30SEP2011 333 NPNA
31OCT2011 333 NPNA
30NOV2011 333 NPNA
31DEC2011 333 NPNA
31JAN2012 333 NPNA
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
30JUN2010 444 Current
31JUL2010 444 Current
31AUG2010 444 Current
30SEP2010 444 Current
31OCT2010 444 Current
30NOV2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30APR2011 444 90 +
31MAY2011 444 90 +
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
30SEP2011 444 NPNA
31OCT2011 444 NPNA
30NOV2011 444 NPNA
31DEC2011 444 NPNA
31JAN2012 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31JAN2012 666 writoff
31JAN2012 777 NPNA
;
run;
/*SQL, different construct comparing to Linlin's suggetion*/
proc sql;
select * from have group by account_number having sum(Arrears_Band='writoff')>=1;quit;
/*Interlace stacking*/
data want;
set have (in=up) have;
by account_number;
if first.account_number then call missing(flag);
if up and Arrears_Band='writoff' then flag+1;
if not up and flag>=1 then output;
drop flag;
run;
/*Classic 2X DOW*/
data want;
do until (last.account_number);
set have;
by account_number;
if Arrears_Band='writoff' then flag+1;
end;
do until (last.account_number);
set have;
by account_number;
if flag >=1 then output;
end;
call missing(flag);
drop flag;
run;
/*Merge*/
data want;
merge have (where=(Arrears_Band='writoff') in=a) have (in=b);
by account_number;
if a and b;
run;
/*Hash()*/
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have', multidata:'y');
h.definekey('account_number');
h.definedata(all:'y');
h.definedone();
end;
do until (last.account_number);
set have;
by account_number;
if Arrears_Band='writoff' then flag+1;
end;
if flag>=1 then do;
rc=h.find();
do rc=0 by 0 while (rc=0);
output;
rc=h.find_next();
end;
end;
call missing(flag);
drop flag rc;
run;
Haikuo
proc sql;
create table want as
select * from have
group by Account_number
having sum(lowcase(Arrears_Band)='writoff')>0
order by Account_number,current_date;
quit;
proc print data=want;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.