Hi SAS Discussion Forum,
In the data set (belwo), bank_number, Account_number and current_date are the
variables that would uniquely identify a record.
You will see account_number 111 and 777 in two places but they are unique as they belong to 2 different banks.
data a;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $17-23 bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10
31MAR2010 111 Current 10
31MAY2010 111 NPNA 10
31MAR2010 222 NPNA 20
31MAY2010 222 Current 20
31OCT2010 222 Current 20
30JUN2011 222 90 + 20
31JUL2011 222 NPNA 20
28FEB2010 444 30 - 60 30
31MAR2010 444 Current 30
30APR2010 444 30 - 60 50
31MAY2010 444 Current 50
28FEB2010 555 60 - 90 70
30JUN2011 555 90 + 70
31MAR2011 666 60 - 90 80
30JUN2011 666 90 + 80
28FEB2010 777 90 + 90
30JUN2011 777 Current 90
31MAR2011 111 90 + 100
30JUN2011 111 Current 100
31DEC2010 999 Current 10
31JAN2011 999 NPNA 10
28FEB2010 123 NPNA 20
31MAR2010 123 NPNA 20
30APR2010 123 NPNA 20
31DEC2010 777 Current 90
31JAN2011 777 NPNA 90
28FEB2011 777 NPNA 90
31MAR2011 777 NPNA 90
;
run;
/*Question:
I need to remove all the records that do not satisfy the below 2 conditions.
1) For a given 'account_number', when first occurrence of 'current_date'
=28FEB2010, 'Arrears_Band' could take any value
2) For a given 'account_number', when first occurrence of 'current_date'
^=28FEB2010, "Arrears_Band" must be "Current"
Answer:
All Records pertaining to combinations below should be completely removed from our data set using a suitable code (pink color records should be removed from data set).
Account_number Bank_number
222 20
444 50
666 80
111 100
My effort:
I have tried the following code (Bolotin, Haikuo, Dan et
al. gratefully acknowledged) but doesn’t work.
proc sort data=a out=TT;
by bank_number account_number current_date;
run;
data want;
set tt;
by bank_number account_number current_date;
if first.current_date
then _f1 = current_date ne '28FEB2010'd and Arrears_Band
ne 'Current' ;
retain _f1;
if _f1 ne 1 then output;
drop _:;
run;
Could anyone of you please help me to code this.
Thank you
Mirisage
All you need may only involve one small change in your original code:
data want;
set tt;
by bank_number account_number current_date;
if first.account_number
then _f1 = (current_date ne '28FEB2010'd and Arrears_Band
ne 'Current') ;
retain _f1;
if _f1 ne 1 then output;
drop _:;
run;
Haikuo
All you need may only involve one small change in your original code:
data want;
set tt;
by bank_number account_number current_date;
if first.account_number
then _f1 = (current_date ne '28FEB2010'd and Arrears_Band
ne 'Current') ;
retain _f1;
if _f1 ne 1 then output;
drop _:;
run;
Haikuo
If I understand what you mean.
data a; informat Current_date date9.; Input Current_date Account_number $12-14 Arrears_Band $17-23 bank_number; Format Current_date date9.; cards; 28FEB2010 111 NPNA 10 31MAR2010 111 Current 10 31MAY2010 111 NPNA 10 31MAR2010 222 NPNA 20 31MAY2010 222 Current 20 31OCT2010 222 Current 20 30JUN2011 222 90 + 20 31JUL2011 222 NPNA 20 28FEB2010 444 30 - 60 30 31MAR2010 444 Current 30 30APR2010 444 30 - 60 50 31MAY2010 444 Current 50 28FEB2010 555 60 - 90 70 30JUN2011 555 90 + 70 31MAR2011 666 60 - 90 80 30JUN2011 666 90 + 80 28FEB2010 777 90 + 90 30JUN2011 777 Current 90 31MAR2011 111 90 + 100 30JUN2011 111 Current 100 31DEC2010 999 Current 10 31JAN2011 999 NPNA 10 28FEB2010 123 NPNA 20 31MAR2010 123 NPNA 20 30APR2010 123 NPNA 20 31DEC2010 777 Current 90 31JAN2011 777 NPNA 90 28FEB2011 777 NPNA 90 31MAR2011 777 NPNA 90 ; run; data a temp(keep=group); set a ; by bank_number Account_number notsorted; if first.account_number then do; group+1; if Current_date ne '28FEB2010'd and Arrears_Band ne "Current" then output temp; end; output a; run; data want; merge a temp(in=inb); by group; if not inb; run;
Ksharp
Hi Haikuo and Ksharp.
Both of your codes are ideally working for the given sample data set.
However, when I applied these 2 codes to my 1 million data set, one code selected more records than the other.
To check, which code works for my original data set, I have used the procedure below and found that Haikuo's code works as I intended. I was not clever enough to layout the representative small dataset correctly.
Thanks again to both of you for this great help.
Regards
Mirisage
/*Checking if selected records satisfy my two conditions*/
data delete_this;
set want;
by bank_number account_number current_date;
if first.account_number;
run;
proc freq data=delete_this;
tables current_date *Arrears_Band;
run;
/*END. Checking if selected records satisfy my two conditions*/
/*Checking if non-selected records do not satisfy my two conditions*/
/*Sorting original data set*/
proc sort data=a out=b;
by bank_number account_number current_date; /*always I ought to go by this order of vars*/
run;
/*sorting created data set */
proc sort data=want out=c;
by bank_number account_number current_date; /*always I ought to go by this order of vars*/
run;
/*merging to seperate out non selected data set */
data merged;
merge b (in=x)
c (in=y)
;
by bank_number account_number current_date;
if x and not y;
run;
/*by group processing*/
data any_name;
set merged;
by bank_number account_number current_date;
if first.account_number;
run;
/*Checking. I found we are correct*/
/*not a single account had its first currnet_date =28FEB2010*/
/* but not a single account had "current" as its arrers_band*/
/*So, our removing of records is correct*/
proc freq data=any_name;
tables current_date*Arrears_Band;
run;
/*END*/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.