DATA Step, Macro, Functions and more

Data Cleaning Logic

Reply
Occasional Contributor
Posts: 8

Data Cleaning Logic

Hi,

 

I am trying to clean my data based on below items. Could you help me to write code for this? 

 

  1. From "AC" to "CL" status, but never be identified as default=1 where accounts stay two more months and after that I should remove all accounts. Below example my code should delete Q4 row.

quarter

account

status

DEFault

 

Q1

230

AC

0

 

Q2

230

CL

0

Keep

Q3

230

CL

0

Keep

Q4

230

CL

0

Remove

Q1

230

CL

0

Remove

Q2

230

CL

0

Remove

Q3

230

CL

0

Remove

Q4

230

CL

0

Remove

 

  1. Be identified as default in later days. If account closed but have default later days, we should keep those. But after default flag where we see (below example Q2),  Q3 and Q4 quarters should remove.

quarter

account

status

DEFault

  

Q1

230

AC

0

  

Q2

230

CL

0

Keep

 

Q2

230

CL

0

Keep

 

Q3

230

CL

0

Keep

 

Q1

230

CL

0

Keep

 

Q2

230

CL

1

Keep

 

Q3

230

CL

0

Remve

Already default

Q4

230

CL

0

Remove

Already default

 

Super User
Posts: 9,923

Re: Data Cleaning Logic

[ Edited ]

I suggest two data steps in succession, the first one looks for any default value, the second one creates the final result:

data have;
input year quarter $ account $ status $ default;
cards;
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q4 230 CL 0
2 Q1 230 CL 0
2 Q2 230 CL 0
2 Q3 230 CL 0
2 Q4 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
2 Q3 231 CL 0
2 Q4 231 CL 0
;
run;

data
  intermediate
  default (keep=account)
;
set have;
by account year quarter; * year and quarter only to make the step fail if dataset is not sorted correctly;
retain def_flag;
if first.account then def_flag = 0;
if default
then do;
  output intermediate; * we want to keep that one as the last;
  if not def_flag then output default;
  def_flag = 1;
end;
if not def_flag then output intermediate;
drop def_flag;
run;

data want;
merge
  intermediate (in=a)
  default (in=b)
;
by account;
retain counter;
if a;
oldstat = lag(status);
if first.account then counter = .;
else if oldstat = 'AC' and status = 'CL'
then counter = 1;
else if status = 'CL'
then counter + 1;
if counter < 3 or b;
drop counter oldstat;
run;

proc print data=want noobs;
run;

Result:

year    quarter    account    status    default

  1       Q1         230        AC         0   
  1       Q2         230        CL         0   
  1       Q3         230        CL         0   
  1       Q1         231        AC         0   
  1       Q2         231        CL         0   
  1       Q3         231        CL         0   
  1       Q4         231        CL         0   
  2       Q1         231        CL         0   
  2       Q2         231        CL         1   

Take note:

  • I added a year variable, so that the timeline is made sure of
  • I created a second account so that both of your examples fit into one dataset
  • I used a data step with cards to present example data; please use this technique in the future, so we can recreate your data with a simple copy/paste and submit
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 8

Re: Data Cleaning Logic

Posted in reply to KurtBremser

Thank you very much, this is very helpful. I notice that I have this below conversion, so I have duplicate account, quarters now. How I can deal with you code for making data quarterly?

 

transform the data into quarterly end in month 2,5,8,11

12,1,2-Q1

3,4,5 -Q2

6,7,8 -Q3

9,10,11 -Q4

 

 

year    quarter    account    status    default

  1       Q1         230        AC         0  
  1       Q2         230        CL         0  
  1       Q3         230        CL         0  
  1       Q1         231        AC         0  
  1       Q2         231        CL         0  
  1       Q3         231        CL         0  
  1       Q4         231        CL         0  
  2       Q1         231        CL         0  
  2       Q2         231        CL         1  

Super User
Posts: 9,923

Re: Data Cleaning Logic

Please provide example data in a data step, like I showed you.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Frequent Contributor
Posts: 109

Re: Data Cleaning Logic

[ Edited ]

You have to divide your code to 2 parts.

  1. first mark accounts to delete based on the default flag. A sample code written please fix it before running.
    data yor_data;
    set yor_data;
    if _n_=1 then flagship='Keep'; 
    retain flagship;
    by quarter account;
    if first.account then flagship='Keep'; 
    if flagship ne 'Remove' then do
      if DEFault=1 then 
        flag= 'keep'
        flagship='Remove' 
    end
    else
      flag= 'Remove'
      flagship='Remove'
    end
    run;
    Write a proc sql to group by account and get sum of Default and based on that output create a format. If Account have a default then it is on one category else the account in some default category
  2. Then It is fairly simple you need to write a code similar to above. and you have that format where the account says have a default based on the format skip it from processing as it is already manipulated in above code. And remember to use a accumulator variable for counting when the count is > 2 then put all the flag to delete.

This is a tricky code. Put your mind for 1 hour, it will be fixed. I do not have the SAS Engine with me to compile the code that's why you are getting this code skeleton. Sorry for the issues. Hope it helps.

Ask a Question
Discussion stats
  • 4 replies
  • 122 views
  • 1 like
  • 3 in conversation