BookmarkSubscribeRSS Feed
blue34
Calcite | Level 5

Hi,

 

I am trying to clean my data based on below items. Could you help me to write code for this? 

 

  1. From "AC" to "CL" status, but never be identified as default=1 where accounts stay two more months and after that I should remove all accounts. Below example my code should delete Q4 row.

quarter

account

status

DEFault

 

Q1

230

AC

0

 

Q2

230

CL

0

Keep

Q3

230

CL

0

Keep

Q4

230

CL

0

Remove

Q1

230

CL

0

Remove

Q2

230

CL

0

Remove

Q3

230

CL

0

Remove

Q4

230

CL

0

Remove

 

  1. Be identified as default in later days. If account closed but have default later days, we should keep those. But after default flag where we see (below example Q2),  Q3 and Q4 quarters should remove.

quarter

account

status

DEFault

  

Q1

230

AC

0

  

Q2

230

CL

0

Keep

 

Q2

230

CL

0

Keep

 

Q3

230

CL

0

Keep

 

Q1

230

CL

0

Keep

 

Q2

230

CL

1

Keep

 

Q3

230

CL

0

Remve

Already default

Q4

230

CL

0

Remove

Already default

 

4 REPLIES 4
Kurt_Bremser
Super User

I suggest two data steps in succession, the first one looks for any default value, the second one creates the final result:

data have;
input year quarter $ account $ status $ default;
cards;
1 Q1 230 AC 0
1 Q2 230 CL 0
1 Q3 230 CL 0
1 Q4 230 CL 0
2 Q1 230 CL 0
2 Q2 230 CL 0
2 Q3 230 CL 0
2 Q4 230 CL 0
1 Q1 231 AC 0
1 Q2 231 CL 0
1 Q3 231 CL 0
1 Q4 231 CL 0
2 Q1 231 CL 0
2 Q2 231 CL 1
2 Q3 231 CL 0
2 Q4 231 CL 0
;
run;

data
  intermediate
  default (keep=account)
;
set have;
by account year quarter; * year and quarter only to make the step fail if dataset is not sorted correctly;
retain def_flag;
if first.account then def_flag = 0;
if default
then do;
  output intermediate; * we want to keep that one as the last;
  if not def_flag then output default;
  def_flag = 1;
end;
if not def_flag then output intermediate;
drop def_flag;
run;

data want;
merge
  intermediate (in=a)
  default (in=b)
;
by account;
retain counter;
if a;
oldstat = lag(status);
if first.account then counter = .;
else if oldstat = 'AC' and status = 'CL'
then counter = 1;
else if status = 'CL'
then counter + 1;
if counter < 3 or b;
drop counter oldstat;
run;

proc print data=want noobs;
run;

Result:

year    quarter    account    status    default

  1       Q1         230        AC         0   
  1       Q2         230        CL         0   
  1       Q3         230        CL         0   
  1       Q1         231        AC         0   
  1       Q2         231        CL         0   
  1       Q3         231        CL         0   
  1       Q4         231        CL         0   
  2       Q1         231        CL         0   
  2       Q2         231        CL         1   

Take note:

  • I added a year variable, so that the timeline is made sure of
  • I created a second account so that both of your examples fit into one dataset
  • I used a data step with cards to present example data; please use this technique in the future, so we can recreate your data with a simple copy/paste and submit
blue34
Calcite | Level 5

Thank you very much, this is very helpful. I notice that I have this below conversion, so I have duplicate account, quarters now. How I can deal with you code for making data quarterly?

 

transform the data into quarterly end in month 2,5,8,11

12,1,2-Q1

3,4,5 -Q2

6,7,8 -Q3

9,10,11 -Q4

 

 

year    quarter    account    status    default

  1       Q1         230        AC         0  
  1       Q2         230        CL         0  
  1       Q3         230        CL         0  
  1       Q1         231        AC         0  
  1       Q2         231        CL         0  
  1       Q3         231        CL         0  
  1       Q4         231        CL         0  
  2       Q1         231        CL         0  
  2       Q2         231        CL         1  

Satish_Parida
Lapis Lazuli | Level 10

You have to divide your code to 2 parts.

  1. first mark accounts to delete based on the default flag. A sample code written please fix it before running.
    data yor_data;
    set yor_data;
    if _n_=1 then flagship='Keep'; 
    retain flagship;
    by quarter account;
    if first.account then flagship='Keep'; 
    if flagship ne 'Remove' then do
      if DEFault=1 then 
        flag= 'keep'
        flagship='Remove' 
    end
    else
      flag= 'Remove'
      flagship='Remove'
    end
    run;
    Write a proc sql to group by account and get sum of Default and based on that output create a format. If Account have a default then it is on one category else the account in some default category
  2. Then It is fairly simple you need to write a code similar to above. and you have that format where the account says have a default based on the format skip it from processing as it is already manipulated in above code. And remember to use a accumulator variable for counting when the count is > 2 then put all the flag to delete.

This is a tricky code. Put your mind for 1 hour, it will be fixed. I do not have the SAS Engine with me to compile the code that's why you are getting this code skeleton. Sorry for the issues. Hope it helps.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 914 views
  • 1 like
  • 3 in conversation