Hi SAS Forum,
I have three accounts in this longitudinal dataset. This shows how accounts have aged over 4 months (note 0 is the first month).
Variables TRANSIT, ACCTNUM and age together form an unique record.
data have;
input TRANSIT ACCTNUM age bal stat $ 14 -20;
cards;
10 111 0 220 1-30
10 222 0 450 31-60
20 111 0 300 61+
10 111 1 100 CURRENT
10 222 1 5 61+
20 111 1 75 NPNA
10 111 2 100 CURRENT
10 222 2 300 CURRENT
20 111 2 100 DWO
10 111 3 110 1-30
10 222 3 320 61+
20 111 3 105 DWO
;
RUN;
In transit 10, acctnum 111 started its life in age 0 with “1-30” stat (stat means status).
This account then cycled into “CURRENT” stat when age = 1, and remained in the same stat when age=2 also but deteriorated into “1-30” stat when age=3.
Question;
I need to over write all the subsequent values in variable “stat” with “CURRENT”, after an account once hit “CURRENT”.
So, the final data set I need is this:
data want;
input TRANSIT ACCTNUM age bal stat $ 14 -20;
cards;
10 111 0 220 1-30
10 222 0 450 31-60
20 111 0 300 61+
10 111 1 100 CURRENT
10 222 1 5 61+
20 111 1 75 NPNA
10 111 2 100 CURRENT
10 222 2 300 CURRENT
20 111 2 100 DWO
10 111 3 110 CURRENT /* in "have" dataset, this was 1-30”*/
10 222 3 320 CURRENT /*in "have" dataset, this was 61+*/
20 111 3 105 DWO
;
RUN;
I HAVE tried THIS:
/*sorting*/
proc sort data=have;
by transit acctnum age;
run;
/*Interlace stacking – Haikuo is acknowledged*/
data want;
set have (in=up) have;
by transit acctnum age;
if first.acctnum then call missing(flag);
if up and STAT='CURRENT' then flag+1;
if not up and flag>=1 then output;
drop flag;
run;
But I do not know how to tweak the above code to get the intended results (or create an alternative code).
Could anyone help me?
Thanks
Mirisage
The easiest way to do it is Hash() approach, no presort needed:
data have;
input TRANSIT ACCTNUM age bal stat $ 14 -21;
cards;
10 111 0 220 1-30
10 222 0 450 31-60
20 111 0 300 61+
10 111 1 100 CURRENT
10 222 1 5 61+
20 111 1 75 NPNA
10 111 2 100 CURRENT
10 222 2 300 CURRENT
20 111 2 100 DWO
10 111 3 110 1-30
10 222 3 320 61+
20 111 3 105 DWO
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h();
h.definekey('transit','acctnum');
h.definedata('stat');
h.definedone();
end;
set have;
if stat='CURRENT' THEN RC=h.ref();
rc=h.find();
drop rc;
run;
Haikuo
The easiest way to do it is Hash() approach, no presort needed:
data have;
input TRANSIT ACCTNUM age bal stat $ 14 -21;
cards;
10 111 0 220 1-30
10 222 0 450 31-60
20 111 0 300 61+
10 111 1 100 CURRENT
10 222 1 5 61+
20 111 1 75 NPNA
10 111 2 100 CURRENT
10 222 2 300 CURRENT
20 111 2 100 DWO
10 111 3 110 1-30
10 222 3 320 61+
20 111 3 105 DWO
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h();
h.definekey('transit','acctnum');
h.definedata('stat');
h.definedone();
end;
set have;
if stat='CURRENT' THEN RC=h.ref();
rc=h.find();
drop rc;
run;
Haikuo
Hi Haikuo,
Thank you very much for this code help which works well for my larger data set as well.
If you do not mind, pl let me to ask these 2 questions too but please consider them only if you have time.
1). I need to explain my co-worker how I have changed subsequent values into "CURRENT". Hash() approach is very much frighteneing me. Can we tweak your inteerlace stacking appraoch or any other merging, or proc sql method to generate the same resutls.
2). Although I said over writing the existing "stat" variable, I later found I still need the original "stat" variable. So, can we create a new variable, say "stat_new" instead of overwriting.
Thanks again for your help.
Mirisage
If not using Hash(), then I need to sort your data, and if you really want to maintain the original order, you just need to merge it back.
data have;
input TRANSIT ACCTNUM age bal stat $ 14 -21;
cards;
10 111 0 220 1-30
10 222 0 450 31-60
20 111 0 300 61+
10 111 1 100 CURRENT
10 222 1 5 61+
20 111 1 75 NPNA
10 111 2 100 CURRENT
10 222 2 300 CURRENT
20 111 2 100 DWO
10 111 3 110 1-30
10 222 3 320 61+
20 111 3 105 DWO
;
proc sort data=have out=have;
by TRANSIT ACCTNUM age;
run;
data want;
set have;
by TRANSIT ACCTNUM age;
LENGTH _T $ 8;
retain _t ;
if first.acctnum then _t='';
if stat='CURRENT' THEN _T=STAT;
NEW_STAT=COALESCEC(_T,STAT);
DROP _T;
RUN;
Haikuo
OR using Stack Skill:
data have; input TRANSIT ACCTNUM age bal stat : $20.; cards; 10 111 0 220 1-30 10 222 0 450 31-60 20 111 0 300 61+ 10 111 1 100 CURRENT 10 222 1 5 61+ 20 111 1 75 NPNA 10 111 2 100 CURRENT 10 222 2 300 CURRENT 20 111 2 100 DWO 10 111 3 110 1-30 10 222 3 320 61+ 20 111 3 105 DWO ; run; data want; set have; array x{0:2} $ _temporary_; stat=coalescec(x{mod(_n_,3)},stat); if stat='CURRENT' then x{mod(_n_,3)}=stat; run;
Ksharp
Hi Haikuo,
Thank you very much. Your code is great!
Hi Ksharp,
I applied your code. It runs smoothly but the results are not as intended. Can we tweak your code to generate the same resutls produced by Haikuo to satisfy our condition below.
"Once an account hits "CURRENT" under the "stat" variable, all the subsequent values for "stat" variable should be "CURRENT". And the "so created" new values for "stat" variable have to be included under a new variable called "New_stat".
This is the correct resutls from Haikuo
TRANSIT | ACCTNUM | age | bal | stat | NEW_STAT |
10 | 111 | 0 | 220 | 1 - 30 | 1 - 30 |
10 | 111 | 1 | 100 | CURRENT | CURRENT |
10 | 111 | 2 | 100 | CURRENT | CURRENT |
10 | 111 | 3 | 110 | 1 - 30 | CURRENT |
10 | 222 | 0 | 450 | 31-60 | 31-60 |
10 | 222 | 1 | 5 | 61+ | 61+ |
10 | 222 | 2 | 300 | CURRENT | CURRENT |
10 | 222 | 3 | 320 | 61+ | CURRENT |
20 | 111 | 0 | 300 | 61+ | 61+ |
20 | 111 | 1 | 75 | NPNA | NPNA |
20 | 111 | 2 | 100 | DWO | DWO |
20 | 111 | 3 | 105 | DWO | DWO |
This is Ksharp's results
TRANSIT | ACCTNUM | age | bal | stat |
10 | 111 | 0 | 220 | 1 - 30 |
10 | 111 | 1 | 100 | CURRENT |
10 | 111 | 2 | 100 | CURRENT |
10 | 111 | 3 | 110 | 1 - 30 |
10 | 222 | 0 | 450 | CURRENT |
10 | 222 | 1 | 5 | CURRENT |
10 | 222 | 2 | 300 | CURRENT |
10 | 222 | 3 | 320 | CURRENT |
20 | 111 | 0 | 300 | CURRENT |
20 | 111 | 1 | 75 | CURRENT |
20 | 111 | 2 | 100 | CURRENT |
20 | 111 | 3 | 105 | CURRENT |
Thanks
Mirisage
Hi Mirisage,
Ksharp's code was based on these assumptions
1) The data came in not sorted.
2). You only have 3 accounts (TRANSIT + ACCTNUM combination), or if you know the total number of accounts ahead of time, the code can be tweaked accordingly.
3) No skipping. Meaning every accounts should have same amount of records.
Haikuo
As HaiKuo pointed out , as long as you make sure your data is under these conditions .
And BTW : Did you run my code after sorting the original data?
Hi Haikuo,
Many thanks for these details.
Hi Ksharp,
Yes, I have run your code after sorting the original data.
Next I have run it without sorting, then it produced intended resutls, which are exactly same as Haikuo's resutls.
Thanks again to both of you.
Mirisage
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.