I have a data set below:
data have:
set test;
id_acc account_number account_status information_date ;
1111 999585548 AAA 30Jun2007
1111 999585548 AAA 31July2007
1111 999585544 D1A 31Aug2007
1111 999585547 D2A 30Sep2007
1111 999585540 D1D 31Oct2007
1111 999585544 AAA 30Nov2007
1111 999585547 AAA 31Dec2007
1111 999585540 D1D 31Jan2008
1111 999585544 D2D 28Feb2008
1111 999585547 D3D 31Mar2008
1111 999585540 D4D 30Apr2008
1111 999585540 D5D 31May2008
1111 999585540 AAA 30Jun2008
1111 999585540 AAA 31Jul2008
1111 999595540 D1D 31Aug2008
1111 999587540 LEP 31Oct2008
1111 999585540 LEP 30Nov2008
1112 999505548 AAA 30Jun2007
1112 999505548 AAA 31July2007
1112 999505544 D1A 31Aug2007
1112 999505547 D2A 30Sep2007
1112 999505540 D1D 31Oct2007
1112 999505544 AAA 30Nov2007
1112 999505547 AAA 31Dec2007
1112 999505540 D1D 31Jan2008
1112 999505544 D2D 28Feb2008
1112 999505547 D3D 31Mar2008
1112 999505540 LEP 30Apr2008
1112 999505540 LEP 31May2008
1112 999505540 LEP 30Jun2008
1112 999505540 LEP 31Jul2008
1112 999505540 LEP 31Aug2008
1112 999505540 LEP 31Oct2008
1112 999505540 LEP 30Nov2008
1113 990005548 AAA 30Jun2007
1113 995005548 AAA 31July2007
1113 995050544 D1A 31Aug2007
1113 999500547 D2A 30Sep2007
1113 999500540 AAA 31Oct2007
1113 999500544 D1A 30Nov2007
1113 999580547 AAA 31Dec2007
1113 999580540 AAA 31Jan2008
1113 999580544 D1A 28Feb2008
1113 999580547 D2A 31Mar2008
1113 999580540 AAA 30Apr2008
1113 999580540 D2D 31May2008
1113 999585540 D4D 30Jun2008
1113 999585540 D5D 31Jul2008
1113 999585540 AAA 31Aug2008
1113 999585540 AAA 31Oct2008
1113 999585540 LEP 30Nov2008
1113 999585540 LEP 31DEC2008
I WANT THE OUTPUT AS MENTIONED BELOW:
i want to calculate different spam of account status and flag their information date in the flag variable:
Also the span may increase and depends upon the status of id_acc.(ignore the account number).
When we fine LEP i.e L1_start then we have to keep as it is..no change.
id_acc account_status information_date (d1_start d1_end) (d2_start d2_end) (d3_start d3_end) (d4_start d4_end) (L1_start_end) (a_start L_end)
1111 D1A 31Aug2007 (31Aug2007-31Oct2007) (31Jan2008-31May2008) (31Aug2008-31Aug2008) --- 31Aug2008 --- ----
1112 D1A 31Aug2007 (31Aug2007-31Oct2007) (31Jan2008-31Mar2008) --- --- 30Apr2008 --- -- -
1113 D1A 31Aug2007 (31Aug2007-30Sep2007) (30Nov2007-30Nov2007) (28Feb2008-31Mar2008) (31May2008-31Jul2008) (31Oct2008 30Nov2008)
Can you provide a little more detail, I'm having trouble understanding what you are looking for. Looking at id 1111, account_status AAA can you explain what that would look like in the output?
or maybe all of id_acc 1111, that'll help.
Hi Mark,
for AAA we don't have to worry about it..
We have to find only those informatin date instances for each id where it encounters D and end in D or LEP.or aaa to lep
eg.
for id_acc 1111:
id_acc account_status information_date start_flag1 end_flag1 start_flag_n end_flag_n;
1111 D1A 31Aug2007 31Aug2007 31Oct2007 31Jan2008 31May2008
For other id_acc the flag may be on more than 2 and it can be n.
only three things to consider are:
1-when it starts in D and end in D series(it can go in subsequent D series and end in D series for same or different id_acc)
2-when it starts in D and end in L series.(it is fixed as it will not go out of L series after that)
3-when it starts in AAA and end in L series.(it is fixed as it will not go out of L series after that
Hope it is clear now?
Thanks for Help.
Maybe someone else can make sense of it but I cannot. Here are two things you say that seem contradicting:
for AAA we don't have to worry about it..
in the next sentence...
or aaa to lep
Please provide more information.
Hi Mark,
Below is the same question AS INITIAL WAS CONFUSING..SORRY FOR TAT:
id_acc month1 month2 month3 month4 month5 month6 month7 month8 month9 month10:
111 D1D D2D AAA AAA D1A D2A AAA D1A D2A LEP
222 AAA D1D D2D D3D AAA AAA D1D D2D LEP LEP
333 D1D D2D D3D D4D D5D D6D D7D LEP LEP LEP
Now I want the output in the below and only flag mentioned in last.
id_acc month1 ---- month10 d1_start d1_end d2_start d2_end d3_start d3_end
111 month1 month2 month5 month6 month8 month9
222 month2 month4 month7 month8 ----- ----
333 month1 month7 ---- ---- --- ----
Basically the month1 month2 are information dates for unique month and i need to calculate the amount beween different instances.
The d1_start d1_end d2_start d2_end may go to multiple times(like dn_start dn_end ) till 93 month.
This might be what you want. I'm guessing there is a more elegant way to put it all together but this is what I think you are asking for. If all the variables aren't there you should be able to copy/paste and fill in the blanks, this should be a good start:
data have;
infile cards;
informat information_date date9.;
format information_date date9.;
input id_acc $ account_number $ account_status $ information_date;
cards;
1111 999585548 AAA 30Jun2007
1111 999585548 AAA 31July2007
1111 999585544 D1A 31Aug2007
1111 999585547 D2A 30Sep2007
1111 999585540 D1D 31Oct2007
1111 999585544 AAA 30Nov2007
1111 999585547 AAA 31Dec2007
1111 999585540 D1D 31Jan2008
1111 999585544 D2D 28Feb2008
1111 999585547 D3D 31Mar2008
1111 999585540 D4D 30Apr2008
1111 999585540 D5D 31May2008
1111 999585540 AAA 30Jun2008
1111 999585540 AAA 31Jul2008
1111 999595540 D1D 31Aug2008
1111 999587540 LEP 31Oct2008
1111 999585540 LEP 30Nov2008
1112 999505548 AAA 30Jun2007
1112 999505548 AAA 31July2007
1112 999505544 D1A 31Aug2007
1112 999505547 D2A 30Sep2007
1112 999505540 D1D 31Oct2007
1112 999505544 AAA 30Nov2007
1112 999505547 AAA 31Dec2007
1112 999505540 D1D 31Jan2008
1112 999505544 D2D 28Feb2008
1112 999505547 D3D 31Mar2008
1112 999505540 LEP 30Apr2008
1112 999505540 LEP 31May2008
1112 999505540 LEP 30Jun2008
1112 999505540 LEP 31Jul2008
1112 999505540 LEP 31Aug2008
1112 999505540 LEP 31Oct2008
1112 999505540 LEP 30Nov2008
1113 990005548 AAA 30Jun2007
1113 995005548 AAA 31July2007
1113 995050544 D1A 31Aug2007
1113 999500547 D2A 30Sep2007
1113 999500540 AAA 31Oct2007
1113 999500544 D1A 30Nov2007
1113 999580547 AAA 31Dec2007
1113 999580540 AAA 31Jan2008
1113 999580544 D1A 28Feb2008
1113 999580547 D2A 31Mar2008
1113 999580540 AAA 30Apr2008
1113 999580540 D2D 31May2008
1113 999585540 D4D 30Jun2008
1113 999585540 D5D 31Jul2008
1113 999585540 AAA 31Aug2008
1113 999585540 AAA 31Oct2008
1113 999585540 LEP 30Nov2008
1113 999585540 LEP 31DEC2008
;
run;
proc sort data=have;by id_acc account_status information_date;
data prep;
length new_acc $2.;
set have;
new_acc = substr(account_status,1,2);
run;
data want all;
format d1_start d1_end d2_start d2_end d3_start d3_end mmddyy10.;
retain d1_start d2_start d3_start;
set prep;
by new_acc notsorted;
if id_acc = '1111' then do;
if first.new_acc and new_acc = 'D1' then D1_Start = information_date;
if last.new_acc and new_acc = 'D1' then D1_End = information_date;
if first.new_acc and new_acc = 'D2' then D2_Start = information_date;
if last.new_acc and new_acc = 'D2' then D2_End = information_date;
if first.new_acc and new_acc = 'D3' then D3_Start = information_date;
if last.new_acc and new_acc = 'D3' then D3_End = information_date;
end;
if id_acc = '1112' then do;
if first.new_acc and new_acc = 'D1' then D1_Start = information_date;
if last.new_acc and new_acc = 'D1' then D1_End = information_date;
if first.new_acc and new_acc = 'D2' then D2_Start = information_date;
if last.new_acc and new_acc = 'D2' then D2_End = information_date;
if first.new_acc and new_acc = 'D3' then D3_Start = information_date;
if last.new_acc and new_acc = 'D3' then D3_End = information_date;
end;
if not missing(d1_end) then d1_diff = intck('days',d1_start,d1_end);
if not missing(d2_end) then d1_diff = intck('days',d2_start,d2_end);
if not missing(d3_end) then d1_diff = intck('days',d3_start,d3_end);
output all;
if not missing(d1_end) or not missing(d2_end) or not missing(d3_end) then output want;
run;
Hi Mark,
I have around 5L id_acc and it is impossible to put condition for all.
Thanks for your help.
Here is a solution with a few problems listed at the bottom:
data have;
infile cards;
informat information_date date9.;
format information_date date9.;
input id_acc $ account_number $ account_status $ information_date;
cards;
1111 999585548 AAA 30Jun2007
1111 999585548 AAA 31July2007
1111 999585544 D1A 31Aug2007
1111 999585547 D2A 30Sep2007
1111 999585540 D1D 31Oct2007
1111 999585544 AAA 30Nov2007
1111 999585547 AAA 31Dec2007
1111 999585540 D1D 31Jan2008
1111 999585544 D2D 28Feb2008
1111 999585547 D3D 31Mar2008
1111 999585540 D4D 30Apr2008
1111 999585540 D5D 31May2008
1111 999585540 AAA 30Jun2008
1111 999585540 AAA 31Jul2008
1111 999595540 D1D 31Aug2008
1111 999587540 LEP 31Oct2008
1111 999585540 LEP 30Nov2008
1112 999505548 AAA 30Jun2007
1112 999505548 AAA 31July2007
1112 999505544 D1A 31Aug2007
1112 999505547 D2A 30Sep2007
1112 999505540 D1D 31Oct2007
1112 999505544 AAA 30Nov2007
1112 999505547 AAA 31Dec2007
1112 999505540 D1D 31Jan2008
1112 999505544 D2D 28Feb2008
1112 999505547 D3D 31Mar2008
1112 999505540 LEP 30Apr2008
1112 999505540 LEP 31May2008
1112 999505540 LEP 30Jun2008
1112 999505540 LEP 31Jul2008
1112 999505540 LEP 31Aug2008
1112 999505540 LEP 31Oct2008
1112 999505540 LEP 30Nov2008
1113 990005548 AAA 30Jun2007
1113 995005548 AAA 31July2007
1113 995050544 D1A 31Aug2007
1113 999500547 D2A 30Sep2007
1113 999500540 AAA 31Oct2007
1113 999500544 D1A 30Nov2007
1113 999580547 AAA 31Dec2007
1113 999580540 AAA 31Jan2008
1113 999580544 D1A 28Feb2008
1113 999580547 D2A 31Mar2008
1113 999580540 AAA 30Apr2008
1113 999580540 D2D 31May2008
1113 999585540 D4D 30Jun2008
1113 999585540 D5D 31Jul2008
1113 999585540 AAA 31Aug2008
1113 999585540 AAA 31Oct2008
1113 999585540 LEP 30Nov2008
1113 999585540 LEP 31DEC2008
;
run;
proc sort data=have;by id_acc account_status information_date;
data prep;
length new_acc $2.;
set have;
new_acc = substr(account_status,1,2);
run;
data want all;
format start_date end_date mmddyy10.;
retain start_date ;
set prep;
by new_acc notsorted;
if first.new_acc then start_date = information_date;
if last.new_acc then end_date = information_date;
if not missing(end_date) and new_acc ne 'AA' then do;
date_diff = intck('days',start_date,end_date);
end;
output all;
if not missing(end_date) then output want;
run;
proc sort data=want;by id_acc start_date end_date;
proc transpose data=want out=trans_want;*(drop=_name_);by id_acc account_status;var start_date end_date date_diff;run;
proc sort data=trans_want;by id_acc _name_;
proc transpose data=trans_want out=trans_want2(drop=_name_ );by id_acc;id account_status _name_;var col1;run;
/*
The date_diff and date fields are in the same column after the first transpose, therefore they are the same format and the mmddyy10. format is lost. The columns are out of order compared to your solution, you can do a proc sql and select the columns in the order you want them after this step, if you do that you can also add whatever formatting you want for the dates
*/
Hi Mark,
Thanks for Help..I guess it might help me..
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.