BookmarkSubscribeRSS Feed
pankak
Calcite | Level 5

I have a data set below:

data have:

set test;

id_acc  account_number  account_status       information_date ;

1111    999585548           AAA                  30Jun2007

1111    999585548           AAA                  31July2007

1111    999585544           D1A                  31Aug2007

1111    999585547           D2A                  30Sep2007

1111    999585540           D1D                  31Oct2007

1111    999585544           AAA                  30Nov2007

1111    999585547           AAA                  31Dec2007

1111    999585540           D1D                  31Jan2008

1111    999585544           D2D                  28Feb2008

1111    999585547           D3D                  31Mar2008

1111    999585540           D4D                  30Apr2008

1111    999585540           D5D                  31May2008

1111    999585540           AAA                  30Jun2008

1111    999585540           AAA                  31Jul2008

1111    999595540           D1D                  31Aug2008

1111    999587540           LEP                  31Oct2008

1111    999585540           LEP                  30Nov2008

1112    999505548           AAA                  30Jun2007

1112    999505548           AAA                  31July2007

1112    999505544           D1A                  31Aug2007

1112    999505547           D2A                  30Sep2007

1112    999505540           D1D                  31Oct2007

1112    999505544           AAA                  30Nov2007

1112    999505547           AAA                  31Dec2007

1112    999505540           D1D                  31Jan2008

1112    999505544           D2D                  28Feb2008

1112    999505547           D3D                  31Mar2008

1112    999505540           LEP                  30Apr2008

1112    999505540           LEP                  31May2008

1112    999505540           LEP                  30Jun2008

1112    999505540           LEP                  31Jul2008

1112    999505540           LEP                  31Aug2008

1112    999505540           LEP                  31Oct2008

1112    999505540           LEP                  30Nov2008

1113    990005548           AAA                  30Jun2007

1113    995005548           AAA                  31July2007

1113    995050544           D1A                  31Aug2007

1113    999500547           D2A                  30Sep2007

1113    999500540           AAA                  31Oct2007

1113    999500544           D1A                  30Nov2007

1113    999580547           AAA                  31Dec2007

1113    999580540           AAA                  31Jan2008

1113    999580544           D1A                  28Feb2008

1113    999580547           D2A                  31Mar2008

1113    999580540           AAA                  30Apr2008

1113    999580540           D2D                  31May2008

1113    999585540           D4D                  30Jun2008

1113    999585540           D5D                  31Jul2008

1113    999585540           AAA                  31Aug2008

1113    999585540           AAA                  31Oct2008

1113    999585540           LEP                  30Nov2008

1113    999585540           LEP                  31DEC2008

I WANT THE OUTPUT AS MENTIONED BELOW:

i want to calculate different spam of account status and flag their information date in the flag variable:

Also the span may increase and depends upon the status of id_acc.(ignore the account number).

When we fine LEP i.e L1_start then we have to keep as it is..no change.

id_acc  account_status  information_date (d1_start d1_end)        (d2_start d2_end)              (d3_start d3_end)      (d4_start d4_end)           (L1_start_end)                (a_start L_end)

1111    D1A             31Aug2007            (31Aug2007-31Oct2007)  (31Jan2008-31May2008)  (31Aug2008-31Aug2008)     ---                           31Aug2008                         ---    ----

1112    D1A             31Aug2007            (31Aug2007-31Oct2007)  (31Jan2008-31Mar2008)      ---                    ---                                          30Apr2008                          ---    -- -

1113    D1A             31Aug2007           (31Aug2007-30Sep2007)   (30Nov2007-30Nov2007)  (28Feb2008-31Mar2008)   (31May2008-31Jul2008)                     (31Oct2008  30Nov2008)

9 REPLIES 9
Steelers_In_DC
Barite | Level 11

Can you provide a little more detail, I'm having trouble understanding what you are looking for.  Looking at id 1111, account_status AAA can you explain what that would look like in the output?

Steelers_In_DC
Barite | Level 11

or maybe all of id_acc 1111, that'll help.

pankak
Calcite | Level 5

Hi Mark,

for AAA we don't have to worry about it..

We have to find only those informatin date instances for each id where it encounters D and end in D or LEP.or aaa to lep

eg.

for id_acc 1111:

id_acc  account_status  information_date       start_flag1          end_flag1       start_flag_n end_flag_n;


1111    D1A                      31Aug2007           31Aug2007       31Oct2007        31Jan2008    31May2008


For other id_acc the flag may be on more than 2 and it can be n.


only three things to consider are:


1-when it starts in D and end in D series(it can go in subsequent D series and end in D series for same or different id_acc)                                                                                                                          



2-when it starts in D and end in L series.(it is fixed as it will not go out of L series after that)



3-when it starts in AAA and end in L series.(it is fixed as it will not go out of L series after that





Hope it is clear now?

Thanks for Help.

Steelers_In_DC
Barite | Level 11

Maybe someone else can make sense of it but I cannot.  Here are two things you say that seem contradicting:

for AAA we don't have to worry about it..

in the next sentence...

or aaa to lep

Please provide more information.

pankak
Calcite | Level 5

Hi Mark,

Below is the same question AS INITIAL WAS CONFUSING..SORRY FOR TAT:

id_acc  month1 month2 month3 month4 month5 month6 month7 month8 month9 month10:

111      D1D     D2D     AAA      AAA    D1A     D2A        AAA   D1A     D2A    LEP

222      AAA      D1D    D2D      D3D     AAA    AAA        D1D    D2D    LEP    LEP

333      D1D     D2D    D3D      D4D     D5D     D6D       D7D      LEP    LEP   LEP

Now I want the output in the below and only flag mentioned in last.

id_acc  month1 ---- month10  d1_start    d1_end       d2_start     d2_end    d3_start    d3_end

111                                     month1      month2      month5       month6    month8     month9

222                                      month2     month4       month7       month8    -----          ----

333                                       month1     month7           ----          ----          ---            ----



Basically the month1 month2 are information dates for unique month and i need to calculate the amount beween different instances.


The d1_start    d1_end       d2_start     d2_end    may go to multiple times(like dn_start dn_end ) till 93 month.

Steelers_In_DC
Barite | Level 11

This might be what you want.  I'm guessing there is a more elegant way to put it all together but this is what I think you are asking for.  If all the variables aren't there you should be able to copy/paste and fill in the blanks, this should be a good start:

data have;

infile cards;

informat information_date date9.;

format information_date date9.;

input id_acc $ account_number $ account_status  $ information_date;

cards;

1111    999585548           AAA                  30Jun2007

1111    999585548           AAA                  31July2007

1111    999585544           D1A                  31Aug2007

1111    999585547           D2A                  30Sep2007

1111    999585540           D1D                  31Oct2007

1111    999585544           AAA                  30Nov2007

1111    999585547           AAA                  31Dec2007

1111    999585540           D1D                  31Jan2008

1111    999585544           D2D                  28Feb2008

1111    999585547           D3D                  31Mar2008

1111    999585540           D4D                  30Apr2008

1111    999585540           D5D                  31May2008

1111    999585540           AAA                  30Jun2008

1111    999585540           AAA                  31Jul2008

1111    999595540           D1D                  31Aug2008

1111    999587540           LEP                  31Oct2008

1111    999585540           LEP                  30Nov2008

1112    999505548           AAA                  30Jun2007

1112    999505548           AAA                  31July2007

1112    999505544           D1A                  31Aug2007

1112    999505547           D2A                  30Sep2007

1112    999505540           D1D                  31Oct2007

1112    999505544           AAA                  30Nov2007

1112    999505547           AAA                  31Dec2007

1112    999505540           D1D                  31Jan2008

1112    999505544           D2D                  28Feb2008

1112    999505547           D3D                  31Mar2008

1112    999505540           LEP                  30Apr2008

1112    999505540           LEP                  31May2008

1112    999505540           LEP                  30Jun2008

1112    999505540           LEP                  31Jul2008

1112    999505540           LEP                  31Aug2008

1112    999505540           LEP                  31Oct2008

1112    999505540           LEP                  30Nov2008

1113    990005548           AAA                  30Jun2007

1113    995005548           AAA                  31July2007

1113    995050544           D1A                  31Aug2007

1113    999500547           D2A                  30Sep2007

1113    999500540           AAA                  31Oct2007

1113    999500544           D1A                  30Nov2007

1113    999580547           AAA                  31Dec2007

1113    999580540           AAA                  31Jan2008

1113    999580544           D1A                  28Feb2008

1113    999580547           D2A                  31Mar2008

1113    999580540           AAA                  30Apr2008

1113    999580540           D2D                  31May2008

1113    999585540           D4D                  30Jun2008

1113    999585540           D5D                  31Jul2008

1113    999585540           AAA                  31Aug2008

1113    999585540           AAA                  31Oct2008

1113    999585540           LEP                  30Nov2008

1113    999585540           LEP                  31DEC2008

;

run;

proc sort data=have;by id_acc account_status information_date;

data prep;

length new_acc $2.;

set have;

new_acc = substr(account_status,1,2);

run;

data want all;

format d1_start d1_end d2_start d2_end d3_start d3_end mmddyy10.;

retain d1_start d2_start d3_start;

set prep;

by new_acc notsorted;

if id_acc = '1111' then do;

if first.new_acc and new_acc = 'D1' then D1_Start = information_date;

if last.new_acc  and new_acc = 'D1' then D1_End = information_date;

if first.new_acc and new_acc = 'D2' then D2_Start = information_date;

if last.new_acc  and new_acc = 'D2' then D2_End = information_date;

if first.new_acc and new_acc = 'D3' then D3_Start = information_date;

if last.new_acc  and new_acc = 'D3' then D3_End = information_date;

end;

if id_acc = '1112' then do;

if first.new_acc and new_acc = 'D1' then D1_Start = information_date;

if last.new_acc  and new_acc = 'D1' then D1_End = information_date;

if first.new_acc and new_acc = 'D2' then D2_Start = information_date;

if last.new_acc  and new_acc = 'D2' then D2_End = information_date;

if first.new_acc and new_acc = 'D3' then D3_Start = information_date;

if last.new_acc  and new_acc = 'D3' then D3_End = information_date;

end;

if not missing(d1_end) then d1_diff = intck('days',d1_start,d1_end);

if not missing(d2_end) then d1_diff = intck('days',d2_start,d2_end);

if not missing(d3_end) then d1_diff = intck('days',d3_start,d3_end);

output all;

if not missing(d1_end) or not missing(d2_end) or not missing(d3_end) then output want;

run;

pankak
Calcite | Level 5

Hi Mark,

I have around 5L id_acc and it is impossible to put condition for all.

Thanks for your help.

Steelers_In_DC
Barite | Level 11

Here is a solution with a few problems listed at the bottom:

data have;

infile cards;

informat information_date date9.;

format information_date date9.;

input id_acc $ account_number $ account_status  $ information_date;

cards;

1111    999585548           AAA                  30Jun2007

1111    999585548           AAA                  31July2007

1111    999585544           D1A                  31Aug2007

1111    999585547           D2A                  30Sep2007

1111    999585540           D1D                  31Oct2007

1111    999585544           AAA                  30Nov2007

1111    999585547           AAA                  31Dec2007

1111    999585540           D1D                  31Jan2008

1111    999585544           D2D                  28Feb2008

1111    999585547           D3D                  31Mar2008

1111    999585540           D4D                  30Apr2008

1111    999585540           D5D                  31May2008

1111    999585540           AAA                  30Jun2008

1111    999585540           AAA                  31Jul2008

1111    999595540           D1D                  31Aug2008

1111    999587540           LEP                  31Oct2008

1111    999585540           LEP                  30Nov2008

1112    999505548           AAA                  30Jun2007

1112    999505548           AAA                  31July2007

1112    999505544           D1A                  31Aug2007

1112    999505547           D2A                  30Sep2007

1112    999505540           D1D                  31Oct2007

1112    999505544           AAA                  30Nov2007

1112    999505547           AAA                  31Dec2007

1112    999505540           D1D                  31Jan2008

1112    999505544           D2D                  28Feb2008

1112    999505547           D3D                  31Mar2008

1112    999505540           LEP                  30Apr2008

1112    999505540           LEP                  31May2008

1112    999505540           LEP                  30Jun2008

1112    999505540           LEP                  31Jul2008

1112    999505540           LEP                  31Aug2008

1112    999505540           LEP                  31Oct2008

1112    999505540           LEP                  30Nov2008

1113    990005548           AAA                  30Jun2007

1113    995005548           AAA                  31July2007

1113    995050544           D1A                  31Aug2007

1113    999500547           D2A                  30Sep2007

1113    999500540           AAA                  31Oct2007

1113    999500544           D1A                  30Nov2007

1113    999580547           AAA                  31Dec2007

1113    999580540           AAA                  31Jan2008

1113    999580544           D1A                  28Feb2008

1113    999580547           D2A                  31Mar2008

1113    999580540           AAA                  30Apr2008

1113    999580540           D2D                  31May2008

1113    999585540           D4D                  30Jun2008

1113    999585540           D5D                  31Jul2008

1113    999585540           AAA                  31Aug2008

1113    999585540           AAA                  31Oct2008

1113    999585540           LEP                  30Nov2008

1113    999585540           LEP                  31DEC2008

;

run;

proc sort data=have;by id_acc account_status information_date;

data prep;

length new_acc $2.;

set have;

new_acc = substr(account_status,1,2);

run;

data want all;

format start_date end_date mmddyy10.;

retain start_date ;

set prep;

by new_acc notsorted;

    if first.new_acc then start_date = information_date;

    if last.new_acc then end_date = information_date;

if not missing(end_date) and new_acc ne 'AA' then do;

    date_diff = intck('days',start_date,end_date);

end;

output all;

if not missing(end_date) then output want;

run;

proc sort data=want;by id_acc start_date end_date;

proc transpose data=want out=trans_want;*(drop=_name_);by id_acc account_status;var start_date end_date date_diff;run;

proc sort data=trans_want;by id_acc _name_;

proc transpose data=trans_want out=trans_want2(drop=_name_ );by id_acc;id account_status _name_;var col1;run;

/*

The date_diff and date fields are in the same column after the first transpose, therefore they are the same format and the mmddyy10. format is lost.  The columns are out of order compared to your solution, you can do a proc sql and select the columns in the order you want them after this step, if you do that you can also add whatever formatting you want for the dates

*/

pankak
Calcite | Level 5

Hi Mark,

Thanks for Help..I guess it might help me..

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1178 views
  • 0 likes
  • 2 in conversation