Iterate through variables to identify a change

TRIPLEXXX · Posted 07-11-2014 09:56 AM

I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.

My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type. the dataset is as follows

AccountNo change_date ProdCode Prod_id Seq_no

1900001 2003-04-14 R01 2 1

1900001 2003-07-14 R01 2 2

1900001 2006-01-16 R01 2 3

1900001 2006-04-18 R01 2 4

1550002 2006-07-24 XR5 1 1

1550002 2006-09-15 TG7 1 2

1550002 2006-11-17 R01 2 3

1550002 2007-01-19 R01 2 4

What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)

I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes

Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)

I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!

thanks

RW9 · Posted 07-11-2014 10:11 AM

Part one could be done with SQL pretty easily:

proc sql;

select A.*

from HAVE A

left join (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B

on A.ACCOUNT_ID=B.ACCOUNT_ID

where CNT > 1;

quit;

For pattern matching, you could try retain statement (or lag):

data want;

set have;

retain lstprod_id;

by account_no;

if first.account_no then lstprod_id=prod_id;

if lstprod_id=1 and prod_id=2 then ...;

run;

Tom · Posted 07-11-2014 01:23 PM

So the last account only has one transition, from 1 to 2. Unless you also want to include the first record?

Here is a method using BY processing. You could do the first step as a VIEW if the data is large.

data have ;

length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;

input AccountNo change_date ProdCode Prod_id Seq_no;

cards;

1900001 2003-04-14 R01 2 1

1900001 2003-07-14 R01 2 2

1900001 2006-01-16 R01 2 3

1900001 2006-04-18 R01 2 4

1550002 2006-07-24 XR5 1 1

1550002 2006-09-15 TG7 1 2

1550002 2006-11-17 R01 2 3

1550002 2007-01-19 R01 2 4

run;

data step1 ;

set have ;

by accountno prodcode notsorted ;

if first.prodcode;

run;

data want;

set step1 ;

by accountno prod_id notsorted ;

if first.accountno and last.accountno then delete;

prevcode = lag(prodcode);

if first.prod_id and not first.accountno ;

put (_all_) (=);

run;

AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7

data want;

set step1 ;

by accountno prod_id notsorted ;

if first.accountno and last.accountno then delete;

prevcode = lag(prodcode);

if first.accountno then call missing(prevcode);

if first.prod_id ;

put (_all_) (=);

run;

AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=

AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7

Iterate through variables to identify a change

Re: Iterate through variables to identify a change

Re: Iterate through variables to identify a change

Iterate through variables to identify a change

Re: Iterate through variables to identify a change

Re: Iterate through variables to identify a change

Click image to register for webinar

Classroom Training Available!