DATA Step, Macro, Functions and more

Iterate through variables to identify a change

Reply
New Contributor
Posts: 2

Iterate through variables to identify a change

I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.

My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type.  the dataset is as follows

AccountNo       change_date         ProdCode             Prod_id               Seq_no

1900001           2003-04-14           R01                          2                          1

1900001           2003-07-14           R01                          2                          2

1900001           2006-01-16           R01                          2                          3

1900001           2006-04-18           R01                          2                          4

1550002           2006-07-24           XR5                          1                          1

1550002           2006-09-15           TG7                          1                          2

1550002           2006-11-17           R01                          2                          3

1550002           2007-01-19           R01                          2                          4

What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)

I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes

Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)

I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!

thanks

Super User
Super User
Posts: 7,392

Re: Iterate through variables to identify a change

Part one could be done with SQL pretty easily:

proc sql;

     select      A.*

     from          HAVE A

     left join      (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B

     on               A.ACCOUNT_ID=B.ACCOUNT_ID

     where        CNT > 1;

quit;

For pattern matching, you could try retain statement (or lag):

data want;

     set have;

     retain lstprod_id;

     by account_no;

     if first.account_no then lstprod_id=prod_id;

     if lstprod_id=1 and prod_id=2 then ...;

run;

Super User
Super User
Posts: 6,498

Re: Iterate through variables to identify a change

So the last account only has one transition, from 1 to 2.  Unless you also want to include the first record?

Here is a method using BY processing.  You could do the first step as a VIEW if the data is large.

data have ;

  length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;

  input AccountNo change_date ProdCode  Prod_id Seq_no;

cards;

1900001 2003-04-14 R01 2 1

1900001 2003-07-14 R01 2 2

1900001 2006-01-16 R01 2 3

1900001 2006-04-18 R01 2 4

1550002 2006-07-24 XR5 1 1

1550002 2006-09-15 TG7 1 2

1550002 2006-11-17 R01 2 3

1550002 2007-01-19 R01 2 4

run;

data step1 ;

  set have ;

  by accountno prodcode notsorted ;

  if first.prodcode;

run;

data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.prod_id and not first.accountno ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7


data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.accountno then call missing(prevcode);

  if first.prod_id ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=

AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7

Ask a Question
Discussion stats
  • 2 replies
  • 209 views
  • 0 likes
  • 3 in conversation