BookmarkSubscribeRSS Feed
TRIPLEXXX
Calcite | Level 5

I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.

My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type.  the dataset is as follows

AccountNo       change_date         ProdCode             Prod_id               Seq_no

1900001           2003-04-14           R01                          2                          1

1900001           2003-07-14           R01                          2                          2

1900001           2006-01-16           R01                          2                          3

1900001           2006-04-18           R01                          2                          4

1550002           2006-07-24           XR5                          1                          1

1550002           2006-09-15           TG7                          1                          2

1550002           2006-11-17           R01                          2                          3

1550002           2007-01-19           R01                          2                          4

What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)

I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes

Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)

I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!

thanks

2 REPLIES 2
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Part one could be done with SQL pretty easily:

proc sql;

     select      A.*

     from          HAVE A

     left join      (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B

     on               A.ACCOUNT_ID=B.ACCOUNT_ID

     where        CNT > 1;

quit;

For pattern matching, you could try retain statement (or lag):

data want;

     set have;

     retain lstprod_id;

     by account_no;

     if first.account_no then lstprod_id=prod_id;

     if lstprod_id=1 and prod_id=2 then ...;

run;

Tom
Super User Tom
Super User

So the last account only has one transition, from 1 to 2.  Unless you also want to include the first record?

Here is a method using BY processing.  You could do the first step as a VIEW if the data is large.

data have ;

  length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;

  input AccountNo change_date ProdCode  Prod_id Seq_no;

cards;

1900001 2003-04-14 R01 2 1

1900001 2003-07-14 R01 2 2

1900001 2006-01-16 R01 2 3

1900001 2006-04-18 R01 2 4

1550002 2006-07-24 XR5 1 1

1550002 2006-09-15 TG7 1 2

1550002 2006-11-17 R01 2 3

1550002 2007-01-19 R01 2 4

run;

data step1 ;

  set have ;

  by accountno prodcode notsorted ;

  if first.prodcode;

run;

data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.prod_id and not first.accountno ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7


data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.accountno then call missing(prevcode);

  if first.prod_id ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=

AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 887 views
  • 0 likes
  • 3 in conversation