BookmarkSubscribeRSS Feed
TRIPLEXXX
Calcite | Level 5

I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.

My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type.  the dataset is as follows

AccountNo       change_date         ProdCode             Prod_id               Seq_no

1900001           2003-04-14           R01                          2                          1

1900001           2003-07-14           R01                          2                          2

1900001           2006-01-16           R01                          2                          3

1900001           2006-04-18           R01                          2                          4

1550002           2006-07-24           XR5                          1                          1

1550002           2006-09-15           TG7                          1                          2

1550002           2006-11-17           R01                          2                          3

1550002           2007-01-19           R01                          2                          4

What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)

I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes

Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)

I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!

thanks

2 REPLIES 2
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Part one could be done with SQL pretty easily:

proc sql;

     select      A.*

     from          HAVE A

     left join      (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B

     on               A.ACCOUNT_ID=B.ACCOUNT_ID

     where        CNT > 1;

quit;

For pattern matching, you could try retain statement (or lag):

data want;

     set have;

     retain lstprod_id;

     by account_no;

     if first.account_no then lstprod_id=prod_id;

     if lstprod_id=1 and prod_id=2 then ...;

run;

Tom
Super User Tom
Super User

So the last account only has one transition, from 1 to 2.  Unless you also want to include the first record?

Here is a method using BY processing.  You could do the first step as a VIEW if the data is large.

data have ;

  length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;

  input AccountNo change_date ProdCode  Prod_id Seq_no;

cards;

1900001 2003-04-14 R01 2 1

1900001 2003-07-14 R01 2 2

1900001 2006-01-16 R01 2 3

1900001 2006-04-18 R01 2 4

1550002 2006-07-24 XR5 1 1

1550002 2006-09-15 TG7 1 2

1550002 2006-11-17 R01 2 3

1550002 2007-01-19 R01 2 4

run;

data step1 ;

  set have ;

  by accountno prodcode notsorted ;

  if first.prodcode;

run;

data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.prod_id and not first.accountno ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7


data want;

  set step1 ;

  by accountno prod_id notsorted ;

  if first.accountno and last.accountno then delete;

  prevcode = lag(prodcode);

  if first.accountno then call missing(prevcode);

  if first.prod_id ;

  put (_all_) (=);

run;


AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=

AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 877 views
  • 0 likes
  • 3 in conversation