I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.
My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type. the dataset is as follows
AccountNo change_date ProdCode Prod_id Seq_no
1900001 2003-04-14 R01 2 1
1900001 2003-07-14 R01 2 2
1900001 2006-01-16 R01 2 3
1900001 2006-04-18 R01 2 4
1550002 2006-07-24 XR5 1 1
1550002 2006-09-15 TG7 1 2
1550002 2006-11-17 R01 2 3
1550002 2007-01-19 R01 2 4
What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)
I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes
Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)
I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!
thanks
Part one could be done with SQL pretty easily:
proc sql;
select A.*
from HAVE A
left join (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B
on A.ACCOUNT_ID=B.ACCOUNT_ID
where CNT > 1;
quit;
For pattern matching, you could try retain statement (or lag):
data want;
set have;
retain lstprod_id;
by account_no;
if first.account_no then lstprod_id=prod_id;
if lstprod_id=1 and prod_id=2 then ...;
run;
So the last account only has one transition, from 1 to 2. Unless you also want to include the first record?
Here is a method using BY processing. You could do the first step as a VIEW if the data is large.
data have ;
length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;
input AccountNo change_date ProdCode Prod_id Seq_no;
cards;
1900001 2003-04-14 R01 2 1
1900001 2003-07-14 R01 2 2
1900001 2006-01-16 R01 2 3
1900001 2006-04-18 R01 2 4
1550002 2006-07-24 XR5 1 1
1550002 2006-09-15 TG7 1 2
1550002 2006-11-17 R01 2 3
1550002 2007-01-19 R01 2 4
run;
data step1 ;
set have ;
by accountno prodcode notsorted ;
if first.prodcode;
run;
data want;
set step1 ;
by accountno prod_id notsorted ;
if first.accountno and last.accountno then delete;
prevcode = lag(prodcode);
if first.prod_id and not first.accountno ;
put (_all_) (=);
run;
AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7
data want;
set step1 ;
by accountno prod_id notsorted ;
if first.accountno and last.accountno then delete;
prevcode = lag(prodcode);
if first.accountno then call missing(prevcode);
if first.prod_id ;
put (_all_) (=);
run;
AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=
AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.