I have a dataset in a particualer order and with a sequence no per group, the groups are grouped by an account and I have other varibels which describe the customer as it journeys through a process.
My customer would enter the system and a have a product, the customer can then move from product to poduct over their life, stamping it with a date of the move and the product type. the dataset is as follows
AccountNo change_date ProdCode Prod_id Seq_no
1900001 2003-04-14 R01 2 1
1900001 2003-07-14 R01 2 2
1900001 2006-01-16 R01 2 3
1900001 2006-04-18 R01 2 4
1550002 2006-07-24 XR5 1 1
1550002 2006-09-15 TG7 1 2
1550002 2006-11-17 R01 2 3
1550002 2007-01-19 R01 2 4
What I would like to do firstly is drop all data where the product code does not change for the acocunt no (the prod_id is a bigger bucket for the prod code, i.e. 500 ProdCodes can be bucketed into two ids 1 and 2)
I would then like to be left with only data that has different prod codes (or id's) that have differences in there lifetime (for seq_no 1 to x), 1900001 would be droppped as it contains static codes
Then I would like to be able to find where the prod id follows the exact prod_id sequence of 2 then 1 then 2 (account no 155002 is an example above of this sequence)
I am completely open to any alternitive methods of getting this data, any help would be greatly appreciated!!
thanks
Part one could be done with SQL pretty easily:
proc sql;
select A.*
from HAVE A
left join (select distinct ACCOUNT_ID,count(distinct PROD_ID) as CNT from HAVE group by ACCOUNT_ID) B
on A.ACCOUNT_ID=B.ACCOUNT_ID
where CNT > 1;
quit;
For pattern matching, you could try retain statement (or lag):
data want;
set have;
retain lstprod_id;
by account_no;
if first.account_no then lstprod_id=prod_id;
if lstprod_id=1 and prod_id=2 then ...;
run;
So the last account only has one transition, from 1 to 2. Unless you also want to include the first record?
Here is a method using BY processing. You could do the first step as a VIEW if the data is large.
data have ;
length AccountNo $7 Seq_no 8 change_date $10 prodcode $3 prod_id 8 ;
input AccountNo change_date ProdCode Prod_id Seq_no;
cards;
1900001 2003-04-14 R01 2 1
1900001 2003-07-14 R01 2 2
1900001 2006-01-16 R01 2 3
1900001 2006-04-18 R01 2 4
1550002 2006-07-24 XR5 1 1
1550002 2006-09-15 TG7 1 2
1550002 2006-11-17 R01 2 3
1550002 2007-01-19 R01 2 4
run;
data step1 ;
set have ;
by accountno prodcode notsorted ;
if first.prodcode;
run;
data want;
set step1 ;
by accountno prod_id notsorted ;
if first.accountno and last.accountno then delete;
prevcode = lag(prodcode);
if first.prod_id and not first.accountno ;
put (_all_) (=);
run;
AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7
data want;
set step1 ;
by accountno prod_id notsorted ;
if first.accountno and last.accountno then delete;
prevcode = lag(prodcode);
if first.accountno then call missing(prevcode);
if first.prod_id ;
put (_all_) (=);
run;
AccountNo=1550002 Seq_no=1 change_date=2006-07-24 prodcode=XR5 prod_id=1 prevcode=
AccountNo=1550002 Seq_no=3 change_date=2006-11-17 prodcode=R01 prod_id=2 prevcode=TG7
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.