Hi SAS community,
Hoping to get some help on a problem I am having.
DATA:
CX_KEY DATE ACTIVITY CASHIER
1 JAN-1 INSTORE_PURCHASE JANET
1 JAN-2 ONLINE_PURCHASE .COM
1 JAN-10 REFUND JANET
2 JAN-5 ONLINE_PURCHASE .COM
3 JAN-7 ONLINE_BROWSE .COM
3 JAN-9 INSTORE_PURCHASE BOB
3 FEB-3 REFUND JANET
GOAL:
CX_KEY DATE ORIG_ACT ACTIVITY_STR
1 JAN-1 INSTORE_PURCHASE
1 JAN-2 ONLINE_PURCHASE INSTORE_PURCHASE
1 JAN-10 REFUND ONLINE_PURCHASE,INSTORE_PURCHASE
2 JAN-5 ONLINE_PURCHASE
3 JAN-7 ONLINE_BROWSE
3 JAN-9 INSTORE_PURCHASE ONLINE_BROWSE
3 FEB-3 REFUND INSTORE_PURCHASE,ONLINE_BROWSE
So essentially, I want to build a comma-separated string of the activities that occurred for the customer before the date of the current activity. For example, for CX_KEY = 1, the customer came in for a refund on Jan 10, and the activities that led up to the refund were an ONLINE_PURCHASE and an INSTORE_PURCHASE.
If anyone could help me with this problem, that would be great!
I am using SAS Enterprise Guide 7.12.
data have;
input CX_KEY DATE $ ACTIVITY :$50. CASHIER $;
cards;
1 JAN-1 INSTORE_PURCHASE JANET
1 JAN-2 ONLINE_PURCHASE .COM
1 JAN-10 REFUND JANET
2 JAN-5 ONLINE_PURCHASE .COM
3 JAN-7 ONLINE_BROWSE .COM
3 JAN-9 INSTORE_PURCHASE BOB
3 FEB-3 REFUND JANET
;
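data want;
set have;
by cx_key; /* rows must already be grouped by customer, in date order */
length ACTIVITY_STR $100; /* room for the comma-separated history */
retain ACTIVITY_STR; /* carry the string from one row to the next */
/* IFC evaluates all of its arguments, so LAG runs on every row and its queue stays in step. */
/* First row of a customer: reset to blank. Otherwise: prepend the previous row's activity. */
ACTIVITY_STR =ifc(not first.cx_key,catx(',',lag(activity),activity_str),' ');
drop cashier;
run;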
Oh wow, thank you so much Novinsrin! I believe this is exactly what I was looking for. Thanks for helping out a newbie like myself.
Quick question: does the LAG function only work properly if my data is sorted?
The solution above assumes your sample is a good representation of your real data. The LAG function itself doesn't depend on the data being sorted; in your case, however, the data does have to be sorted. The reason is that LAG returns values from a queue, in other words the value from N executions ago, where N=1 here. So the previous execution must have seen the value you expect, which requires the rows to be in order.
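As a rough sketch of getting the rows into the required order: the character DATE values in the sample (JAN-1, JAN-10, FEB-3) would not sort chronologically as text, so this assumes the real data carries a true SAS date. The data set name HAVE_DATED, the variable ACT_DATE, and the year 2018 are all made up for illustration.

```sas
/* Hypothetical input: ACT_DATE is an assumed numeric SAS date, rows deliberately out of order */
data have_dated;
   input CX_KEY ACT_DATE :date9. ACTIVITY :$50.;
   format ACT_DATE date9.;
   cards;
3 03FEB2018 REFUND
1 10JAN2018 REFUND
1 01JAN2018 INSTORE_PURCHASE
3 07JAN2018 ONLINE_BROWSE
1 02JAN2018 ONLINE_PURCHASE
2 05JAN2018 ONLINE_PURCHASE
3 09JAN2018 INSTORE_PURCHASE
;
run;

/* Order rows by customer, then chronologically, so BY cx_key and LAG see them in sequence */
proc sort data=have_dated;
   by CX_KEY ACT_DATE;
run;
```

After the sort, the same DATA step logic could be run against HAVE_DATED instead of HAVE.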
LAG can return lags from 1 to N: lag(x) returns the value stored from the previous execution, lag2(x) returns the value from two executions back, and so forth. So this independent behavior of LAG shouldn't be confused with a sort, although we certainly need the queue to be filled in an order that suits our needs.
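To make the queue behavior concrete, here is a small illustration on made-up numbers (the data set LAG_DEMO and the variables X, PREV1, and PREV2 are hypothetical names, not from the thread):

```sas
/* Each LAGn call keeps its own queue, filled in the order the rows are read */
data lag_demo;
   input x;
   prev1 = lag(x);    /* value of x from the previous execution */
   prev2 = lag2(x);   /* value of x from two executions back */
   cards;
10
20
30
40
;
run;

proc print data=lag_demo;
run;
```

PREV1 should come out as missing, 10, 20, 30 and PREV2 as missing, missing, 10, 20, regardless of whether X is sorted. LAG only reflects the order in which rows arrive.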
This makes perfect sense, and I appreciate all the help!