Hi,
I would like to write sas code (proc-sql). The question is how to create variable VP_Trans (in the table below). The table below is referring to the customers transactions within some period.
I would like to calculate how many VP transactions the customer had based on his last 5 transactions. 5th transaction included
e.g. Customer 1 had 5 VP transactions calculating from transactions happened between 1.2.-11.2
+---------+-------------+-----------+----------+ | cust id | Trans Dates | TransType | Vp_Trans | +---------+-------------+-----------+----------+ | 1 | 2015-02-01 | VP | NA | | 1 | 2015-02-03 | Vp | NA | | 1 | 2015-02-06 | VP | NA | | 1 | 2015-02-06 | Vp | NA | | 1 | 2015-02-11 | Vp | 5 | | 1 | 2015-02-13 | MG | 4 | | 1 | 2015-02-17 | Vp | 4 | | 1 | 2015-02-18 | mg | 3 | | 1 | 2015-02-19 | mg | 2 | | 1 | 2015-02-20 | mg | 1 | | 1 | 2015-02-21 | vp | 2 | | 2 | 2015-02-01 | VP | NA | | 2 | 2015-02-03 | mg | NA | | 2 | 2015-02-06 | mg | NA | | 2 | 2015-02-06 | Vp | NA | | 2 | 2015-02-11 | Vp | 3 | | 2 | 2015-02-13 | MG | 2 | | 2 | 2015-02-17 | Vp | 3 | | 2 | 2015-02-18 | mg | 3 | | 2 | 2015-02-19 | mg | 2 | | 2 | 2015-02-20 | mg | 1 | | 2 | 2015-02-21 | mg | 1 | | 2 | 2015-02-22 | mg | 0 | +---------+-------------+-----------+----------+
Hi @Vendy,
Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:
data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;
proc format;
value VpTfmt
.='NA';
run;
data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;
proc print data=want;
run;
Edit: Formatted date with YYMMDD10.
Hi.
It would be a good idea to put data in the form of a datastep so I don't have to type it all out.
Minor correction, using functions is simpler and more efficient:
data have; input cust_id Trans_Dates $ TransType $; datalines; 1 2015-02-01 VP 1 2015-02-03 Vp 1 2015-02-06 VP 1 2015-02-06 Vp 1 2015-02-11 Vp 1 2015-02-13 MG 1 2015-02-17 Vp 1 2015-02-18 mg ; run; data want; set have; length tmp $50; tmp=upcase(catx(',',lag1(transtype),lag2(transtype),lag3(transtype),lag4(transtype),lag5(transtype))); if countc(tmp,",") < 4 then vp_trans="NA"; else vp_trans=strip(put(countw(tmp,"VP"),best.)); run;
The code below shows you have to use lag() function to get previous rows data, its then a simple matter of if conditions to get the sum (note I have handled the fewer than 5 results, but you should be able to work it out as lag5() is missing then set to NA.
data have; input cust_id Trans_Dates $ TransType $; datalines; 1 2015-02-01 VP 1 2015-02-03 Vp 1 2015-02-06 VP 1 2015-02-06 Vp 1 2015-02-11 Vp 1 2015-02-13 MG 1 2015-02-17 Vp 1 2015-02-18 mg ; run; data want; set have; vp_trans=0; if upcase(lag1(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag2(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag3(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag4(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag5(transtype))="VP" then vp_trans=sum(vp_trans,1); run;
Well, bit more complicated, could condense this code down about with some thinking, but this should work:
data want; set have; length vp_trans $5; array lst{5} $2.; retain lst1-lst5 curr_id; by cust_id; if first.cust_id then do; vp_trans=""; call missing(of lst{*}); curr_id=1; end; lst{curr_id}=upcase(transtype); curr_id=curr_id+1; if curr_id > 5 then curr_id=1; if cmiss(of lst{*}) then vp_trans="NA"; else vp_trans=strip(put(countw(catx(",",of lst{*}),"VP"),best.)); run;
Hi @Vendy,
Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:
data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;
proc format;
value VpTfmt
.='NA';
run;
data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;
proc print data=want;
run;
Edit: Formatted date with YYMMDD10.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.