Hi,
I would like to write sas code (proc-sql). The question is how to create variable VP_Trans (in the table below). The table below is referring to the customers transactions within some period.
I would like to calculate how many VP transactions the customer had based on his last 5 transactions. 5th transaction included
e.g. Customer 1 had 5 VP transactions calculating from transactions happened between 1.2.-11.2
+---------+-------------+-----------+----------+
| cust id | Trans Dates | TransType | Vp_Trans |
+---------+-------------+-----------+----------+
| 1 | 2015-02-01 | VP | NA |
| 1 | 2015-02-03 | Vp | NA |
| 1 | 2015-02-06 | VP | NA |
| 1 | 2015-02-06 | Vp | NA |
| 1 | 2015-02-11 | Vp | 5 |
| 1 | 2015-02-13 | MG | 4 |
| 1 | 2015-02-17 | Vp | 4 |
| 1 | 2015-02-18 | mg | 3 |
| 1 | 2015-02-19 | mg | 2 |
| 1 | 2015-02-20 | mg | 1 |
| 1 | 2015-02-21 | vp | 2 |
| 2 | 2015-02-01 | VP | NA |
| 2 | 2015-02-03 | mg | NA |
| 2 | 2015-02-06 | mg | NA |
| 2 | 2015-02-06 | Vp | NA |
| 2 | 2015-02-11 | Vp | 3 |
| 2 | 2015-02-13 | MG | 2 |
| 2 | 2015-02-17 | Vp | 3 |
| 2 | 2015-02-18 | mg | 3 |
| 2 | 2015-02-19 | mg | 2 |
| 2 | 2015-02-20 | mg | 1 |
| 2 | 2015-02-21 | mg | 1 |
| 2 | 2015-02-22 | mg | 0 |
+---------+-------------+-----------+----------+
Hi @Vendy,
Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:
data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;
proc format;
value VpTfmt
.='NA';
run;
data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;
proc print data=want;
run;
Edit: Formatted date with YYMMDD10.
Hi.
It would be a good idea to put data in the form of a datastep so I don't have to type it all out.
Minor correction, using functions is simpler and more efficient:
data have;
input cust_id Trans_Dates $ TransType $;
datalines;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
;
run;
data want;
set have;
length tmp $50;
tmp=upcase(catx(',',lag1(transtype),lag2(transtype),lag3(transtype),lag4(transtype),lag5(transtype)));
if countc(tmp,",") < 4 then vp_trans="NA";
else vp_trans=strip(put(countw(tmp,"VP"),best.));
run;
The code below shows you have to use lag() function to get previous rows data, its then a simple matter of if conditions to get the sum (note I have handled the fewer than 5 results, but you should be able to work it out as lag5() is missing then set to NA.
data have; input cust_id Trans_Dates $ TransType $; datalines; 1 2015-02-01 VP 1 2015-02-03 Vp 1 2015-02-06 VP 1 2015-02-06 Vp 1 2015-02-11 Vp 1 2015-02-13 MG 1 2015-02-17 Vp 1 2015-02-18 mg ; run; data want; set have; vp_trans=0; if upcase(lag1(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag2(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag3(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag4(transtype))="VP" then vp_trans=sum(vp_trans,1); if upcase(lag5(transtype))="VP" then vp_trans=sum(vp_trans,1); run;
Well, bit more complicated, could condense this code down about with some thinking, but this should work:
data want;
set have;
length vp_trans $5;
array lst{5} $2.;
retain lst1-lst5 curr_id;
by cust_id;
if first.cust_id then do;
vp_trans="";
call missing(of lst{*});
curr_id=1;
end;
lst{curr_id}=upcase(transtype);
curr_id=curr_id+1;
if curr_id > 5 then curr_id=1;
if cmiss(of lst{*}) then vp_trans="NA";
else vp_trans=strip(put(countw(catx(",",of lst{*}),"VP"),best.));
run;
Hi @Vendy,
Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:
data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;
proc format;
value VpTfmt
.='NA';
run;
data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;
proc print data=want;
run;
Edit: Formatted date with YYMMDD10.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.