BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Vendy
Obsidian | Level 7


Hi,
I would like to write sas code (proc-sql). The question is how to create variable VP_Trans (in the table below). The table below is referring to the customers transactions within some period.
I would like to calculate how many VP transactions the customer had based on his last 5 transactions. 5th transaction included
e.g. Customer 1 had 5 VP transactions calculating from transactions happened between 1.2.-11.2

 

    +---------+-------------+-----------+----------+
    | cust id | Trans Dates | TransType | Vp_Trans |
    +---------+-------------+-----------+----------+
    |       1 | 2015-02-01  | VP        | NA       |
    |       1 | 2015-02-03  | Vp        | NA       |
    |       1 | 2015-02-06  | VP        | NA       |
    |       1 | 2015-02-06  | Vp        | NA       |
    |       1 | 2015-02-11  | Vp        | 5        |
    |       1 | 2015-02-13  | MG        | 4        |
    |       1 | 2015-02-17  | Vp        | 4        |
    |       1 | 2015-02-18  | mg        | 3        |
    |       1 | 2015-02-19  | mg        | 2        |
    |       1 | 2015-02-20  | mg        | 1        |
    |       1 | 2015-02-21  | vp        | 2        |
    |       2 | 2015-02-01  | VP        | NA       |
    |       2 | 2015-02-03  | mg        | NA       |
    |       2 | 2015-02-06  | mg        | NA       |
    |       2 | 2015-02-06  | Vp        | NA       |
    |       2 | 2015-02-11  | Vp        | 3        |
    |       2 | 2015-02-13  | MG        | 2        |
    |       2 | 2015-02-17  | Vp        | 3        |
    |       2 | 2015-02-18  | mg        | 3        |
    |       2 | 2015-02-19  | mg        | 2        |
    |       2 | 2015-02-20  | mg        | 1        |
    |       2 | 2015-02-21  | mg        | 1        |
    |       2 | 2015-02-22  | mg        | 0        |
    +---------+-------------+-----------+----------+
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Vendy,

 

Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:

data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;

proc format;
value VpTfmt
.='NA';
run;

data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;

proc print data=want;
run;

Edit: Formatted date with YYMMDD10.

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi.

 

It would be a good idea to put data in the form of a datastep so I don't have to type it all out.  

Minor correction, using functions is simpler and more efficient:

 

data have;
  input cust_id Trans_Dates $ TransType $;
datalines;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
;
run;

data want;
  set have;
  length tmp $50;
  tmp=upcase(catx(',',lag1(transtype),lag2(transtype),lag3(transtype),lag4(transtype),lag5(transtype)));
  if countc(tmp,",") < 4 then vp_trans="NA";
  else vp_trans=strip(put(countw(tmp,"VP"),best.));
run;

 

 

The code below shows you have to use lag() function to get previous rows data, its then a simple matter of if conditions to get the sum (note I have handled the fewer than 5 results, but you should be able to work it out as lag5() is missing then set to NA.

data have;
  input cust_id Trans_Dates $ TransType $;
datalines;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
;
run;

data want;
  set have;
  vp_trans=0;
  if upcase(lag1(transtype))="VP" then vp_trans=sum(vp_trans,1);
  if upcase(lag2(transtype))="VP" then vp_trans=sum(vp_trans,1);
  if upcase(lag3(transtype))="VP" then vp_trans=sum(vp_trans,1);
  if upcase(lag4(transtype))="VP" then vp_trans=sum(vp_trans,1);
  if upcase(lag5(transtype))="VP" then vp_trans=sum(vp_trans,1);
run;

 

Vendy
Obsidian | Level 7
Thanks. It looks good. But we need to calculate for each customer separately...
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, bit more complicated, could condense this code down about with some thinking, but this should work:

data want;
  set have;
  length vp_trans $5;
  array lst{5} $2.;
  retain lst1-lst5 curr_id;
  by cust_id;
  if first.cust_id then do;
    vp_trans="";
    call missing(of lst{*});
    curr_id=1;
  end;
  lst{curr_id}=upcase(transtype);
  curr_id=curr_id+1;
  if curr_id > 5 then curr_id=1;
  if cmiss(of lst{*}) then vp_trans="NA";
  else vp_trans=strip(put(countw(catx(",",of lst{*}),"VP"),best.));
run;

 

Vendy
Obsidian | Level 7
Thank you very much.
FreelanceReinh
Jade | Level 19

Hi @Vendy,

 

Here's another data step approach (which I think is preferable over PROC SQL for this task), which is suitable if your data contain more than one customer:

data have;
input id date :yymmdd. type :$2.;
format date yymmdd10.;
cards;
1 2015-02-01 VP
1 2015-02-03 Vp
1 2015-02-06 VP
1 2015-02-06 Vp
1 2015-02-11 Vp
1 2015-02-13 MG
1 2015-02-17 Vp
1 2015-02-18 mg
1 2015-02-19 mg
1 2015-02-20 mg
1 2015-02-21 vp
2 2015-02-01 VP
2 2015-02-03 mg
2 2015-02-06 mg
2 2015-02-06 Vp
2 2015-02-11 Vp
2 2015-02-13 MG
2 2015-02-17 Vp
2 2015-02-18 mg
2 2015-02-19 mg
2 2015-02-20 mg
2 2015-02-21 mg
2 2015-02-22 mg
;

proc format;
value VpTfmt
.='NA';
run;

data want;
array tt[4] $2 _temporary_;
set have;
by id date;
if first.id then c=1;
else c+1;
tt[1]=lag1(upcase(type));
tt[2]=lag2(upcase(type));
tt[3]=lag3(upcase(type));
tt[4]=lag4(upcase(type));
if c>=5 then Vp_Trans=sum(tt[1]='VP',tt[2]='VP',tt[3]='VP',tt[4]='VP',upcase(type)='VP');
drop c;
format Vp_Trans VpTfmt.;
run;

proc print data=want;
run;

Edit: Formatted date with YYMMDD10.

Vendy
Obsidian | Level 7
Thank you very much. It works perfectly.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1638 views
  • 1 like
  • 3 in conversation