Help using Base SAS procedures

Identify third variable through two variable

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 98
Accepted Solution

Identify third variable through two variable

In the following file, I need to identify any "Paid" that happens after the first "Shipped", in terms of both order and time.  For example, for OrderID#1, the third record counts as it occurs after at least one shipment takes place. OrderID#2 does not have any record that counts because "Paid" cannot happen on the same day as "Shipped". For OrderID#4, its third record counts as "Paid" is supposed to occur after at least one "Shipped" and it needs be on the days after "Shipped". The last column "Completed" is my target.

Any suggestions?

OrderIDShippedPaiddateCompleted
1011/1/20140
1101/2/20140
1111/5/20141
2111/2/20140
2101/4/20140
3101/5/20140
3111/6/20141
3101/8/20140
4102/2/20140
4112/2/20140
4012/5/20141
4102/6/20140

Thanks a bunch!

Lizi


Accepted Solutions
Solution
‎11-25-2014 10:57 AM
Respected Advisor
Posts: 3,156

Re: Identify third variable through two variable

A Proc SQL alternative if your data is not too big: (raw data stoled from Ksharp):

data have;

     infile cards expandtabs truncover;

     input OrderID Shipped  Paid  date : mmddyy10.;

     format date mmddyy10.;

     cards;

1     0 1     1/1/2014

1     1 0     1/2/2014

1     1 1     1/5/2014

2     1 1     1/2/2014

2     1 0     1/4/2014

3     1 0     1/5/2014

3     1 1     1/6/2014

3     1 0     1/8/2014

4     1 0     2/2/2014

4     1 1     2/2/2014

4     0 1     2/5/2014

4     1 0     2/6/2014

;

run;

proc sql;

     create table want as

           select *, exists(select * from have where shipped =1 and orderid=a.orderid

                and a.paid=1 having  min(date) < a.date) as completed

           from have a

                order by a.orderid, a.date, a.paid

     ;

quit;

View solution in original post


All Replies
Super User
Posts: 10,018

Re: Identify third variable through two variable

Why 4 shouldn't be

OrderIDShippedPaiddateCompleted
1011/1/20140
1101/2/20140
1111/5/20141
2111/2/20140
2101/4/20140
3101/5/20140
3111/6/20141
3101/8/20140
4102/2/20140
4112/2/20140
4012/5/20141
4102/6/20140


data have;
infile cards expandtabs truncover;
input OrderID Shipped  Paid  date : mmddyy10.;
format date mmddyy10.;
cards;     
1     0     1     1/1/2014
1     1     0     1/2/2014
1     1     1     1/5/2014
2     1     1     1/2/2014
2     1     0     1/4/2014
3     1     0     1/5/2014
3     1     1     1/6/2014
3     1     0     1/8/2014
4     1     0     2/2/2014
4     1     1     2/2/2014
4     0     1     2/5/2014
4     1     0     2/6/2014
;
run;
data want;
 set have;
 by  OrderID;
 retain found flag .;
 if first.OrderID then do;found=.; flag=.;end;
 if found and Paid and not flag then do;Completed=1; flag=1; end;
  else Completed=0;
 if Shipped then found=1;
 drop found flag;
run;

Xia Keshan

Frequent Contributor
Posts: 98

Re: Identify third variable through two variable

Thank you so much, Keshan! But I realized that the time issue is not taken care of in your program. I have to consider both order and time.

Solution
‎11-25-2014 10:57 AM
Respected Advisor
Posts: 3,156

Re: Identify third variable through two variable

A Proc SQL alternative if your data is not too big: (raw data stoled from Ksharp):

data have;

     infile cards expandtabs truncover;

     input OrderID Shipped  Paid  date : mmddyy10.;

     format date mmddyy10.;

     cards;

1     0 1     1/1/2014

1     1 0     1/2/2014

1     1 1     1/5/2014

2     1 1     1/2/2014

2     1 0     1/4/2014

3     1 0     1/5/2014

3     1 1     1/6/2014

3     1 0     1/8/2014

4     1 0     2/2/2014

4     1 1     2/2/2014

4     0 1     2/5/2014

4     1 0     2/6/2014

;

run;

proc sql;

     create table want as

           select *, exists(select * from have where shipped =1 and orderid=a.orderid

                and a.paid=1 having  min(date) < a.date) as completed

           from have a

                order by a.orderid, a.date, a.paid

     ;

quit;

Frequent Contributor
Posts: 98

Re: Identify third variable through two variable

Thanks a lot, Kuo.

Your way is pretty intuitive to me as I am still more familiar with small datahase management than the SAS retain function.

Super User
Posts: 10,018

Re: Identify third variable through two variable

I just realize Shipped and Paid should not be the same day.

data have;
infile cards expandtabs truncover;
input OrderID Shipped  Paid  date : mmddyy10.;
format date mmddyy10.;
cards;     
1     0     1     1/1/2014
1     1     0     1/2/2014
1     1     1     1/5/2014
2     1     1     1/2/2014
2     1     0     1/4/2014
3     1     0     1/5/2014
3     1     1     1/6/2014
3     1     0     1/8/2014
4     1     0     2/2/2014
4     1     1     2/2/2014
4     0     1     2/5/2014
4     1     0     2/6/2014
;
run;
data want;
 set have;
 by  OrderID;
 retain found flag _date .;
 if first.OrderID then do;found=.; flag=.;_date=.;end;
 if found and Paid and not flag and date ne _date then do;Completed=1; flag=1; end;
  else Completed=0;
 if Shipped then do;found=1;_date=date;end;
 drop found flag _date;
run;

Xia Keshan

Frequent Contributor
Posts: 98

Re: Identify third variable through two variable

Keshan, it is more the matter of combination of timing and order than timing only or order only. Say if I have OrderID#5 that has the following shipping and payment timing, the program above would not be able to catch this order (Completed=0)

OrderIDShippedPaiddateCompleted
5106/3/20140
5106/5/20140
5116/5/20141
5106/5/20140

However, as you can see, Order #5 does have Completed=1.

In sum, I am looking for the record per order that indicates "Paid" occurs after (both in terms of order and time) at least one "Shipped" takes place.

Thanks again!

Super User
Posts: 10,018

Re: Identify third variable through two variable

It is more complicated than I image .This time could work.

data have;

infile cards expandtabs truncover;

input OrderID Shipped  Paid  date : mmddyy10.;

format date mmddyy10.;

cards;    

1     0     1     1/1/2014

1     1     0     1/2/2014

1     1     1     1/5/2014

2     1     1     1/2/2014

2     1     0     1/4/2014

3     1     0     1/5/2014

3     1     1     1/6/2014

3     1     0     1/8/2014

4     1     0     2/2/2014

4     1     1     2/2/2014

4     0     1     2/5/2014

4     1     0     2/6/2014

5 1 0 6/3/2014

5 1 0 6/5/2014

5 1 1 6/5/2014

5 1 0 6/5/2014

;

run;

data want;

set have;

by  OrderID;

retain found flag _date .;

if first.OrderID then do;found=.; flag=.;_date=.;end;

if found and Paid and not flag and date ne _date then do;Completed=1; flag=1; end;

  else Completed=0;

if not found and Shipped then do;found=1;_date=date;end;

drop found flag _date;

run;

Xia Keshan

Frequent Contributor
Posts: 98

Re: Identify third variable through two variable

I think you got it right. Thank you for spending so much time, Keshan.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 335 views
  • 6 likes
  • 3 in conversation