DATA Step, Macro, Functions and more

Join Table on Two Variables

Reply
Frequent Contributor
Posts: 77

Join Table on Two Variables

Suppose a customer can have multiple customer accounts and purchase_summary table summarizes the purchase totals at customer account level. Not every customer account would have purchase transactions. I use the following query to create a table which includes only customers which had purchase transactions. This query needs to use purchase_summary table twice. I wonder if it is possible to further simplify this query to use purchase_summary once only.

proc sql;

create table a1 as

select a.*, b.purchase_total

from customer a left join purchase_summary b

on a.customer_id = b.customer_id

and a.customer_account = b.customer_account

where a.customer_id in (select customer_id from purchase_summary)

; quit;

PROC Star
Posts: 7,363

Join Table on Two Variables

If I correctly understand what you want, I don't think your code is doing it.  I tried it with the following test files.

Conversely, I think a simple full join does do what you want.  Take a look at the following:

/* Some test data */

data customer;

  input customer_id customer_account ina;

  cards;

1 1 1

1 2 1

2 1 1

3 1 1

4 1 1

;

data purchase_summary;

  input customer_id customer_account purchase_total;

  cards;

1 1 10

3 1 20

4 1 30

5 1 40

;

/*what you did */

proc sql;

create table a1 as

select a.*, b.purchase_total

from customer a left join purchase_summary b

on a.customer_id = b.customer_id

and a.customer_account = b.customer_account

where a.customer_id in (select customer_id from purchase_summary)

; quit;

/*what I think you want to do*/

proc sql;

  create table a2 as

    select a.*, b.purchase_total

      from customer a, purchase_summary b

        having a.customer_id = b.customer_id

        and a.customer_account = b.customer_account

;

quit;

Trusted Advisor
Posts: 1,300

Join Table on Two Variables

Pretty sure I am not following the concept of your data but if you provided a better outline of the intended input and output I and others could probably be more helpful.

data purchase_summary;

input customer_account purchase_summary;

cards;

123 100.00

124 100.50

125  99.99

126   1.00

127 999.00

;

run;

data customer;

input customer_id customer_account;

cards;

1 123

1 124

2 125

2 126

3 127

;

run;

proc sql;

create table total as

select a.customer_id, sum(purchase_summary) as purchase_total

   from customer a, purchase_summary b

  where a.customer_account=b.customer_account

  group by customer_id;

quit;

Super User
Super User
Posts: 6,500

Join Table on Two Variables

If you only want the ones that are in both datasets then do not use a LEFT JOIN.

proc sql;

  create table a1 as

   select a.*, b.purchase_total

   from customer a 

        , purchase_summary b

   where a.customer_id = b.customer_id

      and a.customer_account = b.customer_account

  ;

quit;

Or translated into normal SAS programming:

data a1;

   merge customer (in=in1) purchase_total(in=in2 keep=customer_id customer_account purchase_total) ;

   by customer_id customer_account ;

   if in1 and in2;

run;

Ask a Question
Discussion stats
  • 3 replies
  • 145 views
  • 0 likes
  • 4 in conversation