Combing tables - Page 2

Pandu2 · Posted 04-25-2022 10:21 AM

Thankyou it worked out really well.

Tom · Posted 04-25-2022 09:13 AM

You will not get missing AMOUNT if you use either SUM(AMOUNT,0) or COALESE(AMOUNT,0). Cannot happen.

If you used select * in PROC SQL then the first variable named AMOUNT that it sees will be the one saved into the dataset. So if you used

select x.*,coalesce(y.amount,0) as amount

And there was already a variable named AMOUNT in the X dataset then that is the one you see and the result of the COALESCE() function call will not make it into the dataset.

Show the code you used.

Make a simplified example that demonstrates the issue. Share it with us using simple data steps to create the two datasets.

Pandu2 · Posted 04-25-2022 09:12 AM

Infact the amount column should have 0 value only if the values of x table doesn't exist in y table. Otherwise the amount column values shouldn't be changed. Thanks.

Tom · Posted 04-25-2022 10:30 AM

@Pandu2 wrote:
Infact the amount column should have 0 value only if the values of x table doesn't exist in y table. Otherwise the amount column values shouldn't be changed. Thanks.

So another reason to not wasting your time trying to shoehorn this into SQL code.

data want;
  merge x y(in=iny);
  by a b c ;
  if not iny then amount=0;
run;

Sajid01 · Posted 04-25-2022 10:52 AM

Hello @Pandu2

You have the two following requirements.

These two tables should be joined based on those 3 fields named a,b,c and it should have a condition like
If the values of the 3 fields in x doesn't exist in the columns of y table then for those values the amount should be equal to 0 instead of excluding them those
who doesn't exist.

While there could be many approaches to solve this problem,
one solution for this is to use a full outer join.
Proc SQL or data step merge cam be used.
As you are using three columns to base your join upon,
one approach is to create a composite key. I have used a simple approach of concatenating the three column values.
More sophisticated approaches can be used.
Your second table i.e., y has missing values for amount, it is better to have them as zero.
(you may need them in arithmetic operations. Using missing values in an expression generates missing values. This can be avoided by having them as zeros rather than keep them missing).
I suggest the following code. Modify to suit your needs.

data x;
input (a b c) (:$4.);
datalines;
1 5 15
2 6 26
3 7 37
4 8 48
;

data y;
infile datalines truncover;
input (a b c) (:$4.) amount :10.;
datalines;
1 5 15 4
2 6 26 5
3 7 36 9
4 8 48
;
data left;
retain key a b c;
set x;
key=catx('-',a,b,c);
run;

data right;
retain key a b c;
set y;
key=catx('-',a,b,c);
run;
proc sort data=left;
by key;
run;
proc sort data=right;
by key;
run;
data want (drop=key);
merge left right;
by key;
amount=coalesce(amount,0);
run;

The output will be like this

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

Re: Combing tables

SAS Innovate 2025: Call for Content