BookmarkSubscribeRSS Feed
Pandu2
Obsidian | Level 7
Hi All,
I hope everyone is doing good. I require some assistance from you all regrading joining two tables based on some condition.

I've two tables x and y in x it has 3 columns which are present in table y as well and an extra column is there for y table named amount . These two tables should be joined based on those 3 fields named a,b,c and it should have a condition like

If the values of the 3 fields in x doesn't exist in the columns of y table then for those values the amount should be equal to 0 instead of excluding them those
who doesn't exist
Example table:
X - table :. Y - table:
a b c a. b. c. amount
1 5 15. 1. 5. 15. 4
2 6 26. 2. 6. 26. 9
3 7 37. 3. 7. 37. 8
4 8 48

Joined table (required table):
a. b. c. amount
1. 5. 15. 4
2. 6. 26. 9
3. 7. 37. 8
4. 8. 48. 0

Ps: I apologise there are no dots in the data, please don't consider them.

Thankyou.
19 REPLIES 19
PaigeMiller
Diamond | Level 26

Please provide the data as SAS data step code. You can type in the SAS data step code yourself, or via these instructions: https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/

--
Paige Miller
Pandu2
Obsidian | Level 7
Data x;
Input a b c $4;
Datalines;
1 5 15
2 6 26
3 7 37
4. 8 48
;
Data y;
Input a b c $4, amount 10.;
Datalines;
1. 5 15 4
2. 6. 26 5
3. 7. 36. 9
4. 8. 48
;
Data joined;
Input a b c $4, amount 10.;
Datalines;
1. 5 15. 4
2. 6. 26. 5
3. 7. 37. 9
4. 8. 48. 0
;

andreas_lds
Jade | Level 19

Please test code before posting.

Pandu2
Obsidian | Level 7
That's my datasets x and y and I require joined dataset
Kurt_Bremser
Super User

Log excerpt:

 76         ;
 77         Data y;
 78         Input a b c $4, amount 10.;
                          _
                          22
                          200
 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, eine numerische Konstante, arrayname, #, (, +, -, /, //, 
               ;, @, @@.  
 
 ERROR 200-322: The symbol is not recognized and will be ignored.
 
 79         Datalines;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.Y may be incomplete.  When this step was stopped there were 0 observations and 4 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              632.06k
       OS Memory           23976.00k
       Timestamp           25.04.2022 11:29:39 vorm.
       Step Count                        25  Switch Count  2
       Page Faults                       0
       Page Reclaims                     114
       Page Swaps                        0
       Voluntary Context Switches        11
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 84         ;
 
 
 85         Data joined;
 86         Input a b c $4, amount 10.;
                          _
                          22
                          200
 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, eine numerische Konstante, arrayname, #, (, +, -, /, //, 
               ;, @, @@.  
 
 ERROR 200-322: The symbol is not recognized and will be ignored.
 
 87         Datalines;
 
 NOTE: The SAS System stopped processing this step because of errors.

As a very important basic exercise, fix these issues before proceeding.

Tom
Super User Tom
Super User

Just to a merge.

data WANT;
  merge x y;
  by a b c ;
run;

But the values need to actually match.  So the third observations where X has C='37' and Y has C='36.' will not match.  So your output will have 5 observations instead of 4. 

 

Did you really intend that C be defined as a character variable but keep A and B as numeric?

Do  you really want to replace missing values of amount with zero?  If so add some code to the data step. For example like this:

amount = sum(amount,0);

 

Pandu2
Obsidian | Level 7
Actually what's my requirement is there are two tables both x and y. In x it has 3 columns named a,b,c both are numeric I'm extremely sorry it's my bad and y table has 4 columns a,b,c and amount. Both in x,y tables a,b,c column values are same . When joining these tables based on a,b,c columns if a value doesn't exist in x or in y then for that particular value the amount should be considered as 0.
Tom
Super User Tom
Super User

@Pandu2 wrote:
Actually what's my requirement is there are two tables both x and y. In x it has 3 columns named a,b,c both are numeric I'm extremely sorry it's my bad and y table has 4 columns a,b,c and amount. Both in x,y tables a,b,c column values are same . When joining these tables based on a,b,c columns if a doesn't exist in x or in y then for that particular value the amount should be considered as 0.
data WANT;
  merge x y;
  by a b c ;
  amount=sum(amount,0);
run;
Pandu2
Obsidian | Level 7
Thanks alot. Is that possible by using proc SQL?.
Kurt_Bremser
Super User

@Pandu2 wrote:
Thanks alot. Is that possible by using proc SQL?.

Yes. Try it.

Spoiler
Use a FULL JOIN and the COALESCE function for all variables, e.g.
coalesce(x.a,y.a) as a
Coalesce y.amount with 0.
Tom
Super User Tom
Super User

@Pandu2 wrote:
Thanks alot. Is that possible by using proc SQL?.

Yes.  It is just a lot harder to type and understand than basic SAS code.

proc sql;
create table want as 
select coalesce(x.a,y.a) as a
     , coalesce(x.b,y.b) as b
     , coalesce(x.c,y.c) as c
     , coalesce(y.amount,0) as amount
from x full join y
on x.a=y.a and x.b=y.b and x.c=y.c
order by 1,2,3
;

Plus if there are replications of the combinations of the A,B,C key variables then MERGE and FULL JOIN might return different results since how the handle many to many joins is different.

 

If you only want to include the observations that appear in X then it is a little easer to code use a LEFT JOIN.

proc sql;
create table want as 
select x.*
     , coalesce(y.amount,0) as amount
from x left join y
on x.a=y.a and x.b=y.b and x.c=y.c
order by x.a,x.b,x.c
;
Pandu2
Obsidian | Level 7
Thankyou. it is working
Pandu2
Obsidian | Level 7
This didn't work it gave me all blank values in amount column.
Kurt_Bremser
Super User

@Pandu2 wrote:
This didn't work it gave me all blank values in amount column.

Can't be with the data you posted (once I fixed a couple of ERRORs):

data x;
input (a b c) (:$4.);
datalines;
1 5 15
2 6 26
3 7 37
4 8 48
;

data y;
infile datalines truncover;
input (a b c) (:$4.) amount :10.;
datalines;
1 5 15 4
2 6 26 5
3 7 36 9
4 8 48
;

proc sql;
create table want as
  select
    coalesce(x.a,y.a) as a,
    coalesce(x.b,y.b) as b,
    coalesce(x.c,y.c) as c,
    coalesce(y.amount,0) as amount
  from x full join y
  on x.a = y.a and x.b = y.b and x.c = y.c
;
quit;

Result:

a	b	c	amount
1	5	15	4
2	6	26	5
3	7	36	9
3	7	37	0
4	8	48	0

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 1148 views
  • 0 likes
  • 6 in conversation