How to merge more than one data set and populate the values missing in a variable from another set

Dear,

I am merging three data sets to fill in the missing values of the 'age' variable in the first set with the values from the second and third sets. With my code, the values are not populated. I wrote the merge in SQL to keep the code short. Please suggest what I should change in my code. Thank you very much.

data one;
input id age;
datalines;
1 50
2 40
3 .
4 30
5 .
6 70
;
data two;
input id age;
datalines;
1 50
2 40
3 80
;
data three;
input id age;
datalines;
4 30
5 60
6 70
;
proc sql;
create table four as
select *
from ( select * from one as a left join two as b
on a.id=b.id) as a left join three as c
on a.id=c.id
order by id;
quit;

Re: How to merge more than one data set and populate the values missing in a variable from another s

And what do you think the resulting warnings in the log mean?

WARNING: Column named id is duplicated in a select expression (or a view). Explicit references to
         it will be to the first one.
WARNING: Column named id is duplicated in a select expression (or a view). Explicit references to
         it will be to the first one.
WARNING: Variable id already exists on file USER.FOUR.
WARNING: Variable age already exists on file USER.FOUR.
WARNING: Variable id already exists on file USER.FOUR.
WARNING: Variable age already exists on file USER.FOUR.

Because the main (first) select is just SELECT *, you told the procedure to keep only the first variable of each duplicated name, so the AGE values from TWO and THREE are dropped.

 

You might want to start again with something like:

proc sql;
create table four as
select *
from ( select *
       from one as a
            left join (select id, age as ageb from two) as b
            on a.id=b.id ) as a
     left join (select id, age as agec from three) as c
     on a.id=c.id
order by id;
quit;
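
Once the AGE columns no longer collide, the missing values still have to be filled in. A minimal sketch of one way to do that, assuming the value from ONE should win when present, then TWO, then THREE (that priority order is an assumption, not something stated in the question):

```sas
proc sql;
    create table four as
    select a.id,
           /* COALESCE returns the first non-missing argument,      */
           /* so ONE is preferred, then TWO, then THREE (assumed).  */
           coalesce(a.age, b.age, c.age) as age
    from one as a
         left join two as b on a.id = b.id
         left join three as c on a.id = c.id
    order by a.id;
quit;
```

Selecting the columns explicitly instead of using SELECT * should also avoid the duplicate-column warnings. With the sample data, ID 3 would pick up 80 from TWO and ID 5 would pick up 60 from THREE.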

Or use two DATA step UPDATE statements, provided the main data set that needs the changes does not have repeats of the ID variable(s).
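
A rough sketch of the UPDATE approach, assuming all three data sets are already sorted by ID (as the sample data is) and that ID is unique in ONE. UPDATE takes exactly one master and one transaction data set, hence the two steps:

```sas
/* Step 1: apply TWO as transactions against ONE. */
data four;
    update one two;
    by id;
run;

/* Step 2: apply THREE as transactions against the result. */
data four;
    update four three;
    by id;
run;
```

Two caveats: any non-missing AGE in TWO or THREE replaces the value in the master, even when the master value is not missing, and any ID that appears only in a transaction data set is added to the output. Missing values in the transaction data set do not overwrite the master by default.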

 


Re: How to merge more than one data set and populate the values missing in a variable from another s

You haven't told us enough.

 

What should happen if two of your data sets have different values for AGE for the same ID?  Which AGE do you want?
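
To see whether this matters for your data, a query along these lines could list the IDs whose non-missing AGE values disagree. It is only a sketch; the column aliases age_one, age_two and age_three are made up for illustration:

```sas
proc sql;
    select a.id,
           a.age as age_one,
           b.age as age_two,
           c.age as age_three
    from one as a
         left join two as b on a.id = b.id
         left join three as c on a.id = c.id
    /* Keep a row only when two non-missing ages differ. */
    where (a.age is not missing and b.age is not missing and a.age ne b.age)
       or (a.age is not missing and c.age is not missing and a.age ne c.age)
       or (b.age is not missing and c.age is not missing and b.age ne c.age);
quit;
```

With the sample data posted above, this should select no rows, since the overlapping values agree.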

 

In real life, do any of the data sets contain additional variables?
