DATA Step, Macro, Functions and more

Merge datasets

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

Merge datasets

I have a problem of merge two datasets, 

data A

year        row        col

2017         1            1

2017         1            2

2017         2            1

2017         2            2

2018         1            1

2018         1            2

2018         2            1

2018         2            2

 

data B

row       col       x        y

1           1         a1      a2

1           2         b1      b2

2           1         c1      c2

2           2         d1      d2

 

Trying to combine the two datasets as below:

 

year        row        col          x           y

2017         1            1         a1         a2

2017         1            2         b1         b2

2017         2            1         c1         c2

2017         2            2         d1         d2

2018         1            1         a1         a2

2018         1            2         b1         b2

2018         2            1         c1         c2

2018         2            2         d1         d2

 

 

If I sort A by YEAR ROW, then merge A and B by ROW,

I will only get merged data with 2017, log shows ERROR: BY variables are not properly sorted on dataset A.

 

If I sort A by ROW YEAR, then merge A and B by ROW,

there will be no error, but for the year 2018, x and y values are not right.

 

Please help, thanks!

 


Accepted Solutions
Solution
‎01-03-2018 06:59 PM
Super User
Posts: 23,368

Re: Merge datasets

Posted in reply to leonzheng

Your BY in all the MERGE and SORTS should be:

 

By row col;

 

 

View solution in original post


All Replies
Super User
Posts: 23,368

Re: Merge datasets

Posted in reply to leonzheng

Please show your code and log. 

 

 

Contributor
Posts: 30

Re: Merge datasets

A and B are as I described above.

If I try:

proc sort data=A;

by year row;

run;

 

data C;

merge A B;

by row;

 

C will be 

year        row        col          x           y

2017         1            1         a1         a2

2017         1            2         b1         b2

2017         2            1         c1         c2

2017         2            2         d1         d2

ERROR: BY variables are not properly sorted on dataset A

 

If I try:

proc sort data=A;

by row year;

run;

 

data C;

merge A B;

by row;

 

C will have both 2017 and 2018 but value of x and y in 2018 will not be right, NOTE as below:

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There are 2N observations read from the data set A.

NOTE: There are N observations read from data set B.

NOTE: The data set C has 2N observations.

 

Observation number for C is right, but all values for 2018 is not right.

 

PROC Star
Posts: 504

Re: Merge datasets

Posted in reply to leonzheng

something like this 

proc sql;
create table c as 
select  a.row, a.col, x, y
from A 
left join 
B
on a.row =b.row
and a.col=b.col
group by year ,       a.row  ,      a.col;
Contributor
Posts: 30

Re: Merge datasets

seems not working, and there is no x and y in data set A
PROC Star
Posts: 504

Re: Merge datasets

Posted in reply to leonzheng

this worked for me.

 

data A;

input year        row        col;
datalines;
2017         1            1
2017         1            2
2017         2            1
2017         2            2
2018         1            1
2018         1            2
2018         2            1
2018         2            2
;
run;
 

data B;
input
row       col       x   $      y $;
datalines;
1           1         a1      a2
1           2         b1      b2
2           1         c1      c2
2           2         d1      d2
;
run;


proc sql;
create table c as 
select  a.row, a.col, x, y
from A 
left join 
B
on a.row =b.row
and a.col=b.col
group by year ,       a.row  ,      a.col;
PROC Star
Posts: 1,605

Re: Merge datasets

[ Edited ]
Posted in reply to leonzheng

switch to hash, easy:

 

data A;

input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;


data want;
if _N_ = 1 then do;
if 0 then set a;
if 0 then set b;
declare hash h(dataset: "b");
h.defineKey('row','col');
h.defineData(all:'yes');
h.defineDone();
end;
set a;
by year row ;
if h.find() ne 0 then call missing(x,y);
run;

Super User
Posts: 6,648

Re: Merge datasets

[ Edited ]
Posted in reply to leonzheng

You're just merging by the wrong variables.  SORT (both data sets) and MERGE: 

 

by row col;

 

*** EDITED:  to correct the BY variables.

Contributor
Posts: 30

Re: Merge datasets

Posted in reply to Astounding
Do you mean sort A by YEAR and ROW,
sort B by ROW and COL
merge A and B by YEAR and ROW?
or by row1 row2 you mean ROW and COL for all sorting and merging
Solution
‎01-03-2018 06:59 PM
Super User
Posts: 23,368

Re: Merge datasets

Posted in reply to leonzheng

Your BY in all the MERGE and SORTS should be:

 

By row col;

 

 

Contributor
Posts: 30

Re: Merge datasets

you are right, thank you!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 347 views
  • 0 likes
  • 5 in conversation