Solved: Re: Merge datasets

leonzheng · Posted 01-03-2018 04:57 PM

I have a problem merging two data sets:

data A

year row col
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2

data B

row col x y
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2

I am trying to combine the two data sets like this:

year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2
2018 1 1 a1 a2
2018 1 2 b1 b2
2018 2 1 c1 c2
2018 2 2 d1 d2

If I sort A by YEAR ROW and then merge A and B by ROW, I only get merged data for 2017, and the log shows ERROR: BY variables are not properly sorted on dataset A.

If I sort A by ROW YEAR and then merge A and B by ROW, there is no error, but the x and y values for 2018 are wrong.

Please help, thanks!


Reeza · Posted 01-03-2018 04:58 PM

Please show your code and log.

leonzheng · Posted 01-03-2018 05:10 PM

A and B are as I described above.

If I try:

proc sort data=A;
by year row;
run;

data C;
merge A B;
by row;
run;

C will be:

year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2

and the log shows:

ERROR: BY variables are not properly sorted on dataset A
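Why the first attempt fails: BY row needs A in ascending order of row across the whole data set, but after sorting by year row the row values run 1, 1, 2, 2 and then drop back to 1 when 2018 starts. A minimal sketch, assuming A is built from the sample data in the question, that flags where row decreases (the variable prev_row is made up for illustration):

```sas
/* Sample data A as posted, already in YEAR ROW order */
data A;
  input year row col;
  datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

/* Flag observations where ROW is lower than on the previous observation,
   which is where BY ROW processing stops with the "not properly sorted" error */
data _null_;
  set A;
  prev_row = lag(row);   /* LAG is called on every row so it stays in step */
  if _n_ > 1 and row < prev_row then
    put "ROW decreases at " _n_= row= prev_row=;
run;
```

With this data the PUT fires once, at observation 5 (the first 2018 row), which matches the merge stopping after the 2017 rows.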

If I try:

proc sort data=A;
by row year;
run;

data C;
merge A B;
by row;
run;

C has both 2017 and 2018, but the x and y values for 2018 are not right. The log shows these NOTEs:

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There are 2N observations read from the data set A.
NOTE: There are N observations read from data set B.
NOTE: The data set C has 2N observations.

The number of observations in C is right, but the values for 2018 are wrong.
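The NOTE about repeats of BY values is the clue. With BY row alone, each row value appears four times in A and twice in B, and MERGE does not match on col: within a BY group it pairs observations one after another, and once B's observations for that row are used up, the x and y from B's last observation in the group are retained and carried onto the remaining (2018) observations of A. A minimal sketch, assuming B is built from the sample data in the question, that lists the repeated BY values in B:

```sas
/* Sample data B as posted */
data B;
  input row col x $ y $;
  datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;

/* Report every observation whose BY value is not unique.
   B is already in ROW order, so BY ROW works without a sort. */
data _null_;
  set B;
  by row;
  if not (first.row and last.row) then
    put "Repeated BY value in B: " row= col=;
run;
```

Running the same check with by row col; and first.col/last.col reports nothing for B, which is why ROW and COL together form a usable merge key.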

kiranv_ · Posted 01-03-2018 05:28 PM

Something like this:

proc sql;
create table c as
select a.year, a.row, a.col, b.x, b.y
from A
left join B
on a.row = b.row
and a.col = b.col
order by a.year, a.row, a.col;
quit;

leonzheng · Posted 01-03-2018 05:37 PM

It doesn't seem to work, and there are no x and y variables in data set A.

kiranv_ · Posted 01-03-2018 05:39 PM

This worked for me:

data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

data B;
input row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;

proc sql;
create table c as
select a.year, a.row, a.col, b.x, b.y
from A
left join B
on a.row = b.row
and a.col = b.col
order by a.year, a.row, a.col;
quit;

novinosrin · Posted 01-03-2018 05:51 PM

Switch to a hash object; it's easy:

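data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

data B;
input row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;

data want;
/* On the first iteration only, load B into a hash keyed on ROW and COL */
if _N_ = 1 then do;
  /* Never executed; only defines the variables of A and B in the PDV */
  if 0 then set a;
  if 0 then set b;
  declare hash h(dataset: "b");
  h.defineKey('row','col');
  h.defineData(all:'yes');
  h.defineDone();
end;
set a;
by year row;
/* Look up X and Y for the current ROW COL; clear them if there is no match */
if h.find() ne 0 then call missing(x,y);
run;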

Astounding · Posted 01-03-2018 06:09 PM

You're just merging by the wrong variables. SORT (both data sets) and MERGE:

by row col;

*** EDITED: to correct the BY variables.

leonzheng · Posted 01-03-2018 06:23 PM

Do you mean: sort A by YEAR and ROW, sort B by ROW and COL, and merge A and B by YEAR and ROW?
Or, by "row1 row2", do you mean ROW and COL for all the sorting and merging?

Reeza · Posted 01-03-2018 06:33 PM

Your BY statement in all the SORTs and in the MERGE should be:

by row col;
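Spelled out, the accepted fix looks like the sketch below. It assumes A and B are built from the sample data in the question; the final PROC SORT is an addition that only restores the YEAR ROW COL order of the desired output.

```sas
/* Sample data as posted in the question */
data A;
  input year row col;
  datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

data B;
  input row col x $ y $;
  datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;

/* Both inputs must be sorted by the full merge key */
proc sort data=A; by row col; run;
proc sort data=B; by row col; run;

/* One-to-many merge: each ROW COL pair in B matches both years in A,
   and B's X and Y are retained for the second year in each BY group */
data C;
  merge A B;
  by row col;
run;

/* Optional: put C back into the order shown in the desired output */
proc sort data=C; by year row col; run;
```

Because only A has repeated BY values here, the "more than one data set with repeats of BY values" NOTE no longer appears.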

leonzheng · Posted 01-03-2018 06:59 PM

You are right, thank you!
