I have a problem of merge two datasets,
data A
year row col
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
data B
row col x y
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
Trying to combine the two datasets as below:
year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2
2018 1 1 a1 a2
2018 1 2 b1 b2
2018 2 1 c1 c2
2018 2 2 d1 d2
If I sort A by YEAR ROW, then merge A and B by ROW,
I will only get merged data with 2017, log shows ERROR: BY variables are not properly sorted on dataset A.
If I sort A by ROW YEAR, then merge A and B by ROW,
there will be no error, but for the year 2018, x and y values are not right.
Please help, thanks!
Please show your code and log.
A and B are as I described above.
If I try:
proc sort data=A;
by year row;
run;
data C;
merge A B;
by row;
C will be
year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2
ERROR: BY variables are not properly sorted on dataset A
If I try:
proc sort data=A;
by row year;
run;
data C;
merge A B;
by row;
C will have both 2017 and 2018 but value of x and y in 2018 will not be right, NOTE as below:
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There are 2N observations read from the data set A.
NOTE: There are N observations read from data set B.
NOTE: The data set C has 2N observations.
Observation number for C is right, but all values for 2018 is not right.
something like this
proc sql;
create table c as
select a.row, a.col, x, y
from A
left join
B
on a.row =b.row
and a.col=b.col
group by year , a.row , a.col;
this worked for me.
data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;
data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;
proc sql;
create table c as
select a.row, a.col, x, y
from A
left join
B
on a.row =b.row
and a.col=b.col
group by year , a.row , a.col;
switch to hash, easy:
data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;
data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;
data want;
if _N_ = 1 then do;
if 0 then set a;
if 0 then set b;
declare hash h(dataset: "b");
h.defineKey('row','col');
h.defineData(all:'yes');
h.defineDone();
end;
set a;
by year row ;
if h.find() ne 0 then call missing(x,y);
run;
You're just merging by the wrong variables. SORT (both data sets) and MERGE:
by row col;
*** EDITED: to correct the BY variables.
Your BY in all the MERGE and SORTS should be:
By row col;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.