I have a problem of merge two datasets,
data A
year row col
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
data B
row col x y
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
Trying to combine the two datasets as below:
year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2
2018 1 1 a1 a2
2018 1 2 b1 b2
2018 2 1 c1 c2
2018 2 2 d1 d2
If I sort A by YEAR ROW, then merge A and B by ROW,
I will only get merged data with 2017, log shows ERROR: BY variables are not properly sorted on dataset A.
If I sort A by ROW YEAR, then merge A and B by ROW,
there will be no error, but for the year 2018, x and y values are not right.
Please help, thanks!
Please show your code and log.
A and B are as I described above.
If I try:
proc sort data=A;
by year row;
run;
data C;
merge A B;
by row;
C will be
year row col x y
2017 1 1 a1 a2
2017 1 2 b1 b2
2017 2 1 c1 c2
2017 2 2 d1 d2
ERROR: BY variables are not properly sorted on dataset A
If I try:
proc sort data=A;
by row year;
run;
data C;
merge A B;
by row;
C will have both 2017 and 2018 but value of x and y in 2018 will not be right, NOTE as below:
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There are 2N observations read from the data set A.
NOTE: There are N observations read from data set B.
NOTE: The data set C has 2N observations.
Observation number for C is right, but all values for 2018 is not right.
something like this
proc sql;
create table c as
select a.row, a.col, x, y
from A
left join
B
on a.row =b.row
and a.col=b.col
group by year , a.row , a.col;
this worked for me.
data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;
data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;
proc sql;
create table c as
select a.row, a.col, x, y
from A
left join
B
on a.row =b.row
and a.col=b.col
group by year , a.row , a.col;
switch to hash, easy:
data A;
input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;
data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;
data want;
if _N_ = 1 then do;
if 0 then set a;
if 0 then set b;
declare hash h(dataset: "b");
h.defineKey('row','col');
h.defineData(all:'yes');
h.defineDone();
end;
set a;
by year row ;
if h.find() ne 0 then call missing(x,y);
run;
You're just merging by the wrong variables. SORT (both data sets) and MERGE:
by row col;
*** EDITED: to correct the BY variables.
Your BY in all the MERGE and SORTS should be:
By row col;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.