BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leonzheng
Obsidian | Level 7

I have a problem of merge two datasets, 

data A

year        row        col

2017         1            1

2017         1            2

2017         2            1

2017         2            2

2018         1            1

2018         1            2

2018         2            1

2018         2            2

 

data B

row       col       x        y

1           1         a1      a2

1           2         b1      b2

2           1         c1      c2

2           2         d1      d2

 

Trying to combine the two datasets as below:

 

year        row        col          x           y

2017         1            1         a1         a2

2017         1            2         b1         b2

2017         2            1         c1         c2

2017         2            2         d1         d2

2018         1            1         a1         a2

2018         1            2         b1         b2

2018         2            1         c1         c2

2018         2            2         d1         d2

 

 

If I sort A by YEAR ROW, then merge A and B by ROW,

I will only get merged data with 2017, log shows ERROR: BY variables are not properly sorted on dataset A.

 

If I sort A by ROW YEAR, then merge A and B by ROW,

there will be no error, but for the year 2018, x and y values are not right.

 

Please help, thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Your BY in all the MERGE and SORTS should be:

 

By row col;

 

 

View solution in original post

10 REPLIES 10
Reeza
Super User

Please show your code and log. 

 

 

leonzheng
Obsidian | Level 7

A and B are as I described above.

If I try:

proc sort data=A;

by year row;

run;

 

data C;

merge A B;

by row;

 

C will be 

year        row        col          x           y

2017         1            1         a1         a2

2017         1            2         b1         b2

2017         2            1         c1         c2

2017         2            2         d1         d2

ERROR: BY variables are not properly sorted on dataset A

 

If I try:

proc sort data=A;

by row year;

run;

 

data C;

merge A B;

by row;

 

C will have both 2017 and 2018 but value of x and y in 2018 will not be right, NOTE as below:

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There are 2N observations read from the data set A.

NOTE: There are N observations read from data set B.

NOTE: The data set C has 2N observations.

 

Observation number for C is right, but all values for 2018 is not right.

 

kiranv_
Rhodochrosite | Level 12

something like this 

proc sql;
create table c as 
select  a.row, a.col, x, y
from A 
left join 
B
on a.row =b.row
and a.col=b.col
group by year ,       a.row  ,      a.col;
leonzheng
Obsidian | Level 7
seems not working, and there is no x and y in data set A
kiranv_
Rhodochrosite | Level 12

this worked for me.

 

data A;

input year        row        col;
datalines;
2017         1            1
2017         1            2
2017         2            1
2017         2            2
2018         1            1
2018         1            2
2018         2            1
2018         2            2
;
run;
 

data B;
input
row       col       x   $      y $;
datalines;
1           1         a1      a2
1           2         b1      b2
2           1         c1      c2
2           2         d1      d2
;
run;


proc sql;
create table c as 
select  a.row, a.col, x, y
from A 
left join 
B
on a.row =b.row
and a.col=b.col
group by year ,       a.row  ,      a.col;
novinosrin
Tourmaline | Level 20

switch to hash, easy:

 

data A;

input year row col;
datalines;
2017 1 1
2017 1 2
2017 2 1
2017 2 2
2018 1 1
2018 1 2
2018 2 1
2018 2 2
;
run;

data B;
input
row col x $ y $;
datalines;
1 1 a1 a2
1 2 b1 b2
2 1 c1 c2
2 2 d1 d2
;
run;


data want;
if _N_ = 1 then do;
if 0 then set a;
if 0 then set b;
declare hash h(dataset: "b");
h.defineKey('row','col');
h.defineData(all:'yes');
h.defineDone();
end;
set a;
by year row ;
if h.find() ne 0 then call missing(x,y);
run;

Astounding
PROC Star

You're just merging by the wrong variables.  SORT (both data sets) and MERGE: 

 

by row col;

 

*** EDITED:  to correct the BY variables.

leonzheng
Obsidian | Level 7
Do you mean sort A by YEAR and ROW,
sort B by ROW and COL
merge A and B by YEAR and ROW?
or by row1 row2 you mean ROW and COL for all sorting and merging
Reeza
Super User

Your BY in all the MERGE and SORTS should be:

 

By row col;

 

 

leonzheng
Obsidian | Level 7
you are right, thank you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1203 views
  • 0 likes
  • 5 in conversation