## data merge

Solved
Occasional Contributor
Posts: 6

# data merge

I have sas data

data1

 ID IN 1 A 2 B 3 C 4 C

data2

 IN EX A D A E A F B D B E C E C F

I want data3

 ID IN EX 1 A D 1 A E 1 A F 2 B D 2 B E 3 C E 3 C F 4 C E 4 C F

I sort file 1, file 2 by in, then merge data 1 data 2 or merge data2 data1, results are different. What's happening? Any better way to get data 3? Thank you!

data temp;

merge data 1 data 2;

by in;

run;

data temp;

merge data 2 data 1;

by in;

run;

Accepted Solutions
Solution
‎08-25-2013 02:41 AM
Super Contributor
Posts: 307

## Re: data merge

Try this (untested):

proc sql;

create table want as

select t1.*, t2.EX

from data1 t1

, data2 t2

where t1.IN = t2.IN

order by ID, IN, EX

;

quit;

All Replies
Occasional Contributor
Posts: 6

## Re: data merge

also, looks like i can not get data 3 by merging data1, data2 by variable in, no matter i merge data1 data 2, or merge data 2 data1.

But I don't know how to expand A to D,E,F, B to D,E, C to E, F ....in data 1.....

Solution
‎08-25-2013 02:41 AM
Super Contributor
Posts: 307

## Re: data merge

Try this (untested):

proc sql;

create table want as

select t1.*, t2.EX

from data1 t1

, data2 t2

where t1.IN = t2.IN

order by ID, IN, EX

;

quit;

New Contributor
Posts: 2

## Re: data merge

can you try using this piece of code.

i remember learning about such a situation in sas advanced prep guide:

data _3;

set _1;

do i =1 to num;

set _2(rename=(in = in1)) nobs=num point=i;

if in = in1 then output;

end;

run;

proc print data = _last_;

run;

New Contributor
Posts: 2

## Re: data merge

where _1 and _2 are the input datasets. you can try changing the order of the datasets to see if it works any which ways...hope i am making sense.

Posts: 3,167

## Re: data merge

@Fugue's approach will be the one I would choose to use as well, cause we can take advantage of Proc SQL's native engine to do this Cartesian product kind of join by groups. However, if you have the curious mind (like most of us on this forum) on how to achieve the same results using Data step, then Hash() will most likely come into play.

data data1;

input ID IN \$;

cards;

1 A

2 B

3 C

4 C

;

data data2;

input IN\$ EX\$;

cards;

A D

A E

A F

B D

B E

C E

C F

;

data data3;

if _n_=1 then do;

if 0 then set data1 data2;

declare hash h(dataset:'data2', multidata:'y');

h.definekey('in');

h.definedata('ex');

h.definedone();

end;

set data1;

_n_=h.find();

do _n_=0 by 0 while (_n_=0);

output;

_n_=h.find_next();

end;

run;

Without involving Hash(), Data step + Index might have a shot as well, with a little more overhead I guess. @sachu69in's method will work , but with too much overhead.

HTH,

Haikuo

🔒 This topic is solved and locked.