how to merge two dataset more efficiently

Solved
Frequent Contributor
Posts: 79

how to merge two dataset more efficiently

I want to merge eq2 and eq3 with final look like eq.

my code works. but I want to know a more efficient way by sql or data step to do it.

Thanks

``````data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 .
4 1 1 56 45 23
4 2 3 67 13 67
4 3 5 . 35 13
4 4 7 48 35 56
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 .
4 1 56 45 23
4 2 67 13 67
4 3 . 35 13
4 4 48 35 56
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
create table m as
select * from eq2 as a
full join
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;

data n;
format id y v x z w;
set m;
run;``````

Accepted Solutions
Solution
‎03-18-2018 07:48 PM
Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

[ Edited ]

Perfect.  Here's an easy way:

data eq;

array v_values {600} _temporary_;

if _n_=1 then do until (done);

set eq3 end=done;

v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

proc sort data=eq;

by id;

run;

***** EDITED to add a small correction for safety's sake.

All Replies
Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

What's the first table for, EQ, since your query only uses the last 2 tables, eq2 and eq3?

Since it's a many to one I would recommend a format as the fastest approach.

@xiangpang wrote:

I want to merge eq2 and eq3 with final look like eq.

my code works. but I want to know a more efficient way by sql or data step to do it.

Thanks

``````data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 .
4 1 1 56 45 23
4 2 3 67 13 67
4 3 5 . 35 13
4 4 7 48 35 56
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 .
4 1 56 45 23
4 2 67 13 67
4 3 . 35 13
4 4 48 35 56
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
create table m as
select * from eq2 as a
full join
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;

data n;
format id y v x z w;
set m;
run;``````

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

I want to create a table same as EQ

Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

Ok, well, PROC FORMAT is likely the fastest method. You can find a comparison of the different approaches, proc format, hash, sql, data step or DS2 on lexjansen.com.

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

how to do merge in proc format? could you provide the code?

Thanks

Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

@xiangpang wrote:

how to do merge in proc format? could you provide the code?

Thanks

http://www2.sas.com/proceedings/sugi30/001-30.pdf

Super User
Posts: 2,499

Re: how to merge two dataset more efficiently

> I want to create a table same as EQ

That's not at all obvious when looking at your post.

Using formats will be faster if table EQ3 is not too large, as this conserves the table order and avoids sorting the table again. Thousands of values is fine. If you have millions of values don't bother.

To avoid re-sorting, a hash table is the next choice if you have many values.

It is a matter of volume.

Otherwise, your existing code should be faster if you remove the proc sort and add order by instead.

Frequent Contributor
Posts: 79

Super User
Posts: 2,499

Re: how to merge two dataset more efficiently

Here is an example of code and benchmark result.

``````data F;
retain FMTNAME 'QE';
do START=1 to 1e7;
LABEL=put(START,z10.);
output;
end;
run;

proc format cntlin=F;
run;
%*    real time         26.92 seconds;
data W_FMT;
do START=1 to 1e7;
LABEL=put(START,qe.);
output;
end;
run;
%*  real time           10.07 seconds ;
%*  total               37 seconds ;

data W_HASH;
dcl hash H(dataset:'F');
H.definekey('START');
H.definedata('LABEL');
H.definedone();
LABEL='          ';
drop RC;
do START=1 to 1e7;
RC=H.find();
output;
end;
run;
%*  real time           18.39 seconds ;

proc sql;
create table W_SQL as
select a.START, b.LABEL from F a, F b where a.START=b.START order by 1;
quit;
%*  real time           22.96 seconds ;
``````

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

Thanks for the reply. I am happy to learn from you. Thanks again

Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

Relevant questions for some methods (some of which are much simpler):

Is Y always an integer?

How many different values do you have for Y?

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

ID Y V is always an integer.

For Y, it has about 500 different value.

Solution
‎03-18-2018 07:48 PM
Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

[ Edited ]

Perfect.  Here's an easy way:

data eq;

array v_values {600} _temporary_;

if _n_=1 then do until (done);

set eq3 end=done;

v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

proc sort data=eq;

by id;

run;

***** EDITED to add a small correction for safety's sake.

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

Thanks. But I have two more questions.

Although "format id y v x z w;" is used, why the column sequence is "y v id x z w;"?

when y is character (letter+number), how will you change the code?

Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

It's possible that the FORMAT statement is intended to reorder the columns.  If so, getting the right order would require a small change to the program.  Also notice the small correction I added.

If Y is character, a lot depends on the range of values.  In some cases, a wider array could be used, with a formula to translate from letter+number to a unique integer value.  But that's really what a hash table is for.  An informat based on eq3 would also be possible, to be created using a CNTLIN= data set to PROC FORMAT.  Sorry to say, you would need to look up some of the details there.  But the DATA step would be easy:

data eq;

set eq2;

v = input(y, myform.);

run;

In this case, MYFORM. is the name of the informat that your program creates using a CNTLIN= data set to PROC FORMAT.

☑ This topic is solved.