DATA Step, Macro, Functions and more

how to merge two dataset more efficiently

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 79
Accepted Solution

how to merge two dataset more efficiently

I want to merge eq2 and eq3 with final look like eq. 

my code works. but I want to know a more efficient way by sql or data step to do it. 

 

Thanks

data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 . 
4 1 1 56 45 23
4 2 3 67 13 67  
4 3 5 . 35 13
4 4 7 48 35 56 
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 . 
4 1 56 45 23
4 2 67 13 67  
4 3 . 35 13
4 4 48 35 56 
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
 create table m as 
select * from eq2 as a
full join 
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;


data n;
format id y v x z w;
set m;
run;

Accepted Solutions
Solution
‎03-18-2018 07:48 PM
Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

[ Edited ]
Posted in reply to xiangpang

Perfect.  Here's an easy way:

 

data eq; 

array v_values {600} _temporary_;

if _n_=1 then do until (done);

   set eq3 end=done;

   v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

 

proc sort data=eq;

by id;

run;

 

 

***** EDITED to add a small correction for safety's sake.

View solution in original post


All Replies
Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

What's the first table for, EQ, since your query only uses the last 2 tables, eq2 and eq3?

Since it's a many to one I would recommend a format as the fastest approach. 

 


@xiangpang wrote:

I want to merge eq2 and eq3 with final look like eq. 

my code works. but I want to know a more efficient way by sql or data step to do it. 

 

Thanks

data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 . 
4 1 1 56 45 23
4 2 3 67 13 67  
4 3 5 . 35 13
4 4 7 48 35 56 
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 . 
4 1 56 45 23
4 2 67 13 67  
4 3 . 35 13
4 4 48 35 56 
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
 create table m as 
select * from eq2 as a
full join 
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;


data n;
format id y v x z w;
set m;
run;

 

 

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

I want to create a table same as EQ

Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

Ok, well, PROC FORMAT is likely the fastest method. You can find a comparison of the different approaches, proc format, hash, sql, data step or DS2 on lexjansen.com.

 

 

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

how to do merge in proc format? could you provide the code? 

 

Thanks

Super User
Posts: 23,963

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

@xiangpang wrote:

how to do merge in proc format? could you provide the code? 

 

Thanks


http://www2.sas.com/proceedings/sugi30/001-30.pdf

Super User
Posts: 2,499

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

> I want to create a table same as EQ

That's not at all obvious when looking at your post.

Good questions bring good answers.

 

Using formats will be faster if table EQ3 is not too large, as this conserves the table order and avoids sorting the table again. Thousands of values is fine. If you have millions of values don't bother. 

 

To avoid re-sorting, a hash table is the next choice if you have many values.

 

It is a matter of volume.

 

Otherwise, your existing code should be faster if you remove the proc sort and add order by instead.

 

 

 

 

 

 

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

thanks for your reply

Super User
Posts: 2,499

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

Here is an example of code and benchmark result.

The results will vary depending on your data and your hardware.

data F;
  retain FMTNAME 'QE';
  do START=1 to 1e7;
    LABEL=put(START,z10.);
    output;
  end;
run;


proc format cntlin=F;
run;    
%*    real time         26.92 seconds;
data W_FMT;
 do START=1 to 1e7;
   LABEL=put(START,qe.);
   output;
  end; 
run;
%*  real time           10.07 seconds ;
%*  total               37 seconds ;


data W_HASH;
  dcl hash H(dataset:'F');
  H.definekey('START');
  H.definedata('LABEL');
  H.definedone();
  LABEL='          ';
  drop RC;
  do START=1 to 1e7;
    RC=H.find(); 
    output;
  end;
run;
%*  real time           18.39 seconds ;
  

proc sql;
  create table W_SQL as
  select a.START, b.LABEL from F a, F b where a.START=b.START order by 1;
quit;
%*  real time           22.96 seconds ;

 

 

 

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

Thanks for the reply. I am happy to learn from you. Thanks again

Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

Relevant questions for some methods (some of which are much simpler):

 

Is Y always an integer?

 

How many different values do you have for Y?

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

Posted in reply to Astounding

Thanks for your reply. 

 

ID Y V is always an integer. 

 

For Y, it has about 500 different value. 

Solution
‎03-18-2018 07:48 PM
Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

[ Edited ]
Posted in reply to xiangpang

Perfect.  Here's an easy way:

 

data eq; 

array v_values {600} _temporary_;

if _n_=1 then do until (done);

   set eq3 end=done;

   v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

 

proc sort data=eq;

by id;

run;

 

 

***** EDITED to add a small correction for safety's sake.

Frequent Contributor
Posts: 79

Re: how to merge two dataset more efficiently

Posted in reply to Astounding

Thanks. But I have two more questions. 

Although "format id y v x z w;" is used, why the column sequence is "y v id x z w;"?

when y is character (letter+number), how will you change the code?

 

 

Super User
Posts: 6,908

Re: how to merge two dataset more efficiently

Posted in reply to xiangpang

It's possible that the FORMAT statement is intended to reorder the columns.  If so, getting the right order would require a small change to the program.  Also notice the small correction I added.

 

If Y is character, a lot depends on the range of values.  In some cases, a wider array could be used, with a formula to translate from letter+number to a unique integer value.  But that's really what a hash table is for.  An informat based on eq3 would also be possible, to be created using a CNTLIN= data set to PROC FORMAT.  Sorry to say, you would need to look up some of the details there.  But the DATA step would be easy:

 

data eq;

set eq2;

v = input(y, myform.);

run;

 

In this case, MYFORM. is the name of the informat that your program creates using a CNTLIN= data set to PROC FORMAT.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 235 views
  • 1 like
  • 4 in conversation