BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xiangpang
Quartz | Level 8

I want to merge eq2 and eq3 with final look like eq. 

my code works. but I want to know a more efficient way by sql or data step to do it. 

 

Thanks

data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 . 
4 1 1 56 45 23
4 2 3 67 13 67  
4 3 5 . 35 13
4 4 7 48 35 56 
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 . 
4 1 56 45 23
4 2 67 13 67  
4 3 . 35 13
4 4 48 35 56 
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
 create table m as 
select * from eq2 as a
full join 
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;


data n;
format id y v x z w;
set m;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Perfect.  Here's an easy way:

 

data eq; 

array v_values {600} _temporary_;

if _n_=1 then do until (done);

   set eq3 end=done;

   v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

 

proc sort data=eq;

by id;

run;

 

 

***** EDITED to add a small correction for safety's sake.

View solution in original post

15 REPLIES 15
Reeza
Super User

What's the first table for, EQ, since your query only uses the last 2 tables, eq2 and eq3?

Since it's a many to one I would recommend a format as the fastest approach. 

 


@xiangpang wrote:

I want to merge eq2 and eq3 with final look like eq. 

my code works. but I want to know a more efficient way by sql or data step to do it. 

 

Thanks

data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 . 
4 1 1 56 45 23
4 2 3 67 13 67  
4 3 5 . 35 13
4 4 7 48 35 56 
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 . 
4 1 56 45 23
4 2 67 13 67  
4 3 . 35 13
4 4 48 35 56 
;
run;

data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;

proc sql;
 create table m as 
select * from eq2 as a
full join 
eq3 as b
on a.y=b.y;
quit;

proc sort data=m ;
by id;
run;


data n;
format id y v x z w;
set m;
run;

 

 

xiangpang
Quartz | Level 8

I want to create a table same as EQ

Reeza
Super User

Ok, well, PROC FORMAT is likely the fastest method. You can find a comparison of the different approaches, proc format, hash, sql, data step or DS2 on lexjansen.com.

 

 

xiangpang
Quartz | Level 8

how to do merge in proc format? could you provide the code? 

 

Thanks

Reeza
Super User

@xiangpang wrote:

how to do merge in proc format? could you provide the code? 

 

Thanks


http://www2.sas.com/proceedings/sugi30/001-30.pdf

ChrisNZ
Tourmaline | Level 20

> I want to create a table same as EQ

That's not at all obvious when looking at your post.

Good questions bring good answers.

 

Using formats will be faster if table EQ3 is not too large, as this conserves the table order and avoids sorting the table again. Thousands of values is fine. If you have millions of values don't bother. 

 

To avoid re-sorting, a hash table is the next choice if you have many values.

 

It is a matter of volume.

 

Otherwise, your existing code should be faster if you remove the proc sort and add order by instead.

 

 

 

 

 

 

xiangpang
Quartz | Level 8

thanks for your reply

ChrisNZ
Tourmaline | Level 20

Here is an example of code and benchmark result.

The results will vary depending on your data and your hardware.

data F;
  retain FMTNAME 'QE';
  do START=1 to 1e7;
    LABEL=put(START,z10.);
    output;
  end;
run;


proc format cntlin=F;
run;    
%*    real time         26.92 seconds;
data W_FMT;
 do START=1 to 1e7;
   LABEL=put(START,qe.);
   output;
  end; 
run;
%*  real time           10.07 seconds ;
%*  total               37 seconds ;


data W_HASH;
  dcl hash H(dataset:'F');
  H.definekey('START');
  H.definedata('LABEL');
  H.definedone();
  LABEL='          ';
  drop RC;
  do START=1 to 1e7;
    RC=H.find(); 
    output;
  end;
run;
%*  real time           18.39 seconds ;
  

proc sql;
  create table W_SQL as
  select a.START, b.LABEL from F a, F b where a.START=b.START order by 1;
quit;
%*  real time           22.96 seconds ;

 

 

 

xiangpang
Quartz | Level 8

Thanks for the reply. I am happy to learn from you. Thanks again

Astounding
PROC Star

Relevant questions for some methods (some of which are much simpler):

 

Is Y always an integer?

 

How many different values do you have for Y?

xiangpang
Quartz | Level 8

Thanks for your reply. 

 

ID Y V is always an integer. 

 

For Y, it has about 500 different value. 

Astounding
PROC Star

Perfect.  Here's an easy way:

 

data eq; 

array v_values {600} _temporary_;

if _n_=1 then do until (done);

   set eq3 end=done;

   v_values{y} = v;

end;

set eq2;

v=.;

if (1 <= y <= 600) then v = v_values{y};

format id y v x z w;

run;

 

proc sort data=eq;

by id;

run;

 

 

***** EDITED to add a small correction for safety's sake.

xiangpang
Quartz | Level 8

Thanks. But I have two more questions. 

Although "format id y v x z w;" is used, why the column sequence is "y v id x z w;"?

when y is character (letter+number), how will you change the code?

 

 

Astounding
PROC Star

It's possible that the FORMAT statement is intended to reorder the columns.  If so, getting the right order would require a small change to the program.  Also notice the small correction I added.

 

If Y is character, a lot depends on the range of values.  In some cases, a wider array could be used, with a formula to translate from letter+number to a unique integer value.  But that's really what a hash table is for.  An informat based on eq3 would also be possible, to be created using a CNTLIN= data set to PROC FORMAT.  Sorry to say, you would need to look up some of the details there.  But the DATA step would be easy:

 

data eq;

set eq2;

v = input(y, myform.);

run;

 

In this case, MYFORM. is the name of the informat that your program creates using a CNTLIN= data set to PROC FORMAT.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1834 views
  • 1 like
  • 4 in conversation