- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want to merge eq2 and eq3 with final look like eq.
my code works. but I want to know a more efficient way by sql or data step to do it.
Thanks
data eq;
input ID y v x z w;
cards;
1 1 1 27 40 8
1 2 3 . 29 37
1 3 5 30 . 25
1 4 7 38 38 23
2 1 1 23 45 19
2 2 3 32 20 .
2 3 5 67 . .
2 4 7 . 27 .
3 1 1 33 23 46
3 2 3 21 12 56
3 3 5 78 . 34
3 4 7 13 45 .
4 1 1 56 45 23
4 2 3 67 13 67
4 3 5 . 35 13
4 4 7 48 35 56
;
run;
data eq2;
input ID y x z w;
cards;
1 1 27 40 8
1 2 . 29 37
1 3 30 . 25
1 4 38 38 23
2 1 23 45 19
2 2 32 20 .
2 3 67 . .
2 4 . 27 .
3 1 33 23 46
3 2 21 12 56
3 3 78 . 34
3 4 13 45 .
4 1 56 45 23
4 2 67 13 67
4 3 . 35 13
4 4 48 35 56
;
run;
data eq3;
input y v ;
cards;
1 1
2 3
3 5
4 7
;
run;
proc sql;
create table m as
select * from eq2 as a
full join
eq3 as b
on a.y=b.y;
quit;
proc sort data=m ;
by id;
run;
data n;
format id y v x z w;
set m;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perfect. Here's an easy way:
data eq;
array v_values {600} _temporary_;
if _n_=1 then do until (done);
set eq3 end=done;
v_values{y} = v;
end;
set eq2;
v=.;
if (1 <= y <= 600) then v = v_values{y};
format id y v x z w;
run;
proc sort data=eq;
by id;
run;
***** EDITED to add a small correction for safety's sake.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What's the first table for, EQ, since your query only uses the last 2 tables, eq2 and eq3?
Since it's a many to one I would recommend a format as the fastest approach.
@xiangpang wrote:
I want to merge eq2 and eq3 with final look like eq.
my code works. but I want to know a more efficient way by sql or data step to do it.
Thanks
data eq; input ID y v x z w; cards; 1 1 1 27 40 8 1 2 3 . 29 37 1 3 5 30 . 25 1 4 7 38 38 23 2 1 1 23 45 19 2 2 3 32 20 . 2 3 5 67 . . 2 4 7 . 27 . 3 1 1 33 23 46 3 2 3 21 12 56 3 3 5 78 . 34 3 4 7 13 45 . 4 1 1 56 45 23 4 2 3 67 13 67 4 3 5 . 35 13 4 4 7 48 35 56 ; run; data eq2; input ID y x z w; cards; 1 1 27 40 8 1 2 . 29 37 1 3 30 . 25 1 4 38 38 23 2 1 23 45 19 2 2 32 20 . 2 3 67 . . 2 4 . 27 . 3 1 33 23 46 3 2 21 12 56 3 3 78 . 34 3 4 13 45 . 4 1 56 45 23 4 2 67 13 67 4 3 . 35 13 4 4 48 35 56 ; run; data eq3; input y v ; cards; 1 1 2 3 3 5 4 7 ; run; proc sql; create table m as select * from eq2 as a full join eq3 as b on a.y=b.y; quit; proc sort data=m ; by id; run; data n; format id y v x z w; set m; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want to create a table same as EQ
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok, well, PROC FORMAT is likely the fastest method. You can find a comparison of the different approaches, proc format, hash, sql, data step or DS2 on lexjansen.com.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
how to do merge in proc format? could you provide the code?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@xiangpang wrote:
how to do merge in proc format? could you provide the code?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
> I want to create a table same as EQ
That's not at all obvious when looking at your post.
Good questions bring good answers.
Using formats will be faster if table EQ3 is not too large, as this conserves the table order and avoids sorting the table again. Thousands of values is fine. If you have millions of values don't bother.
To avoid re-sorting, a hash table is the next choice if you have many values.
It is a matter of volume.
Otherwise, your existing code should be faster if you remove the proc sort and add order by instead.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks for your reply
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is an example of code and benchmark result.
The results will vary depending on your data and your hardware.
data F;
retain FMTNAME 'QE';
do START=1 to 1e7;
LABEL=put(START,z10.);
output;
end;
run;
proc format cntlin=F;
run;
%* real time 26.92 seconds;
data W_FMT;
do START=1 to 1e7;
LABEL=put(START,qe.);
output;
end;
run;
%* real time 10.07 seconds ;
%* total 37 seconds ;
data W_HASH;
dcl hash H(dataset:'F');
H.definekey('START');
H.definedata('LABEL');
H.definedone();
LABEL=' ';
drop RC;
do START=1 to 1e7;
RC=H.find();
output;
end;
run;
%* real time 18.39 seconds ;
proc sql;
create table W_SQL as
select a.START, b.LABEL from F a, F b where a.START=b.START order by 1;
quit;
%* real time 22.96 seconds ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the reply. I am happy to learn from you. Thanks again
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Relevant questions for some methods (some of which are much simpler):
Is Y always an integer?
How many different values do you have for Y?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your reply.
ID Y V is always an integer.
For Y, it has about 500 different value.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perfect. Here's an easy way:
data eq;
array v_values {600} _temporary_;
if _n_=1 then do until (done);
set eq3 end=done;
v_values{y} = v;
end;
set eq2;
v=.;
if (1 <= y <= 600) then v = v_values{y};
format id y v x z w;
run;
proc sort data=eq;
by id;
run;
***** EDITED to add a small correction for safety's sake.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. But I have two more questions.
Although "format id y v x z w;" is used, why the column sequence is "y v id x z w;"?
when y is character (letter+number), how will you change the code?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's possible that the FORMAT statement is intended to reorder the columns. If so, getting the right order would require a small change to the program. Also notice the small correction I added.
If Y is character, a lot depends on the range of values. In some cases, a wider array could be used, with a formula to translate from letter+number to a unique integer value. But that's really what a hash table is for. An informat based on eq3 would also be possible, to be created using a CNTLIN= data set to PROC FORMAT. Sorry to say, you would need to look up some of the details there. But the DATA step would be easy:
data eq;
set eq2;
v = input(y, myform.);
run;
In this case, MYFORM. is the name of the informat that your program creates using a CNTLIN= data set to PROC FORMAT.