11-15-2014 02:08 AM
I would like to optimise the following code into one piece for sorting and merging.
Proc sort data=table1;
Proc sort data= table2;
I need to convert the following merge into one step as well.
Merge have2(in=a) have1(in=b);
Merge have3(in=a) have1(in=b);
I have multiple sort and merge with the same by variables. Hence I wish to optimise into one step for sort and merge via macros, looping,...
Any inputs to my question is highly appreciated.
11-15-2014 06:05 AM
Already I'm doing left join via merge in my datastep.But I wonder how to this in one proc sql to handle multiple left joins? May I ask you for the example?
How about my first question on sorting?
Thanks in advance for your inputs.
11-15-2014 06:33 AM
About merge multiple tables in a proc sql, just as simple as
from a left join b on ... left join c on ..
Post some your sample data, and I believe someone is going to give you some good code.
11-15-2014 09:56 AM
We probably need more information to make a useful suggestion.
You seem to be using DATA1 as a lookup table to merge onto the other data sets.
Not sure there is much utility into trying to combine the two merge steps into a single data step. There is not any I/O gains from that and much potential for confusion if DATA2 and DATA3 do not have the same data structure.
A macro solution might make writing the program easier, if it was clear what is being repeated and what varies.
For example perhaps you question is can I pass in one or more source dataset and have it merge on the fields from DATA1 onto each?
%do i=1 %to %sysfunc(countw(&dslist)) ;
data want&i ;
merge %scan(&dslist,&i) (in=in1) data1 ;
by id ;
if in1 ;
11-15-2014 11:53 AM
My preference is when it makes sense is to define the data set at the outset as sorted by the proper variable.
If the issue is that you are looking to perform a lookup on a variable that is NOT the natural key variable for DATA2 or DATA3 then perhaps you want to use some other method to perform the lookup than using a merge. For example if you are just converting an coded variable to a description then using a FORMAT is the natural way to do that. You might not need to do anything at all as you can always attach the format at the point where you need it.
If you code the merge using PROC SQL join instead then the PROC will decide for you if it needs to perform a sort behind the scenes.
proc sql ;
create table want1 as select * from data2 a left join data1 b on a.id = b.id ;
11-15-2014 10:17 AM
What is the problem to solve wanting it into a single step.
Are there computer resource limitations to be solved?
Are there human brain capactity limitations?
Is there some code review quality question?
11-15-2014 12:01 PM
sorting is one of the best optimized processes not easily to be improved. It does often a split merge internally.
The typical question with that is resources usage and sizing. With short on resources to the size (big data?) the split in several steps is overcoming internal limits.
A RDBMS using SQL can deliver ordered data. More common coding but the cost is the resource overhead of those internal steps not being optimized, moderate.
Your only question seems to be on some coding practices, not turn around processing times.
How big is your data? How fast is your machine? Is the needed additional processing time acceptable?
How good are you with SQL? Are you using Eguide?
With that little information how can we give a good advice/answer?
11-15-2014 01:54 PM
The bottom line. 50k observations is not that big (rather small these days), it could be sufficient in performance/execution without any need to tune that.
Assuming those are sas datasets and you want to design a SQL query. You have a Query builder in that Eguide tool.
Joining generating SQL is made easy that way. Result dataset getting back ordered also. There is a preview code (the Sql code) button.
Building a flow in EGuide is possible not really operational scheduling for a big company process but very documentative when combining that with code nodes.
Your first steps above are a union and merge-join (left?)
Would it be possible to convert to Sql using that tool?