DATA Step, Macro, Functions and more

How to convert multiple sort and merge into one step?

Reply
Super Contributor
Posts: 436

How to convert multiple sort and merge into one step?

I would like to optimise the following code into one piece for sorting and merging.

Proc sort data=table1;

By id;

Run;

Proc sort data= table2;

By id;

Run;

I need to convert the following merge into one step as well.

Data want1;

Merge have2(in=a) have1(in=b);

By id;

If a;

Run;

Data want2;

Merge have3(in=a) have1(in=b);

By id;

If a;

Run;

I have multiple sort and merge with the same by variables. Hence I wish to optimise into one step for sort and merge via macros, looping,...

Any inputs to my question is highly appreciated.

Super Contributor
Posts: 436

Re: How to convert multiple sort and merge into one step?

Any suggestions?

Super User
Posts: 10,041

Re: How to convert multiple sort and merge into one step?

if ID value is unique, I would consider using SQL left join . or Hash Table.

Super Contributor
Posts: 436

Re: How to convert multiple sort and merge into one step?

Already I'm doing left join via merge in my datastep.But I wonder how to this in one proc sql to handle multiple left joins? May I ask you for the example?

How about my first question on sorting?

Thanks in advance for your inputs.

Super User
Posts: 10,041

Re: How to convert multiple sort and merge into one step?

About merge multiple tables in a proc sql, just as simple as

from a left join b on ...   left join c on ..

Post some your sample data, and I believe someone is going to give you some good code.

Xia Keshan

Super User
Super User
Posts: 7,067

Re: How to convert multiple sort and merge into one step?

We probably need more information to make a useful suggestion.

You seem to be using DATA1 as a lookup table to merge onto the other data sets.

Not sure there is much utility into trying to combine the two merge steps into a single data step. There is not any I/O gains from that and much potential for confusion if DATA2 and DATA3 do not have the same data structure.

A macro solution might make writing the program easier, if it was clear what is being repeated and what varies.

For example perhaps you question is can I pass in one or more source dataset and have it merge on the fields from DATA1 onto each?

%macro test(dslist);

%do i=1 %to %sysfunc(countw(&dslist)) ;

data want&i ;

  merge %scan(&dslist,&i) (in=in1) data1 ;

  by id ;

  if in1 ;

run;

%end;

%mend test;

%test(data2 data3);

Super Contributor
Posts: 436

Re: How to convert multiple sort and merge into one step?

How about my question on sorting?

Super User
Super User
Posts: 7,067

Re: How to convert multiple sort and merge into one step?

My preference is when it makes sense is to define the data set at the outset as sorted by the proper variable.

If the issue is that you are looking to perform a lookup on a variable that is NOT the natural key variable for DATA2 or DATA3 then perhaps you want to use some other method to perform the lookup than using a merge.  For example if you are just converting an coded variable to a description then using a FORMAT is the natural way to do that.  You might not need to do anything at all as you can always attach the format at the point where you need it.

If you code the merge using PROC SQL join instead then the PROC will decide for you if it needs to perform a sort behind the scenes.

proc sql ;

  create table want1 as select * from data2 a left join data1 b on  a.id = b.id ;

quit;

Trusted Advisor
Posts: 3,215

Re: How to convert multiple sort and merge into one step?

What is the problem to solve wanting it into a single step.

Are there computer resource limitations to be solved?

Are there human brain capactity limitations?

Is there some code review quality question?

---->-- ja karman --<-----
Trusted Advisor
Posts: 3,215

Re: How to convert multiple sort and merge into one step?

sorting is one of the best optimized processes not easily to be improved. It does often a split merge internally.

The typical question with that is resources usage and sizing. With short on resources to the size (big data?) the split in several steps is overcoming internal limits.

A RDBMS using SQL can deliver ordered data. More common coding but the cost is the resource overhead of those internal steps not being optimized, moderate.

Your only question seems to be on some coding practices, not turn around processing times.  

How big is your data? How fast is your machine? Is the needed additional processing time  acceptable?

How good are you with SQL? Are you using Eguide?

With that little information how can we give a good advice/answer?

---->-- ja karman --<-----
Super Contributor
Posts: 436

Re: How to convert multiple sort and merge into one step?

Thanks for your suggestions. I've EG 5.1 and my dataset has around 50K observations.

Trusted Advisor
Posts: 3,215

Re: How to convert multiple sort and merge into one step?

The bottom line.   50k observations is not that big (rather small these days), it could be sufficient in performance/execution without any need to tune that.

  Assuming those are sas datasets and you want to design a SQL query. You have a Query builder in that Eguide tool.

Joining generating SQL is made easy that way. Result dataset getting back ordered also. There is a preview code (the Sql code) button.

Building a flow in EGuide is possible not really operational scheduling for a big company process but very documentative when combining that with code nodes. 

Your first steps above are a union and merge-join (left?)

Would it be possible to convert to Sql using that tool?

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 11 replies
  • 352 views
  • 6 likes
  • 4 in conversation