BookmarkSubscribeRSS Feed
Babloo
Rhodochrosite | Level 12

I would like to optimise the following code into one piece for sorting and merging.

Proc sort data=table1;

By id;

Run;

Proc sort data= table2;

By id;

Run;

I need to convert the following merge into one step as well.

Data want1;

Merge have2(in=a) have1(in=b);

By id;

If a;

Run;

Data want2;

Merge have3(in=a) have1(in=b);

By id;

If a;

Run;

I have multiple sort and merge with the same by variables. Hence I wish to optimise into one step for sort and merge via macros, looping,...

Any inputs to my question is highly appreciated.

11 REPLIES 11
Babloo
Rhodochrosite | Level 12

Any suggestions?

Ksharp
Super User

if ID value is unique, I would consider using SQL left join . or Hash Table.

Babloo
Rhodochrosite | Level 12

Already I'm doing left join via merge in my datastep.But I wonder how to this in one proc sql to handle multiple left joins? May I ask you for the example?

How about my first question on sorting?

Thanks in advance for your inputs.

Ksharp
Super User

About merge multiple tables in a proc sql, just as simple as

from a left join b on ...   left join c on ..

Post some your sample data, and I believe someone is going to give you some good code.

Xia Keshan

Tom
Super User Tom
Super User

We probably need more information to make a useful suggestion.

You seem to be using DATA1 as a lookup table to merge onto the other data sets.

Not sure there is much utility into trying to combine the two merge steps into a single data step. There is not any I/O gains from that and much potential for confusion if DATA2 and DATA3 do not have the same data structure.

A macro solution might make writing the program easier, if it was clear what is being repeated and what varies.

For example perhaps you question is can I pass in one or more source dataset and have it merge on the fields from DATA1 onto each?

%macro test(dslist);

%do i=1 %to %sysfunc(countw(&dslist)) ;

data want&i ;

  merge %scan(&dslist,&i) (in=in1) data1 ;

  by id ;

  if in1 ;

run;

%end;

%mend test;

%test(data2 data3);

Babloo
Rhodochrosite | Level 12

How about my question on sorting?

Tom
Super User Tom
Super User

My preference is when it makes sense is to define the data set at the outset as sorted by the proper variable.

If the issue is that you are looking to perform a lookup on a variable that is NOT the natural key variable for DATA2 or DATA3 then perhaps you want to use some other method to perform the lookup than using a merge.  For example if you are just converting an coded variable to a description then using a FORMAT is the natural way to do that.  You might not need to do anything at all as you can always attach the format at the point where you need it.

If you code the merge using PROC SQL join instead then the PROC will decide for you if it needs to perform a sort behind the scenes.

proc sql ;

  create table want1 as select * from data2 a left join data1 b on  a.id = b.id ;

quit;

jakarman
Barite | Level 11

What is the problem to solve wanting it into a single step.

Are there computer resource limitations to be solved?

Are there human brain capactity limitations?

Is there some code review quality question?

---->-- ja karman --<-----
jakarman
Barite | Level 11

sorting is one of the best optimized processes not easily to be improved. It does often a split merge internally.

The typical question with that is resources usage and sizing. With short on resources to the size (big data?) the split in several steps is overcoming internal limits.

A RDBMS using SQL can deliver ordered data. More common coding but the cost is the resource overhead of those internal steps not being optimized, moderate.

Your only question seems to be on some coding practices, not turn around processing times.  

How big is your data? How fast is your machine? Is the needed additional processing time  acceptable?

How good are you with SQL? Are you using Eguide?

With that little information how can we give a good advice/answer?

---->-- ja karman --<-----
Babloo
Rhodochrosite | Level 12

Thanks for your suggestions. I've EG 5.1 and my dataset has around 50K observations.

jakarman
Barite | Level 11

The bottom line.   50k observations is not that big (rather small these days), it could be sufficient in performance/execution without any need to tune that.

  Assuming those are sas datasets and you want to design a SQL query. You have a Query builder in that Eguide tool.

Joining generating SQL is made easy that way. Result dataset getting back ordered also. There is a preview code (the Sql code) button.

Building a flow in EGuide is possible not really operational scheduling for a big company process but very documentative when combining that with code nodes. 

Your first steps above are a union and merge-join (left?)

Would it be possible to convert to Sql using that tool?

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1425 views
  • 6 likes
  • 4 in conversation