afiqcjohari
Quartz | Level 8
How is it possible to merge without a by statement? How would merge otherwise know which variable(s) to use to combine the different tables?
Kurt_Bremser
Super User

@afiqcjohari wrote:
How is it possible to merge without a by statement? How would merge otherwise know which variable(s) to use to combine the different tables?

That is because the merge statement can simply put two tables side by side, pairing records by position even when they have no logical relation to each other. The "by" statement in a data step is used for by-group processing; with a merge, it also names the key variables.
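
To illustrate the difference described above, here is a minimal sketch. The tables t1 and t2, their variables, and their values are hypothetical and made up for this example only.

```sas
/* Hypothetical example tables; names and values are made up for illustration. */
data t1;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data t2;
  input id y;
  datalines;
3 300
1 100
2 200
;
run;

/* No BY statement: one-to-one merge. Row n of t1 is paired with row n of t2. */
/* The shared variable id takes its value from t2, the last table read.       */
data side_by_side;
  merge t1 t2;
run;

/* With BY: match-merge on id. Both inputs must be sorted (or indexed) by id. */
proc sort data=t2 out=t2_sorted;
  by id;
run;

data matched;
  merge t1 t2_sorted;
  by id;
run;
```

In side_by_side the x and y values are paired purely by position, so they do not belong to the same id. In matched they are paired by the key.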

afiqcjohari
Quartz | Level 8
It would be a nice feature if data merge were intelligent enough to call proc sort when a 'by' statement is provided.
Kurt_Bremser
Super User

@afiqcjohari wrote:
It would be a nice feature if data merge were intelligent enough to call proc sort when a 'by' statement is provided.

That would be nuts. Making an implicit structural change (order) to a dataset will break other programs/apps that expect a specific order.

And sorting implicitly to temporary files every time a dataset does not have the "ordered by" attribute correctly set would destroy the performance of the data step.

If you want automated sorting, use proc sql.
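
As a rough sketch of that suggestion, the join below needs no proc sort beforehand. It joins sashelp.class to itself only so that it runs without extra data; the output table name want_sql and the column choice are assumptions for illustration.

```sas
/* PROC SQL join: the inputs do not have to be pre-sorted by the join key. */
proc sql;
  create table want_sql as
  select a.name, a.age, b.height
  from sashelp.class as a
       inner join
       sashelp.class as b
       on a.name = b.name
  order by a.name;   /* without ORDER BY the output order is not guaranteed */
quit;
```

Any sorting the join needs happens internally, so the source tables themselves are left in their original order.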

afiqcjohari
Quartz | Level 8
It's nuts if you bind yourself to the limitations of data merge, of course. Hopefully SAS can take inspiration from other modern languages for this part. I still prefer data merge for the multiple outputs it can give, as opposed to proc sql. It's just quite ugly to see proc sort before every data merge. What to do...
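
For readers unfamiliar with the "multiple outputs" point, here is a minimal sketch of the pattern, including the proc sort steps in question. The tables cust and orders and all output table names are hypothetical.

```sas
/* Hypothetical input tables; names and values are made up for illustration. */
data cust;
  input id name $;
  datalines;
2 Bob
1 Ann
3 Cid
;
run;

data orders;
  input id amount;
  datalines;
3 30
4 40
1 10
;
run;

/* Explicit sorts: a match-merge needs both inputs ordered by the key. */
proc sort data=cust;   by id; run;
proc sort data=orders; by id; run;

/* One pass over the data writes three tables, routed by the IN= flags. */
data both cust_only orders_only;
  merge cust(in=in_c) orders(in=in_o);
  by id;
  if in_c and in_o then output both;
  else if in_c then output cust_only;
  else output orders_only;
run;
```

Getting the same three tables from proc sql would generally take three separate queries.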
Patrick
Opal | Level 21

@afiqcjohari

It's not a good idea for many reasons. 

BUT: if you want to use by-group processing without pre-sorting, use the SPD Engine (the spde libname engine); it will not always perform very well, though.

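/* Assign a library that uses the SPD Engine */
libname test spde 'c:\temp';

/* Table A: sashelp.class in its original order */
data test.a;
  set sashelp.class;
run;

/* Table B: the same rows in reverse order, read by observation number */
data test.b;
  do _i=_nobs to 1 by -1;
    set sashelp.class point=_i nobs=_nobs;
    output;
  end;
  stop;
run;

/* Merge by name without an explicit proc sort */
data want;
  merge test.a test.b;
  by name;
run;
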
Kurt_Bremser
Super User

The SPD Engine just hides the fact that it does a sort on its own when a different sort order is requested. At least with 9.2, that sort used the same UTILLOC and logic that proc sort uses, and had the same performance; only the initial read of the dataset is sped up by the engine.

I much prefer to see what is really going on represented in the code. But that's the programmer in me, who speaks from decades of experience and is wary of the fads of "modern" languages.
