Help using Base SAS procedures

Merge two datasets using index

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 131
Accepted Solution

Merge two datasets using index

Hi,

Can I merge two files using indexes which is said to be quicker than sort?

As an example, if I need to merge two files based on variables sex and agegrp. The following is wrong:

PROC DATASETS LIBRARY=work;

MODIFY file01;

INDEX CREATE sex agegrp;

MODIFY file02;

INDEX CREATE sex agegrp;

RUN;

DATA work.file03;

     MERGE work.file01 work.file02;

     BY sex agegrp;

RUN;


Accepted Solutions
Solution
‎12-13-2011 01:29 PM
Super User
Posts: 19,815

Merge two datasets using index

No.

It means if you merge using the double set statement with KEY then you can only do an inner join.

If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work

View solution in original post


All Replies
PROC Star
Posts: 7,474

Merge two datasets using index

Frequent Contributor
Posts: 131

Merge two datasets using index

It seemd that indexes can only merge those common cases existing in both datasets. I tested it as below:

DATA work.data01;
  INPUT type $ group $;
  CARDS;
  A 1
  B 2

  ;
RUN;

DATA work.data02;
  INPUT type $ group $ price;
  CARDS;
  B 2 3
  B 1 4
  A 1 5
  A 2 2
  ;
RUN;

PROC DATASETS LIBRARY=work;
  MODIFY data02;
  INDEX CREATE key=(type group) /UNIQUE;
RUN;

DATA work.data;
  SET work.data01;
  SET work.data02(IN=a) KEY=key /UNIQUE;
  IF _IORC_ THEN DO; _ERROR_=0; trt_code=''; END;
  IF a;
RUN;

In the merged data, it only has cases of A1, B2. The cases A1 and B1 are missing.

My question is: if I want to keep all cases in data02, how to do this?


Super User
Posts: 19,815

Merge two datasets using index

Please don't post your questions multiple times...

You're merging using a key statement.

Another way is to use the merge and by statements. If you have the index on the data set SAS knows to use it if it will increase efficiency.

Run the code with the options msglevel=i set and check the log to see if the index is being used. You can create indexes on both dataset or one to increase efficiency. Depending on the size of the data you may or may not have an increase in efficiency.

You can read some more about indexes by looking at this paper by Michael Raithel . I think he also has a book on the topic.

www2.sas.com/proceedings/sugi29/123-29.pdf

options msglevel=i;

DATA work.data;

merge work.data01 (in=b)

work.data02(IN=a) ;

by type group;

if a;

RUN;

Trusted Advisor
Posts: 1,301

Merge two datasets using index

Reeza is correct.  You should be using a merge statement if you want to perform a outer join.  Using the set method you would be performing inner join only.

Also, you should delete your duplicate post...

Frequent Contributor
Posts: 131

Merge two datasets using index

THANKS for your answer.

Do you mean that the indexes created can only be used for inner join, not outer join?

Solution
‎12-13-2011 01:29 PM
Super User
Posts: 19,815

Merge two datasets using index

No.

It means if you merge using the double set statement with KEY then you can only do an inner join.

If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 192 views
  • 6 likes
  • 4 in conversation