BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bncoxuk
Obsidian | Level 7

Hi,

Can I merge two files using indexes which is said to be quicker than sort?

As an example, if I need to merge two files based on variables sex and agegrp. The following is wrong:

PROC DATASETS LIBRARY=work;

MODIFY file01;

INDEX CREATE sex agegrp;

MODIFY file02;

INDEX CREATE sex agegrp;

RUN;

DATA work.file03;

     MERGE work.file01 work.file02;

     BY sex agegrp;

RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

No.

It means if you merge using the double set statement with KEY then you can only do an inner join.

If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work

View solution in original post

6 REPLIES 6
bncoxuk
Obsidian | Level 7

It seemd that indexes can only merge those common cases existing in both datasets. I tested it as below:

DATA work.data01;
  INPUT type $ group $;
  CARDS;
  A 1
  B 2

  ;
RUN;

DATA work.data02;
  INPUT type $ group $ price;
  CARDS;
  B 2 3
  B 1 4
  A 1 5
  A 2 2
  ;
RUN;

PROC DATASETS LIBRARY=work;
  MODIFY data02;
  INDEX CREATE key=(type group) /UNIQUE;
RUN;

DATA work.data;
  SET work.data01;
  SET work.data02(IN=a) KEY=key /UNIQUE;
  IF _IORC_ THEN DO; _ERROR_=0; trt_code=''; END;
  IF a;
RUN;

In the merged data, it only has cases of A1, B2. The cases A1 and B1 are missing.

My question is: if I want to keep all cases in data02, how to do this?


Reeza
Super User

Please don't post your questions multiple times...

You're merging using a key statement.

Another way is to use the merge and by statements. If you have the index on the data set SAS knows to use it if it will increase efficiency.

Run the code with the options msglevel=i set and check the log to see if the index is being used. You can create indexes on both dataset or one to increase efficiency. Depending on the size of the data you may or may not have an increase in efficiency.

You can read some more about indexes by looking at this paper by Michael Raithel . I think he also has a book on the topic.

www2.sas.com/proceedings/sugi29/123-29.pdf

options msglevel=i;

DATA work.data;

merge work.data01 (in=b)

work.data02(IN=a) ;

by type group;

if a;

RUN;

FriedEgg
SAS Employee

Reeza is correct.  You should be using a merge statement if you want to perform a outer join.  Using the set method you would be performing inner join only.

Also, you should delete your duplicate post...

bncoxuk
Obsidian | Level 7

THANKS for your answer.

Do you mean that the indexes created can only be used for inner join, not outer join?

Reeza
Super User

No.

It means if you merge using the double set statement with KEY then you can only do an inner join.

If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1273 views
  • 6 likes
  • 4 in conversation