Hi,
Can I merge two files using indexes which is said to be quicker than sort?
As an example, if I need to merge two files based on variables sex and agegrp. The following is wrong:
PROC DATASETS LIBRARY=work;
MODIFY file01;
INDEX CREATE sex agegrp;
MODIFY file02;
INDEX CREATE sex agegrp;
RUN;
DATA work.file03;
MERGE work.file01 work.file02;
BY sex agegrp;
RUN;
No.
It means if you merge using the double set statement with KEY then you can only do an inner join.
If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work
Take a look at: http://www2.sas.com/proceedings/sugi26/p104-26.pdf
It seemd that indexes can only merge those common cases existing in both datasets. I tested it as below:
DATA work.data01;
INPUT type $ group $;
CARDS;
A 1
B 2
;
RUN;
DATA work.data02;
INPUT type $ group $ price;
CARDS;
B 2 3
B 1 4
A 1 5
A 2 2
;
RUN;
PROC DATASETS LIBRARY=work;
MODIFY data02;
INDEX CREATE key=(type group) /UNIQUE;
RUN;
DATA work.data;
SET work.data01;
SET work.data02(IN=a) KEY=key /UNIQUE;
IF _IORC_ THEN DO; _ERROR_=0; trt_code=''; END;
IF a;
RUN;
In the merged data, it only has cases of A1, B2. The cases A1 and B1 are missing.
My question is: if I want to keep all cases in data02, how to do this?
Please don't post your questions multiple times...
You're merging using a key statement.
Another way is to use the merge and by statements. If you have the index on the data set SAS knows to use it if it will increase efficiency.
Run the code with the options msglevel=i set and check the log to see if the index is being used. You can create indexes on both dataset or one to increase efficiency. Depending on the size of the data you may or may not have an increase in efficiency.
You can read some more about indexes by looking at this paper by Michael Raithel . I think he also has a book on the topic.
www2.sas.com/proceedings/sugi29/123-29.pdf
options msglevel=i;
DATA work.data;
merge work.data01 (in=b)
work.data02(IN=a) ;
by type group;
if a;
RUN;
Reeza is correct. You should be using a merge statement if you want to perform a outer join. Using the set method you would be performing inner join only.
Also, you should delete your duplicate post...
THANKS for your answer.
Do you mean that the indexes created can only be used for inner join, not outer join?
No.
It means if you merge using the double set statement with KEY then you can only do an inner join.
If you want an outer join use the standard merge and BY statements. SAS will still use the index you just don't need to do anything besides creating the index for it to work
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.