Hello community,
I am trying to do one is to many datasets merge and used the following code
DATA HB.IPADM;
MERGE SORT.ALLENRL2(IN=A KEEP=ENROLID)
IPADM.CCAEI103 (IN=B)
IPADM.CCAEI113 (IN=C)
IPADM.CCAEI123 (IN=D)
;
BY ENROLID;
IF A AND (B OR C OR D) ;
RUN;
The intent is to get all all data from B,C and D if they have a common enrolid of A.
When I run the code above I get 455 observation
When I run the merge statement individually I get different number of observations.
I ran the following code:
DATA HB.IPADM1;
MERGE SORT.ALLENRL2(IN=A KEEP=ENROLID)
IPADM.CCAEI103 (IN=B);
BY ENROLID;
IF A AND B ;
RUN;
DATA HB.IPADM2;
MERGE SORT.ALLENRL2(IN=A KEEP=ENROLID)
IPADM.CCAEI113 (IN=B);
BY ENROLID;
IF A AND B ;
RUN;
DATA HB.IPADM3;
MERGE SORT.ALLENRL2(IN=A KEEP=ENROLID)
IPADM.CCAEI123 (IN=B);
BY ENROLID;
IF A AND B ;
RUN;
After running the above three codes and combining them I get 555 observations. What am I doing incorrectly in the first code of one to many merge?
Many thanks for the help.
Use a VIEW.
data all / view=all;
set b c d ;
by enrolid;
run;
data want;
merge a(in=ina) all (in=inall);
by enrolid;
if ina and inall;
run;
If A is tiny pull the list of ENROLID values into a macro variable and use it in a WHERE statement.
proc sql noprint;
select distinct quote(trim(enrolid)) into :list separated by ' ' from A;
quit;
data want;
set b c d ;
by enrolid;
where enrolid in (&list);
run;
If A is small enough load it into a hash and use the FIND() method to check if the value is found.
If A is too large to load into a hash then create an index for A and use that in the data step to test if the id is in the A dataset.
data want;
set b c d ;
by enrolid;
set a key=enrolid;
_error_=0;
if _iorc_ then delete;
run;
What happens in your first data step is that whenever more than one of the IPADM.CCAEI sets has the same Enrolid the all of those sets get merged so would have fewer records.
Example:
data example1; input enrolid x; datalines; 1 1 2 2 ; data example2; input enrolid y; datalines; 1 11 ; data example3; input enrolid z; datalines; 1 111 ; data example4; input enrolid t; datalines; 1 1111 ; data merged; merge example1 (in=a) example2 (in=b) example3 (in=c) example4 (in=d) ; by enrolid; IF A AND (B OR C OR D) ; run;
All 3 of the "other" datasets match but only one record is in the output.
You may need to provide some small examples, such as I did above, of your existing data and what you expect the output to look like.
If by "combining" the sets you create with the separate merges used a SET statement then you could set those 3 sets together and do a single merge (likely after sorting)
This could be accomplished by using a SET statement to combine example2, 3 and 4. Then merge with example1.
However, this will fail if you have duplicates of the By variable in both example1 and the combined data.
data example1; input enrolid ; datalines; 1 2 ; data example2; input enrolid y; datalines; 1 11 1 12 3 11 ; data example3; input enrolid y; datalines; 1 111 1 222 2 411 ; data example4; input enrolid y; datalines; 1 1111 4 111 ; Data stacked; set example2 example3 example4 ; run; proc sort data=stacked; by enrolid; run; data merged; merge example1 (in=a) stacked (in=b) ; by enrolid; IF A AND B ; run;
Please post code in either a text box, opened on the forum with the </> icon that appears above the message window, or the code box opened with the "running man" icon. It helps to tell when something is code or part of a question or comment and they will not reformat pasted text like the main message window. The main message windows reformat text making any code indenting go away as well as other things that can result in not-seen characters that can impact whether code will run.
Use a VIEW.
data all / view=all;
set b c d ;
by enrolid;
run;
data want;
merge a(in=ina) all (in=inall);
by enrolid;
if ina and inall;
run;
If A is tiny pull the list of ENROLID values into a macro variable and use it in a WHERE statement.
proc sql noprint;
select distinct quote(trim(enrolid)) into :list separated by ' ' from A;
quit;
data want;
set b c d ;
by enrolid;
where enrolid in (&list);
run;
If A is small enough load it into a hash and use the FIND() method to check if the value is found.
If A is too large to load into a hash then create an index for A and use that in the data step to test if the id is in the A dataset.
data want;
set b c d ;
by enrolid;
set a key=enrolid;
_error_=0;
if _iorc_ then delete;
run;
Set before or three data steps with merge and then set (or proc append which would be faster generally). There is likely to be a Proc SQL solution but if your data sets are "too big" to combine first then likely the performance in SQL is going to be slower.
The MERGE statement cannot handle multiple input datasets where there are duplicate ENROLIDs in more than one of those input datasets.
Test each of your input datasets for duplicate ENROLIDs and if more than one has this issue then that explains the drop in observation counts.
If you are wanting to keep all duplicate ENROLIDs then you will need to switch to using SQL.
Thank you, I want to keep all the duplicate records as well. Can you please help with the proc sql code. I haven't used sql much.
> I want to keep all the duplicate records as well.
How? What output do you expect if there are duplicates in A and B but not C? Or B and C and not A?
So you want to use example1 as a lookup table to filter data out of the others. This is easily done with a hash object:
data want;
set
example2
example3
example4
;
if _n_ = 1
then do;
declare hash ex1 (dataset:"example1");
ex1.definekey("enrolid");
ex1.definedone();
end;
if ex1.check() = 0;
run;
Do you get any message in the log?
With so few observations, it should be easy to look at the observations that are duplicated.
Not knowing what you want, it's is difficult tot comment further.
Please post code using the appropriate icon.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.