BookmarkSubscribeRSS Feed
Quartz | Level 8

I'm combining 2 files by an ID variable.  Each of the input files has approximately 5000 records, and none of them have missing data for the ID variable.  The output file has about 7400 records, about 800 of which have missing data for the ID variable.  How can that be?


Proc SQL;
    Create Table &state..&state.&year._IMG8_matched_DX_IMAGED
    as Select a.MSIS_ID, a.DX, b.MSIS_ID, b.IMGPROC, a.SRVC_BGN_DT, b.SRVC_BGN_DT,
        a.SRVC_BGN_DT as DxDatenum,
        b.SRVC_BGN_DT as IMGDatenum
    from  &state..ipot&year._&state._IMGmeas8_DX as a
    RIGHT JOIN &state..ipot&year._&state._IMGmeas8_IMAGED as b
    ON a.MSIS_ID = b.MSIS_ID;

Obsidian | Level 7
You are using a right join so the missing 800 have a record in &state..ipot&year._&state._IMGmeas8_IMAGED but do not in &state..ipot&year._&state._IMGmeas8_DX

Therefore because its SQL and a right join its keeping all of the records from the right table and only joining the variables from the left which have a MSIS_ID which is in both tables. Also you should rename MSIS_ID from one of the two tables, right now you are asking it to pull in MSIS_ID from both tables and place them in the new dataset, however you cannot have 2 variables with the same name, so to make sure that you are pulling in everything that you want have the two variables names something different, ie
a.MSIS_ID, (b.MSIS_ID) as MSIS_ID2 ...
This also explains why you have missing ID variables because since its a right join MSIS_ID is being brought in after a missing value from the left table, but SQL isn't overwriting it, so it shows up as missing.

So the moral of the story is change the variable name for one of the two MSIS_ID variables and the ID variables will be available from both tables again.

Hope this helps.
Diamond | Level 26 RW9
Diamond | Level 26

Which MISS_ID variables are you talking about:

 as Select a.MSIS_ID, a.DX, b.MSIS_ID,


I will assume it is A.MSIS_ID.  The reason is your right joining the A dataset onto the B dataset, all records from B will be present, even if there is no match in A.  Consider:

A        B

2        1

3        2



If I right join A to B, 2-2, 3-3, -1, so B's id is present, but A has no match and so becomes missing.  What you should do is either use B.MSIS_ID as that will always be present.  


Super User
Check your log for a warning. That should tell you something's wrong.

Your code doesn't make a lot of sense as you select variables multiple times, but don't rename them, how would you know which dataset they come from?

You can look into the coalesce function, which may help.



Secure your spot at the must-attend AI and analytics event of 2024: SAS Innovate 2024! Get ready for a jam-packed agenda featuring workshops, super demos, breakout sessions, roundtables, inspiring keynotes and incredible networking events.


Register by March 1 to snag the Early Bird rate of just $695! Don't miss out on this exclusive offer. 


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4 in conversation