Help using Base SAS procedures

Proc SQL output file has missing ID variables but input files do not

Reply
Contributor
Posts: 38

Proc SQL output file has missing ID variables but input files do not

I'm combining 2 files by an ID variable.  Each of the input files has approximately 5000 records, and none of them have missing data for the ID variable.  The output file has about 7400 records, about 800 of which have missing data for the ID variable.  How can that be?

 

Proc SQL;
    Create Table &state..&state.&year._IMG8_matched_DX_IMAGED
    as Select a.MSIS_ID, a.DX, b.MSIS_ID, b.IMGPROC, a.SRVC_BGN_DT, b.SRVC_BGN_DT,
        a.SRVC_BGN_DT as DxDatenum,
        b.SRVC_BGN_DT as IMGDatenum
    from  &state..ipot&year._&state._IMGmeas8_DX as a
    RIGHT JOIN &state..ipot&year._&state._IMGmeas8_IMAGED as b
    ON a.MSIS_ID = b.MSIS_ID;
quit;

Frequent Contributor
Posts: 83

Re: Proc SQL output file has missing ID variables but input files do not

You are using a right join so the missing 800 have a record in &state..ipot&year._&state._IMGmeas8_IMAGED but do not in &state..ipot&year._&state._IMGmeas8_DX

Therefore because its SQL and a right join its keeping all of the records from the right table and only joining the variables from the left which have a MSIS_ID which is in both tables. Also you should rename MSIS_ID from one of the two tables, right now you are asking it to pull in MSIS_ID from both tables and place them in the new dataset, however you cannot have 2 variables with the same name, so to make sure that you are pulling in everything that you want have the two variables names something different, ie
a.MSIS_ID, (b.MSIS_ID) as MSIS_ID2 ...
This also explains why you have missing ID variables because since its a right join MSIS_ID is being brought in after a missing value from the left table, but SQL isn't overwriting it, so it shows up as missing.

So the moral of the story is change the variable name for one of the two MSIS_ID variables and the ID variables will be available from both tables again.

Hope this helps.
Super User
Super User
Posts: 7,420

Re: Proc SQL output file has missing ID variables but input files do not

[ Edited ]

Which MISS_ID variables are you talking about:

 as Select a.MSIS_ID, a.DX, b.MSIS_ID,

 

I will assume it is A.MSIS_ID.  The reason is your right joining the A dataset onto the B dataset, all records from B will be present, even if there is no match in A.  Consider:

A        B

2        1

3        2

          3

 

If I right join A to B, 2-2, 3-3, -1, so B's id is present, but A has no match and so becomes missing.  What you should do is either use B.MSIS_ID as that will always be present.  

 Capture.PNG

Super User
Posts: 17,912

Re: Proc SQL output file has missing ID variables but input files do not

Check your log for a warning. That should tell you something's wrong.

Your code doesn't make a lot of sense as you select variables multiple times, but don't rename them, how would you know which dataset they come from?

You can look into the coalesce function, which may help.
Ask a Question
Discussion stats
  • 3 replies
  • 289 views
  • 0 likes
  • 4 in conversation