Help using Base SAS procedures

Question about SQL joins

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

Question about SQL joins

Hi All,

 

So, I have a kind of weired question that made me question my knowledge on SQL joins.

 

I have a dataset lets say HAVE, which I perform a left join on with another dataset, DATA like this:

 

proc sql;
create table WANT as
select h.*, d.x 

from have as h left join data as d

on h.id = d.id

quit;

 

I would have supposed that all of HAVE observations will be there in my WANT dataset. However, after the merge, I noticed that a few are missing. (the freq was 37 out of 20000 were missing.)

 

I redid the above code with a full join; still did not found them.

 

Then I just chose those disappearing ovservations, and tried to left join with the DATA, and I got zero.

 

Is this like left join not working perfectly? What can be causig this problem?

 

I really appreciate your helps since I am not that experienced in sql;

 

thanks;


Accepted Solutions
Solution
‎02-21-2016 10:43 AM
Frequent Contributor
Posts: 75

Re: Question about SQL joins

Posted in reply to Shayan2012

UPDATE: Ok, I think I got it. Apparently the problem was coming from the fact that I was working with the options obs = 20000. at the beginning of my code. Apparently, the result will not be in a sorted way, so when truncating to 20000 options I will not see the other observations which incudes the specific observation I am looking for.

 

View solution in original post


All Replies
Super User
Posts: 19,789

Re: Question about SQL joins

Posted in reply to Shayan2012

You don't have a WHERE on your query? 

 

Frequent Contributor
Posts: 75

Re: Question about SQL joins

not really. Just the basic above code.
Super User
Posts: 11,343

Re: Question about SQL joins

Posted in reply to Shayan2012

Run proc freq on each data set for the variable ID. Look for different counts.

 

If you get different counts for levles of ID then you have a place to look for other data issues.

Super User
Posts: 3,252

Re: Question about SQL joins

Posted in reply to Shayan2012

Is your ID variable numeric by any chance? One possible explanation for missing rows is precision problems with numeric keys, which can only handle a maximum of about 16 digits accurately.

Frequent Contributor
Posts: 75

Re: Question about SQL joins

thanks a lot,
It is numeric actually, but I not more than 12 digits. I should look into that anyways.
Valued Guide
Posts: 860

Re: Question about SQL joins

Posted in reply to Shayan2012

I would use strip() to start on those missing records.  See why they aren't coming up.  You can try upcase() as well, something along those lines is the issue.  I assume it's the way the data was entered.

Trusted Advisor
Posts: 1,117

Re: Question about SQL joins

Posted in reply to Shayan2012

Hi @Shayan2012,

 

This sounds like a really weird situation and I'm curious to know the solution.

 

So, you are really working with SAS datasets? Or are there any views involved?

 

As @SASKiwi has mentioned, numeric precision problems can lead to non-matching IDs which actually should be equal. However, this would lead to unexpected missing values of variable X (from dataset DATA), but not to a loss of observations from the left table in a left join. (At least I wouldn't know how.)

 

Sometimes things are different from what they seem.

Example 1: It's possible to obtain the following log:

593  data _null_;
594  if 0 then set have nobs=n;
595  put n;
596  stop;
597  run;

20000
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


598  proc sql;
599  create table want as
600  select h.*, d.x
601  from have as h left join data as d
602  on h.id = d.id;
NOTE: Table WORK.WANT created, with 19963 rows and 2 columns.

603  quit;

Explanation: 37 observations had been marked as deleted (by a PROC SQL DELETE statement).

 

Example 2: With (unexpected) duplicate IDs in the second dataset, the result of a left join can convey the impression that the first dataset has more observations than it actually contains. So, it may seem surprising when the number of observations drops, once the duplicates have been eliminated in the second table. (However, this could not explain the situation of zero observations in the result, which you have described.)

 

Example 3: You are telling us that your numeric keys have "not more than 12 digits." Well, provided that we are not talking about integers, but about numbers with decimal places, how would you know how many digits they have? Formatted output using default format or any standard numeric format is deceiving. Also, no less than 99.84% of the numbers with up to 4 decimal places have no exact internal representation (on Windows systems). Virtually any given formatted number can have several different internal representations in SAS. And this includes (numbers which look like) integers.

 

Suggestion: Please share PROC CONTENTS output of that dataset with 37 observations and the log showing how a left join with this dataset as the left table resulted in "0 rows."

Valued Guide
Posts: 860

Re: Question about SQL joins

Posted in reply to FreelanceReinhard

I see the suggestion about proc freq, that's good.  If it's numbers try round().  I still think it's a problem with the way the data was entered.  ID numbers should be character as you are never going to add them.

 

Mark

Solution
‎02-21-2016 10:43 AM
Frequent Contributor
Posts: 75

Re: Question about SQL joins

Posted in reply to Shayan2012

UPDATE: Ok, I think I got it. Apparently the problem was coming from the fact that I was working with the options obs = 20000. at the beginning of my code. Apparently, the result will not be in a sorted way, so when truncating to 20000 options I will not see the other observations which incudes the specific observation I am looking for.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 582 views
  • 5 likes
  • 6 in conversation