Desktop productivity for business analysts and programmers

How can I find multiple SSNs with different IDs using Proc SQL?

Reply
Occasional Contributor
Posts: 6

How can I find multiple SSNs with different IDs using Proc SQL?

I am trying to pull data from a very large data warehouse using SAS EG.  I was told I need to pull the data a specific way in order to get the data correctly.  I have to use PROC SQL.

 

The Problem:  In this data warehouse there are patient pharmacy records.  We need to find the "needle in the haystack".  We need to find any and all patients that have two or more IDs.  In order to find them we will have records for a patient with the same SSN, but different IDs.  In addition, I'm a Base SAS Programmer and I'm using SAS EG version 7.00 HF5 (7.100.1.2856) (64 bit), so I'm not doing a project, just trying to run the program.

 

To test, I was only pulling one week of data, just to see if my code runs, however, since it is large, the system only wants to pull back 1M rows.  So, I figured if I can only pull back those records that have two or more IDs for the same SSN, I should have a very small dataset.  However, I'm failing.

 

Below is my code:

 

proc sql;
   create table multiple_ssns as
   select distinct t2.MemberProfile, t2.MemberSSN
   FROM 
      SQLODBC.Claim_Fact t1 INNER JOIN SQLODBC.Patient_Dimension t2
      ON (t1.PatientKey=t2.PatientKey)
   WHERE
      (t1.DateDispensedKey BETWEEN 20170501 AND 20170530)
      AND t1.claimstatus = 'Paid'
;
quit;

 

Now, before you tell me this code won't work correctly, I know.  This is the code that gives me more than 1M records.  I tried adding a Group by and Having statement, but that didn't work as expected, either (see below).

 

GROUP BY t2.MemberSSN
HAVING count(Distinct t1.MemberProfile) > 1

 

I only need a list of SSNs and IDs.  I expect to see output (in a dataset) something like this with less than 100 observations (we know there aren't very many):

MemberSSN                 MemberProfile

123-45-6789                 5555555

123-45-6789                 6666666

 

If you have any suggestions, I'd love to try them out.  P.S.  I did try to do a Proc Freq after the Proc SQL data pull, but I got an out of resources error.  So, I was advised by the other programmer here that we should do all data pulls with criteria using Proc SQL.

 

Thank you,

L.

Super Contributor
Posts: 403

Re: How can I find multiple SSNs with different IDs using Proc SQL?

I wouldn't expect either Group By or Having to restrict your input to 1M rows since Group By operates on the result set and Having operates on the result of aggregate functions so therefore also would not affect the input. You appear to be using ODBC but I think altering the first line of your code to:

 

Proc SQL inobs=1000000;

 

should still work. Only 1M rows should be read from the DB table

 

Chris

Super User
Posts: 3,240

Re: How can I find multiple SSNs with different IDs using Proc SQL?

I would try something like this:

 

proc sql;
   create table multiple_ssns as
   select t2.MemberProfile, t2.MemberSSN,
             count(*) as SSN_Count
   FROM 
      SQLODBC.Claim_Fact t1 INNER JOIN SQLODBC.Patient_Dimension t2
      ON (t1.PatientKey=t2.PatientKey)
   WHERE
      (t1.DateDispensedKey BETWEEN 20170501 AND 20170530)
      AND t1.claimstatus = 'Paid'
   GROUP BY t2.MemberProfile, t2.MemberSSN
   HAVING count(*) > 1
;
quit;
Occasional Contributor
Posts: 6

Re: How can I find multiple SSNs with different IDs using Proc SQL?

Thank you both for your suggestions to my issue.  Unfortunately, I was pulled off this project to work on something else, and before I could go back and work on it, I left the client site (contract issue), so I am unable to let you know if your suggestions would have worked or not.  I apologize for not following up sooner.

Ask a Question
Discussion stats
  • 3 replies
  • 184 views
  • 1 like
  • 3 in conversation