BookmarkSubscribeRSS Feed
devsas
Pyrite | Level 9

Good Afternoon everyone!

So, I have this claims table at line level where I need to find all those records with different claim id's where date of service, member id, provider id and line level procedure code are same so that we can identify duplicate claim id's. I'm doing a simple self join but getting too many extra records where each claim line is joining itself with all claim lines of other claim id where all those 4 parameters match- something like a cross join. I have banged my head so much but still cant find a solution to this seemingly regular problem.

Some of the line level procedure code field values are blank, not sure if that is causing the problem. Here is my code in proc sql-

Proc sql;
SELECT DISTINCT c1.clm_nbr, c1.mbr_nbr, c1.CLM_LN_PROC_CD,
c1.CLM_DT_OF_SRVC, c1.clm_rcvd_dt, c1.clm_ln_nbr, c1.clm_ln_stat_cd , c1.RVNU_CD, c1.ln_item_sbmtd_chrg_amt
 , c1.pd_amt,
 c2.clm_nbr as clm_nbr_, c2.clm_rcvd_dt as clm_rcvd_dt_,
 c2.clm_ln_nbr as clm_ln_nbr_, c2.clm_ln_stat_cd as clm_ln_stat_cd_, c2.rvnu_cd as rvnu_cd_, c2.ln_item_sbmtd_chrg_amt
 as ln_item_sbmtd_chrg_amt_, c2.pd_amt as pd_amt_
into final_
            FROM NW_ALL_3 AS c1, NW_ALL_3 AS c2

            WHERE c1.MBR_NBR = c2.MBR_NBR and left(c1.CLM_LN_PROC_CD, 5) = left(c2.CLM_LN_PROC_CD,5)
            and c1.CLM_DT_OF_SRVC = c2.CLM_DT_OF_SRVC and c1.SVCG_PRACTNR_ID = c2.SVCG_PRACTNR_ID
           
            AND left(c1.Clm_nbr, 10) < left(c2.CLM_NBR, 10) ;
quit;
5 REPLIES 5
Patrick
Opal | Level 21

@devsas

If you've got a record identifier then you could add a condition like c1.rec_id ne c2.rec_id to avoid joining a claim with itself.

 

You could also just sort your claims by the relevant variables and then use a SAS data step to identify duplicates.

data duplicates;
  set claims_sorted;
  by var1 var2 var3;
  if not (first.var3 and last.var3) then output;
run;

 

devsas
Pyrite | Level 9
Thanks Patrick. But i dont have a unique identifier, plus i need to do this in proc sql. Btw, i dont think in my query claim is joining with itself.
Patrick
Opal | Level 21

@devsas wrote:
Thanks Patrick. But i dont have a unique identifier, plus i need to do this in proc sql. Btw, i dont think in my query claim is joining with itself.

You join a table with itself so for any record being joined with itself the join conditions will always all be true. So yes, you are definitely joining the same record together which is not what you're after.

You could add a GROUP BY.... HAVING count(*)>2 to your SQL to filter such cases. But you would still get the self joined records in your result set in case there is also another record matching the condition.

 

"i need to do this in proc sql"

Why? Because your data is stored in a database? If so then using explicit SQL pass-through and taking advantage of DB specific functions could eventually speed up things significantly (i.e. use of analytics functions in Oracle).

 

BTW: What's the INTO clause doing in your SQL? That doesn't make sense to me.

devsas
Pyrite | Level 9

Thanks again, man. Actually I realized that the main problem is that one of the fields on which im joining has many empty values. So all those records which I should not be getting are there whenever that field (procedure_code) is blank. So, for instance, if two claims match on member id, dos, prov id and procedure code, it will give corresponding claim number, claim line etc etc for those records. So, it will match line by line for such cases. So far so good.

But when the procedure code is blank and lets say claim no abcd with claim lines 1,2,34 match with claim no efgh with claim lines 5,6,7,8 on other three parameters, the result will give 16 records rather than just 4 matching records-like its doing a cross join with every claim line matching with other 4 claim lines from duplicate claim. Those 12 records i dont need and i cannot do distinct because technically they are not distinct. I hope i was able to explain.

My SAS is not working and so im using my proc sql skills to solve this in sql server, hence i was requesting a solution in sql.

I tried doing various other tricks but still not getting the result.

Thanks again for your help.

Patrick
Opal | Level 21

@devsas

SQL code similar to below should also work with SQL server. The basic idea is: First create a list with all duplicate key columns and then inner join this list back to your source data.

data class;
  set sashelp.class;
  i=1;
  output;
  do i=2 by 1;
    if ceil(ranuni(1)*10)>5 then leave;
    output;
  end;
run;

proc sql;
  create table want as
  select l.*
  from 
    class as l
    inner join
    (
      select
        substr(name,1,5) as name5, sex, age, count(*) as cnt
      from class
      group by name5, sex, age
      having cnt>1
    ) as r
    on substr(l.name,1,5)=r.name5 and l.sex=r.sex and l.age=r.age
  ;
quit;

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1557 views
  • 0 likes
  • 2 in conversation