Good Afternoon everyone!
So, I have this claims table at line level where I need to find all those records with different claim id's where date of service, member id, provider id and line level procedure code are same so that we can identify duplicate claim id's. I'm doing a simple self join but getting too many extra records where each claim line is joining itself with all claim lines of other claim id where all those 4 parameters match- something like a cross join. I have banged my head so much but still cant find a solution to this seemingly regular problem.
Some of the line level procedure code field values are blank, not sure if that is causing the problem. Here is my code in proc sql-
Proc sql;
SELECT DISTINCT c1.clm_nbr, c1.mbr_nbr, c1.CLM_LN_PROC_CD,
c1.CLM_DT_OF_SRVC, c1.clm_rcvd_dt, c1.clm_ln_nbr, c1.clm_ln_stat_cd , c1.RVNU_CD, c1.ln_item_sbmtd_chrg_amt
, c1.pd_amt,
c2.clm_nbr as clm_nbr_, c2.clm_rcvd_dt as clm_rcvd_dt_,
c2.clm_ln_nbr as clm_ln_nbr_, c2.clm_ln_stat_cd as clm_ln_stat_cd_, c2.rvnu_cd as rvnu_cd_, c2.ln_item_sbmtd_chrg_amt
as ln_item_sbmtd_chrg_amt_, c2.pd_amt as pd_amt_
into final_
FROM NW_ALL_3 AS c1, NW_ALL_3 AS c2
WHERE c1.MBR_NBR = c2.MBR_NBR and left(c1.CLM_LN_PROC_CD, 5) = left(c2.CLM_LN_PROC_CD,5)
and c1.CLM_DT_OF_SRVC = c2.CLM_DT_OF_SRVC and c1.SVCG_PRACTNR_ID = c2.SVCG_PRACTNR_ID
AND left(c1.Clm_nbr, 10) < left(c2.CLM_NBR, 10) ;
quit;
If you've got a record identifier then you could add a condition like c1.rec_id ne c2.rec_id to avoid joining a claim with itself.
You could also just sort your claims by the relevant variables and then use a SAS data step to identify duplicates.
data duplicates;
set claims_sorted;
by var1 var2 var3;
if not (first.var3 and last.var3) then output;
run;
@devsas wrote:
Thanks Patrick. But i dont have a unique identifier, plus i need to do this in proc sql. Btw, i dont think in my query claim is joining with itself.
You join a table with itself so for any record being joined with itself the join conditions will always all be true. So yes, you are definitely joining the same record together which is not what you're after.
You could add a GROUP BY.... HAVING count(*)>2 to your SQL to filter such cases. But you would still get the self joined records in your result set in case there is also another record matching the condition.
"i need to do this in proc sql"
Why? Because your data is stored in a database? If so then using explicit SQL pass-through and taking advantage of DB specific functions could eventually speed up things significantly (i.e. use of analytics functions in Oracle).
BTW: What's the INTO clause doing in your SQL? That doesn't make sense to me.
Thanks again, man. Actually I realized that the main problem is that one of the fields on which im joining has many empty values. So all those records which I should not be getting are there whenever that field (procedure_code) is blank. So, for instance, if two claims match on member id, dos, prov id and procedure code, it will give corresponding claim number, claim line etc etc for those records. So, it will match line by line for such cases. So far so good.
But when the procedure code is blank and lets say claim no abcd with claim lines 1,2,34 match with claim no efgh with claim lines 5,6,7,8 on other three parameters, the result will give 16 records rather than just 4 matching records-like its doing a cross join with every claim line matching with other 4 claim lines from duplicate claim. Those 12 records i dont need and i cannot do distinct because technically they are not distinct. I hope i was able to explain.
My SAS is not working and so im using my proc sql skills to solve this in sql server, hence i was requesting a solution in sql.
I tried doing various other tricks but still not getting the result.
Thanks again for your help.
SQL code similar to below should also work with SQL server. The basic idea is: First create a list with all duplicate key columns and then inner join this list back to your source data.
data class;
set sashelp.class;
i=1;
output;
do i=2 by 1;
if ceil(ranuni(1)*10)>5 then leave;
output;
end;
run;
proc sql;
create table want as
select l.*
from
class as l
inner join
(
select
substr(name,1,5) as name5, sex, age, count(*) as cnt
from class
group by name5, sex, age
having cnt>1
) as r
on substr(l.name,1,5)=r.name5 and l.sex=r.sex and l.age=r.age
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.