How to remove duplicate and Overlapping Observations in Proc Sql

Reply
Contributor
Posts: 32

How to remove duplicate and Overlapping Observations in Proc Sql

    Hi,

Could anyone help me how to remove duplicate and Overlapping Observations in Proc Sql

eg. i have dataset which is given below in that i want remove duplicate of aedcod and also which contains Overlapping.

subjidaetrmaedecodaestdtaeendt
1001feverFEVER12-Aug-1315-Aug-13
1001feverFEVER13-Aug-1314-Aug-13
1001headacheHEADACHE11-Sep-1319-Sep-13
1001feverFEVER11-Aug-1312-Aug-13
1001vomitVOMIT11-Aug-1312-Aug-13
1001headacheHEADACHE9-Sep-1310-Sep-13
1001sever headacheHEADACHE16-Sep-1318-Sep-13
1002body painBODY PAIN12-Aug-1315-Aug-13
1002body painBODY PAIN13-Aug-1314-Aug-13
1002headacheHEADACHE11-Sep-1319-Sep-13
1002body painBODY PAIN11-Aug-1312-Aug-13
1002vomitVOMIT11-Aug-1312-Aug-13
1002headacheHEADACHE9-Sep-1310-Sep-13
1002sever headacheHEADACHE16-Sep-1318-Sep-13
Esteemed Advisor
Esteemed Advisor
Posts: 7,223

Re: How to remove duplicate and Overlapping Observations in Proc Sql

Hi,

Well, dups is easy so:

proc sort data=ae nodupkey;

     by subjid aetrm aedecod aestdt aeendt;

run;

For overlaps:

data ae;

     set ae;

     if lag(subjid)=subjid and lag(aetrm)=aetrm and lag(aedecod)=aedecod then do;

          if (lag(aestdt) <= aestdt and lag(aeendt) > aestdt) or

               (lag(aestdt) > aestdt and lag(aestdt) < aeendt and lag(aeendt) > aeednt) then delete;

     end;

run;

Note with the above, you may also need to check partial dates as well.  I haven't tested the code above, so might missed a logic in the if, so do test it, but it gives the idea.

Grand Advisor
Posts: 9,582

Re: How to remove duplicate and Overlapping Observations in Proc Sql

Adverse Event.


data have;
infile cards expandtabs truncover;
input subjid     aetrm & $20. aedecod & $20. aestdt : date11.     aeendt : date11.;
format      aestdt      aeendt date11.;
cards;
1001     fever     FEVER     12-Aug-13     15-Aug-13
1001     fever     FEVER     13-Aug-13     14-Aug-13
1001     headache     HEADACHE     11-Sep-13     19-Sep-13
1001     fever     FEVER     11-Aug-13     12-Aug-13
1001     vomit     VOMIT     11-Aug-13     12-Aug-13
1001     headache     HEADACHE     9-Sep-13     10-Sep-13
1001     sever headache     HEADACHE     16-Sep-13     18-Sep-13
1002     body pain     BODY PAIN     12-Aug-13     15-Aug-13
1002     body pain     BODY PAIN     13-Aug-13     14-Aug-13
1002     headache     HEADACHE     11-Sep-13     19-Sep-13
1002     body pain     BODY PAIN     11-Aug-13     12-Aug-13
1002     vomit     VOMIT     11-Aug-13     12-Aug-13
1002     headache     HEADACHE     9-Sep-13     10-Sep-13
1002     sever headache     HEADACHE     16-Sep-13     18-Sep-13
;
run;
data temp;
 set have;
 do date=aestdt to aeendt;
  output;
 end;
format date date9.;
drop aestdt aeendt;
run;
proc sort data=temp nodupkey;by subjid     aetrm aedecod date;run;
data temp;
 set temp;
 if aedecod ne lag(aedecod) or dif(date) gt 1 then group+1;
run;
data want;
 set temp;
 by group;
 retain aestdt .;
 if first.group then aestdt=date;
 if last.group then do;aeendt=date;output;end;
 format aestdt aeendt date9.;
 drop date group;
run;

Xia Keshan

Occasional Contributor Pra
Occasional Contributor
Posts: 6

Re: How to remove duplicate and Overlapping Observations in Proc Sql

PROC SQL;

CREATE TABLE AE AS

SELECT DISTINCT *,COUNT(AEDECOD) AS CNT FROM AE_I

GROUP BY SUBJID,AEDECOD

ORDER BY SUBJID,AEDECOD, AESTDT, AEENDT;

;QUIT;

DATA FINAL;

SET AE;

BY SUBJID AEDECOD;

DT=LAG(AEENDT);

IF FIRST.AEDECOD THEN DT =.;

ELSE DO;

DIF =(AESTDT - DT);

IF (AESTDT - DT) <0 THEN DELETE;

END;

FORMAT DT DATE9.;

DROP DT DIF CNT;

RUN;

Contributor
Posts: 32

Re: How to remove duplicate and Overlapping Observations in Proc Sql

No i want to remove overlapping observation by Proc Sql also

Occasional Contributor Pra
Occasional Contributor
Posts: 6

Re: How to remove duplicate and Overlapping Observations in Proc Sql

Sushil,

Hope this helps

PROC SQL;

CREATE TABLE AE AS

SELECT DISTINCT *,COUNT(AEDECOD) AS CNT FROM COM

GROUP BY SUBJID,AEDECOD

ORDER BY SUBJID,AEDECOD, AESTDT, AEENDT;

;QUIT;

PROC SQL;

CREATE TABLE FINAL1 AS

SELECT  SUBJID,AEDECOD,AESTDT,AEENDT

FROM (

SELECT A.*,B.AESTDT AS STDT,B.AEENDT AS ENDT,

(B.AESTDT - A.AEENDT) AS DIF

FROM

AE AS A,

AE AS B

WHERE A.SUBJID = B.SUBJID

AND A.AEDECOD = B.AEDECOD

AND A.AESTDT < B.AESTDT)

WHERE  (B.AESTDT- A.AEENDT) GE 0

UNION ALL

SELECT SUBJID,AEDECOD,AESTDT,AEENDT FROM

AE

GROUP BY SUBJID,AEDECOD

HAVING COUNT(SUBJID) =1

;QUIT;

Ask a Question
Discussion stats
  • 5 replies
  • 976 views
  • 0 likes
  • 4 in conversation