BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pinkyc
Calcite | Level 5


Input data named pat_detail

Pat_ID     Dis_Date     Linker               Pat_Age     Zip_code

A11          03/28/12     A11 03/28/12     39               000

A11          03/28/12     A11 03/28/12     39               000    

A12          04/19/13     A12 04/19/13     43               111

A13          05/20/14     A13 05/20/14     44               222

I have a table with the following variables and several more observations but some of them are duplicates as shown in example for Pat_ID A11.  I did this to check for the Pat_ID's that had duplicates and if so, how many.

proc sql;

create table duplicate as

select distinct PAT_ID, dis_date, linker, count(*) as count

from pat_detail

group by recip_id, discharge_date

Having count>=2;

quit;

duplicate table data:

Pat_ID    Dis_Date     Linker          Count

A11       03/28/12     A11 03/28/12     2

How do I delete the observations in Pat_detail that are duplicates based on checking Pat_ID, dis_date and linker?

1 ACCEPTED SOLUTION

Accepted Solutions
MaikH_Schutze
Quartz | Level 8

Hi,

The NODUPKEY option in PROC SORT should do the trick. For example:

proc sort data=pat_detail out=pat_detail_new nodupkey;

by pat_ID dis_date linker;

run;

The OUT= option creates a new data set with only the unique combinations of the BY variables.

Hope this helps,

M.

View solution in original post

5 REPLIES 5
stat_sas
Ammonite | Level 13

proc sql;

select distinct * from pat_detail;

quit;

MaikH_Schutze
Quartz | Level 8

Hi,

The NODUPKEY option in PROC SORT should do the trick. For example:

proc sort data=pat_detail out=pat_detail_new nodupkey;

by pat_ID dis_date linker;

run;

The OUT= option creates a new data set with only the unique combinations of the BY variables.

Hope this helps,

M.

Patrick
Opal | Level 21

What if you have for some data quality issues duplicate keys for "Pat_ID, dis_date and linker" but then also different values in these records for "Pat_Age"? To be on the safe side I would only remove duplicate records (all variables having the same value).

data have;

  infile datalines truncover;

  input Pat_ID $ Dis_Date:mmddyy. Linker $ Pat_Age:mmddyy. Zip_code $;

  format Dis_Date date9.;

  datalines;

A11 03/28/12 A11 03/28/12 39 000

A11 03/28/12 A11 03/28/12 39 000

A11 03/28/12 A11 03/28/12 49 000

A12 04/19/13 A12 04/19/13 43 111

A13 05/20/14 A13 05/20/14 44 222

;

run;

/* remove duplicates */

proc sort data=have out=want dupout=duplicates nodupkey;

  by _all_;

run;

/* DQ issues */

proc sql;

  create table DQ_issue as

    select * from want

    group by Pat_ID, dis_date, linker

    having count(*)>=2

  ;

quit;

Loko
Barite | Level 11

Hello,

Depending on your interest you may choose the appropriate calculation for the variables not included in the key.

data have;

input Pat_ID   $  Dis_Date $     Linker    & $12.     Pat_Age     Zip_code ;

datalines;

A11          03/28/12     A11 03/28/12     39               000

A11          03/28/12     A11 03/28/12     43               000   

A11          03/28/12     A11 03/28/12     44               001

A12          04/19/13     A12 04/19/13     43               111

A13          05/20/14     A13 05/20/14     44               222

;

proc sql;

select pat_ID, Dis_Date,  Linker, max(Pat_Age) , count(Zip_code)

from have

group by pat_ID, Dis_Date,  Linker

;

quit;

Tanmay
Calcite | Level 5

proc sort data=pat_detail out=pat_detail_new nodupkey;

by pat_ID dis_date linker;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 11652 views
  • 3 likes
  • 6 in conversation