Solved: Re: Deleting first few duplicate observations within each subject

heavensong · Posted 08-05-2015 12:27 AM

Dear all,

I have a longitudinal dataset in which each patient is recorded with multiple dates. Each date is counted as a separate row. Note that a patient could have multiple entries for the same date. The data layout is as follow:

ID Date Var1 Var2

1 2002 4 5

1 2002 5 7

1 2003 7 9

2 2001 4 8

2 2002 3 8

Ideally, I would like to delete observations from the first recored date for each patient, so the output data would look like:

ID Date Var1 Var2

1 2003 7 9

2 2002 3 8

I have tried proc sort with nodupkey as well as proc sql distinct but none of them gave me the output i want. I was also trying to come up with an indicator to code for the first date within each patient but did not succeed. I'd really appreciate if anyone could help!

Ksharp · Posted 08-05-2015 08:17 AM

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
 set have;
 by id date notsorted;
 if first.id then n=0;
 n+first.date;
 if n ne 1;
 drop n;
run;

View solution in original post

PhilC · Posted 08-05-2015 08:07 AM

See PROC SORT examples especially with the option NODUPKEY.

MadhuKorni · Posted 08-05-2015 08:14 AM

proc sort data = have;

by id date;

run;

data want;

set have;

if last.id;

by id;

run;

MikeZdeb · Posted 08-10-2015 06:59 PM

Hi. I think that the only problem with this is if there are multiple occurrences of dates after the first date, for example ...

data have;

input ID Date Var1 Var2 @@;

cards;

1 2002 4 5 1 2002 5 7 1 2003 7 9 1 2003 9 9

2 2001 4 8 2 2002 3 8 2 2002 9 9

;

data want;

set have;

by id;

if last.id;

run;

WANT would be ...

Obs ID Date Var1 Var2

1 1 2003 9 9

2 2 2002 9 9

and miss one "not first date" for each ID. How about ...

data want (drop=first_date);

set have;

by id;

first_date + first.id*date;

if date ne first_date then output;

if last.id then call missing(first_date);

run;

WANT would be ...

Obs ID Date Var1 Var2

1 1 2003 7 9

2 1 2003 9 9

3 2 2002 3 8

4 2 2002 9 9

(same answer you'd get with Xia's data step answer or my PROC SQL answer)

Ksharp · Posted 08-05-2015 08:17 AM

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
 set have;
 by id date notsorted;
 if first.id then n=0;
 n+first.date;
 if n ne 1;
 drop n;
run;

heavensong · Posted 08-09-2015 10:43 PM

Thank you so much. This is exactly what I was looking for.

MikeZdeb · Posted 08-10-2015 06:09 PM

Hi. You tried SQL. Did you try this ...

proc sql;

create table want as select * from have group by id having date ^= min(date);

quit;

Haikuo · Posted 08-10-2015 10:24 PM

Mike, I suppose you want GT (>) instead of ^=, just in case some dates are missing.

MikeZdeb · Posted 08-10-2015 11:05 PM

Sure, thanks.

Code: Program

Code: Program

Registration is open

SAS Training: Just a Click Away