Help using Base SAS procedures

Deleting first few duplicate observations within each subject

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Deleting first few duplicate observations within each subject

Dear all,

I have a longitudinal dataset in which each patient is recorded with multiple dates. Each date is counted as a separate row. Note that a patient could have multiple entries for the same date. The data layout is as follow:

ID   Date   Var1   Var2

1    2002    4        5

1    2002    5        7

1    2003    7        9

2    2001    4        8

2    2002    3        8

Ideally, I would like to delete observations from the first recored date for each patient, so the output data would look like:

ID   Date   Var1   Var2

1     2003   7       9

2     2002   3       8

I have tried proc sort with nodupkey as well as proc sql distinct but none of them gave me the output i want. I was also trying to come up with an indicator to code for the first date within each patient but did not succeed. I'd really appreciate if anyone could help!


Accepted Solutions
Solution
‎08-05-2015 08:17 AM
Super User
Posts: 9,671

Re: Deleting first few duplicate observations within each subject

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;

View solution in original post


All Replies
Regular Contributor
Posts: 156

Re: Deleting first few duplicate observations within each subject

See PROC SORT examples especially with the option NODUPKEY.

Contributor
Posts: 65

Re: Deleting first few duplicate observations within each subject

proc sort data = have;

by id date;

run;

data want;

set have;

if last.id;

by id;

run;

Valued Guide
Posts: 765

Re: Deleting first few duplicate observations within each subject

Hi.  I think that the only problem with this is if there are multiple occurrences of dates after the first date, for example ...

data have;

input ID   Date   Var1   Var2 @@;

cards;

1 2002 4 5 1 2002 5 7 1 2003 7 9 1 2003 9 9

2 2001 4 8 2 2002 3 8 2 2002 9 9

;

data want;

set have;

by id;

if last.id;

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      9       9

2      2    2002      9       9

and miss one "not first date" for each ID.  How about ...

data want (drop=first_date);

set have;

by id;

first_date + first.id*date;

if date ne first_date then output;

if last.id then call missing(first_date);

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      7       9

2      1    2003      9       9

3      2    2002      3       8

4      2    2002      9       9

(same answer you'd get with Xia's data step answer or my PROC SQL answer)

Solution
‎08-05-2015 08:17 AM
Super User
Posts: 9,671

Re: Deleting first few duplicate observations within each subject

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;
New Contributor
Posts: 2

Re: Deleting first few duplicate observations within each subject

Thank you so much. This is exactly what I was looking for.

Valued Guide
Posts: 765

Re: Deleting first few duplicate observations within each subject

Hi.  You tried SQL.  Did you try this ...

proc sql;

create table want as select * from have group by id having date ^= min(date);

quit;

Respected Advisor
Posts: 3,124

Re: Deleting first few duplicate observations within each subject

Mike, I suppose you want GT (>) instead of ^=, just in case some dates are missing.

Valued Guide
Posts: 765

Re: Deleting first few duplicate observations within each subject

Sure, thanks.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 471 views
  • 7 likes
  • 6 in conversation