BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
heavensong
Calcite | Level 5

Dear all,

I have a longitudinal dataset in which each patient is recorded with multiple dates. Each date is counted as a separate row. Note that a patient could have multiple entries for the same date. The data layout is as follow:

ID   Date   Var1   Var2

1    2002    4        5

1    2002    5        7

1    2003    7        9

2    2001    4        8

2    2002    3        8

Ideally, I would like to delete observations from the first recored date for each patient, so the output data would look like:

ID   Date   Var1   Var2

1     2003   7       9

2     2002   3       8

I have tried proc sort with nodupkey as well as proc sql distinct but none of them gave me the output i want. I was also trying to come up with an indicator to code for the first date within each patient but did not succeed. I'd really appreciate if anyone could help!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;

View solution in original post

8 REPLIES 8
PhilC
Rhodochrosite | Level 12

See PROC SORT examples especially with the option NODUPKEY.

MadhuKorni
Quartz | Level 8

proc sort data = have;

by id date;

run;

data want;

set have;

if last.id;

by id;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi.  I think that the only problem with this is if there are multiple occurrences of dates after the first date, for example ...

data have;

input ID   Date   Var1   Var2 @@;

cards;

1 2002 4 5 1 2002 5 7 1 2003 7 9 1 2003 9 9

2 2001 4 8 2 2002 3 8 2 2002 9 9

;

data want;

set have;

by id;

if last.id;

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      9       9

2      2    2002      9       9

and miss one "not first date" for each ID.  How about ...

data want (drop=first_date);

set have;

by id;

first_date + first.id*date;

if date ne first_date then output;

if last.id then call missing(first_date);

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      7       9

2      1    2003      9       9

3      2    2002      3       8

4      2    2002      9       9

(same answer you'd get with Xia's data step answer or my PROC SQL answer)

Ksharp
Super User

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;
heavensong
Calcite | Level 5

Thank you so much. This is exactly what I was looking for.

MikeZdeb
Rhodochrosite | Level 12

Hi.  You tried SQL.  Did you try this ...

proc sql;

create table want as select * from have group by id having date ^= min(date);

quit;

Haikuo
Onyx | Level 15

Mike, I suppose you want GT (>) instead of ^=, just in case some dates are missing.

MikeZdeb
Rhodochrosite | Level 12

Sure, thanks.

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2604 views
  • 7 likes
  • 6 in conversation