BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
heavensong
Calcite | Level 5

Dear all,

I have a longitudinal dataset in which each patient is recorded with multiple dates. Each date is counted as a separate row. Note that a patient could have multiple entries for the same date. The data layout is as follow:

ID   Date   Var1   Var2

1    2002    4        5

1    2002    5        7

1    2003    7        9

2    2001    4        8

2    2002    3        8

Ideally, I would like to delete observations from the first recored date for each patient, so the output data would look like:

ID   Date   Var1   Var2

1     2003   7       9

2     2002   3       8

I have tried proc sort with nodupkey as well as proc sql distinct but none of them gave me the output i want. I was also trying to come up with an indicator to code for the first date within each patient but did not succeed. I'd really appreciate if anyone could help!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;

View solution in original post

8 REPLIES 8
PhilC
Rhodochrosite | Level 12

See PROC SORT examples especially with the option NODUPKEY.

MadhuKorni
Quartz | Level 8

proc sort data = have;

by id date;

run;

data want;

set have;

if last.id;

by id;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi.  I think that the only problem with this is if there are multiple occurrences of dates after the first date, for example ...

data have;

input ID   Date   Var1   Var2 @@;

cards;

1 2002 4 5 1 2002 5 7 1 2003 7 9 1 2003 9 9

2 2001 4 8 2 2002 3 8 2 2002 9 9

;

data want;

set have;

by id;

if last.id;

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      9       9

2      2    2002      9       9

and miss one "not first date" for each ID.  How about ...

data want (drop=first_date);

set have;

by id;

first_date + first.id*date;

if date ne first_date then output;

if last.id then call missing(first_date);

run;

WANT would be ...

Obs    ID    Date    Var1    Var2

1      1    2003      7       9

2      1    2003      9       9

3      2    2002      3       8

4      2    2002      9       9

(same answer you'd get with Xia's data step answer or my PROC SQL answer)

Ksharp
Super User

Code: Program

data have;
input ID   Date   Var1   Var2;
cards;
1 2002 4 5
1 2002 5 7
1 2003 7 9
2 2001 4 8
2 2002 3 8
;
run;

data want;
set have;
by id date notsorted;
if first.id then n=0;
n+first.date;
if n ne 1;
drop n;
run;
heavensong
Calcite | Level 5

Thank you so much. This is exactly what I was looking for.

MikeZdeb
Rhodochrosite | Level 12

Hi.  You tried SQL.  Did you try this ...

proc sql;

create table want as select * from have group by id having date ^= min(date);

quit;

Haikuo
Onyx | Level 15

Mike, I suppose you want GT (>) instead of ^=, just in case some dates are missing.

MikeZdeb
Rhodochrosite | Level 12

Sure, thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1221 views
  • 7 likes
  • 6 in conversation