picking specific observation longitudinal data

Reply
Contributor
Posts: 44

picking specific observation longitudinal data

HI,

 

I have a longitudinal dataset with a variable that describes disease severity (mild, moderate, severe). Since this is longitudinal subjects may have at some point been mild then became moderate or severe throughout the study. Some the opposite. I would like to pick the most severe for each subject regardless of when. I am pasting below how the data appears:

 

ID       Severity

1           mild

1           mild

1           moderate

1           moderate

2           moderate 

2           mild

2           severe

2           severe

3           mild

3           mild

3           mild

3           mild

 

How I want the data to look like:

 

ID         Severity

1           moderate 

2           severe

3           mild

 

What is the best way to achieve this?

 

Thanks!

PROC Star
Posts: 826

Re: picking specific observation longitudinal data

Untested, let me know if this doesn't work:

 

data have;
input ID Severity : $10.;
datalines;
1 severe
1 mild
1 moderate
1 mild
2 moderate
2 mild
2 severe
2 mild
3 mild
3 mild
3 moderate
3 mild
;

data want;
set have;
by id;
retain t;
if first.id then t=severity;
else if t ne 'severe' and severity ne 'mild' then t=severity;
if last.id;
run;

Contributor
Posts: 44

Re: picking specific observation longitudinal data

Posted in reply to novinosrin

Hi novinosrin,

 

It didn't work. I am trying to play with it. One idea is to do a two step approach. What do you think if I first made the data look like this by making all ID values into picking the most severe. then running a if last.id. How would I make it change all the values per ID to the most severe...

 

ID     severity

1       moderate 

1       moderate

1       moderate

1       moderate

2       severe

2       severe

2       severe

2       severe

 

Super User
Posts: 12,148

Re: picking specific observation longitudinal data


Jest wrote:

HI,

  

What is the best way to achieve this?

 

Thanks!


"Best" would depend on what you are measuring. Best by shortest execution time, shortest code, easiest to understand/ maintain code could all be used.

 

Since you have a variable that has an order, if this were my data I would have read the data so that I had and ordinal numeric variable with the "least severe" or what ever term would apply with the value of 1 to a larger integer for the "most severe". There would also be an associated display format to display the desired text when used.

Which would make my code for your request look like:

 

proc summary data=have nway;

   class id;

   var severity;

   output out=idsummary max= ;

run;

Which would have the maximum value of that numeric severity variable for each ID value.

 

There are a number of other things possible with numeric ordinals that aren't possible with most character variables.

Contributor
Posts: 44

Re: picking specific observation longitudinal data

Thank you Ballardw. Proc summary worked after converting the variable to numeric. 

However, is there a way to retain the rest of the variables in the dataset. They don't very across time.

 

thanks,

 

Super User
Posts: 12,148

Re: picking specific observation longitudinal data

[ Edited ]

Jest wrote:

Thank you Ballardw. Proc summary worked after converting the variable to numeric. 

However, is there a way to retain the rest of the variables in the dataset. They don't very across time.

 

thanks,

 


Show the code you used to add or replace the severity variable. If done correctly it shouldn't have affected any of the other variables.

 Or if you are talking about adding the single result in the summary back to the other data matching the id variable?

proc sql;
   create table severityadded as
   select a.*, b.severity as maxseverity
   from have as a left join
        idsummary as b
        on a.id=b.id;
run;
Contributor
Posts: 44

Re: picking specific observation longitudinal data

I just ran this code after converting the variable to numeric.

 

proc summary data=have nway;
class id;
var severe;
output out=data.idsummary max= ;
run;

 

so the output dataset picks the most severe per ID over time.. It includes ID, _Type_, _FREQ_ and the variable severe. If possible I would like the output dataset to also include all the remainder variable in the original dataset (have) as well. I could technically go back and run an if last.ID on the original dataset and then merge it with the idsummary one which includes the severity variable....but is there a way to just do that in the proc summary code above....

 

Thanks

Ask a Question
Discussion stats
  • 6 replies
  • 99 views
  • 0 likes
  • 3 in conversation