HI,
I have a longitudinal dataset with a variable that describes disease severity (mild, moderate, severe). Since this is longitudinal subjects may have at some point been mild then became moderate or severe throughout the study. Some the opposite. I would like to pick the most severe for each subject regardless of when. I am pasting below how the data appears:
ID Severity
1 mild
1 mild
1 moderate
1 moderate
2 moderate
2 mild
2 severe
2 severe
3 mild
3 mild
3 mild
3 mild
How I want the data to look like:
ID Severity
1 moderate
2 severe
3 mild
What is the best way to achieve this?
Thanks!
Untested, let me know if this doesn't work:
data have;
input ID Severity : $10.;
datalines;
1 severe
1 mild
1 moderate
1 mild
2 moderate
2 mild
2 severe
2 mild
3 mild
3 mild
3 moderate
3 mild
;
data want;
set have;
by id;
retain t;
if first.id then t=severity;
else if t ne 'severe' and severity ne 'mild' then t=severity;
if last.id;
run;
Hi novinosrin,
It didn't work. I am trying to play with it. One idea is to do a two step approach. What do you think if I first made the data look like this by making all ID values into picking the most severe. then running a if last.id. How would I make it change all the values per ID to the most severe...
ID severity
1 moderate
1 moderate
1 moderate
1 moderate
2 severe
2 severe
2 severe
2 severe
@Jest wrote:
HI,
What is the best way to achieve this?
Thanks!
"Best" would depend on what you are measuring. Best by shortest execution time, shortest code, easiest to understand/ maintain code could all be used.
Since you have a variable that has an order, if this were my data I would have read the data so that I had and ordinal numeric variable with the "least severe" or what ever term would apply with the value of 1 to a larger integer for the "most severe". There would also be an associated display format to display the desired text when used.
Which would make my code for your request look like:
proc summary data=have nway;
class id;
var severity;
output out=idsummary max= ;
run;
Which would have the maximum value of that numeric severity variable for each ID value.
There are a number of other things possible with numeric ordinals that aren't possible with most character variables.
Thank you Ballardw. Proc summary worked after converting the variable to numeric.
However, is there a way to retain the rest of the variables in the dataset. They don't very across time.
thanks,
@Jest wrote:
Thank you Ballardw. Proc summary worked after converting the variable to numeric.
However, is there a way to retain the rest of the variables in the dataset. They don't very across time.
thanks,
Show the code you used to add or replace the severity variable. If done correctly it shouldn't have affected any of the other variables.
Or if you are talking about adding the single result in the summary back to the other data matching the id variable?
proc sql; create table severityadded as select a.*, b.severity as maxseverity from have as a left join idsummary as b on a.id=b.id; run;
I just ran this code after converting the variable to numeric.
proc summary data=have nway;
class id;
var severe;
output out=data.idsummary max= ;
run;
so the output dataset picks the most severe per ID over time.. It includes ID, _Type_, _FREQ_ and the variable severe. If possible I would like the output dataset to also include all the remainder variable in the original dataset (have) as well. I could technically go back and run an if last.ID on the original dataset and then merge it with the idsummary one which includes the severity variable....but is there a way to just do that in the proc summary code above....
Thanks
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.