Good day fellow SAS community forum users,
I have a question concerning data manipulation to create a new variable that would represent the maximum (or alternatively minimum) of a continuous variable, by study_id, without altering the existing data set.
Suppose I have the following data and I want to create the variable "max", which is the largest value for "event", by study ID. Also included is the SAS code to generate the sample data. How would I go about creating the maximum "event", without altering the data set?
study_id | time | event | max <- what I want to get |
1 | 1 | 0 | 1 |
1 | 2 | 0 | 1 |
1 | 3 | 1 | 1 |
1 | 4 | 0 | 1 |
2 | 1 | 0 | 1 |
2 | 2 | 1 | 1 |
2 | 3 | 1 | 1 |
2 | 4 | 1 | 1 |
3 | 1 | 0 | 0 |
3 | 2 | 0 | 0 |
3 | 3 | 0 | 0 |
3 | 4 | 0 | 0 |
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
Thanks again! The help of the community has been invaluable for my project.
-Carmine
Here's a quicky data step solution.
data have; input study_id time event; datalines; 1 1 0 1 2 0 1 3 1 1 4 0 2 1 0 2 2 1 2 3 1 2 4 1 3 1 0 3 2 0 3 3 0 3 4 0 run; proc sort data=have; by study_id descending event; run; data want; set have; by study_id; retain maxevent; if first.study_id then maxevent = event; run; proc sort data=want; by study_id time; run;
If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.
There are multiple approaches.
I'm not sure what you mean by 'without altering the data set?' ? By default SAS doesn't change the original data unless you code it that way explicitly.
Here's a link that shows how to get the averages, the similar method works for MAX, just change MEAN to MAX.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
******************************************************; *Add average value to a dataset; *Solution 1 - PROC MEANS + Data step; ******************************************************; proc means data=sashelp.class noprint; output out=avg_values mean(height)=avg_height; run; data class_data; set sashelp.class; if _n_=1 then set avg_values; run; proc print data=class; run; *Solution 2 - PROC SQL - note the warning in the log; PROC SQL; Create table class_sql as select *, mean(height) as avg_height from sashelp.class; quit; ******************************************************; *Add average value to a dataset - with grouping variables; *Solution 1 - PROC MEANS + Data step; ******************************************************; proc means data=sashelp.class noprint nway; class sex; output out=avg_values mean(height)=avg_height; run; *sort data before merge; proc sort data=sashelp.class out=class; by sex; run; data class_data; merge class avg_values; by sex; run; proc print data=class_data; run; *Solution 2 - PROC SQL - note the warning in the log; PROC SQL; Create table class_sql as select *, mean(height) as avg_height from sashelp.class group by sex; quit;
Here's a quicky data step solution.
data have; input study_id time event; datalines; 1 1 0 1 2 0 1 3 1 1 4 0 2 1 0 2 2 1 2 3 1 2 4 1 3 1 0 3 2 0 3 3 0 3 4 0 run; proc sort data=have; by study_id descending event; run; data want; set have; by study_id; retain maxevent; if first.study_id then maxevent = event; run; proc sort data=want; by study_id time; run;
If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.
Thank you very much! This is a simple solution that works for what I am trying to do.
I still think in terms of STATA where we have egen(max/min), so when I described my problem I noted that I didn't want to the data set to be altered.
I also thank the other posters for their advice. And I do agree that for multiple mins/maxs, that creating a new table and re-merging is an alternative route.
Thanks again!
-Carmine
I recommend you take advantage of remerging stats of proc sql for problems like this. Proc sql is simply incredible with the provision of remerging stats.
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
;
run;
proc sql;
create table want as
select study_id,time,event, max(event) as max_event
from have
group by study_id
order by study_id ,time;
quit;
or a DOW approach:
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
;
run;
data want2;
do until(last.study_id);
set have;
by study_id;
max_event=max(event,max_event);
end;
do until(last.study_id);
set have;
by study_id;
output;
end;
run;
Thank you! This works excellent as well! I like this approach!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.