Good day fellow SAS community forum users,
I have a question concerning data manipulation to create a new variable that would represent the maximum (or alternatively minimum) of a continuous variable, by study_id, without altering the existing data set.
Suppose I have the following data and I want to create the variable "max", which is the largest value for "event", by study ID. Also included is the SAS code to generate the sample data. How would I go about creating the maximum "event", without altering the data set?
| study_id | time | event | max <- what I want to get | 
| 1 | 1 | 0 | 1 | 
| 1 | 2 | 0 | 1 | 
| 1 | 3 | 1 | 1 | 
| 1 | 4 | 0 | 1 | 
| 2 | 1 | 0 | 1 | 
| 2 | 2 | 1 | 1 | 
| 2 | 3 | 1 | 1 | 
| 2 | 4 | 1 | 1 | 
| 3 | 1 | 0 | 0 | 
| 3 | 2 | 0 | 0 | 
| 3 | 3 | 0 | 0 | 
| 3 | 4 | 0 | 0 | 
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
Thanks again! The help of the community has been invaluable for my project.
-Carmine
Here's a quicky data step solution.
data have; input study_id time event; datalines; 1 1 0 1 2 0 1 3 1 1 4 0 2 1 0 2 2 1 2 3 1 2 4 1 3 1 0 3 2 0 3 3 0 3 4 0 run; proc sort data=have; by study_id descending event; run; data want; set have; by study_id; retain maxevent; if first.study_id then maxevent = event; run; proc sort data=want; by study_id time; run;
If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.
There are multiple approaches.
I'm not sure what you mean by 'without altering the data set?' ? By default SAS doesn't change the original data unless you code it that way explicitly.
Here's a link that shows how to get the averages, the similar method works for MAX, just change MEAN to MAX.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint;
    output out=avg_values mean(height)=avg_height;
run;
data class_data;
    set sashelp.class;
    if _n_=1 then
        set avg_values;
run;
proc print data=class;
run;
*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;
******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;
*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;
data class_data;
 merge class avg_values;
 by sex;
run;
proc print data=class_data;
run;
*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;
Here's a quicky data step solution.
data have; input study_id time event; datalines; 1 1 0 1 2 0 1 3 1 1 4 0 2 1 0 2 2 1 2 3 1 2 4 1 3 1 0 3 2 0 3 3 0 3 4 0 run; proc sort data=have; by study_id descending event; run; data want; set have; by study_id; retain maxevent; if first.study_id then maxevent = event; run; proc sort data=want; by study_id time; run;
If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.
Thank you very much! This is a simple solution that works for what I am trying to do.
I still think in terms of STATA where we have egen(max/min), so when I described my problem I noted that I didn't want to the data set to be altered.
I also thank the other posters for their advice. And I do agree that for multiple mins/maxs, that creating a new table and re-merging is an alternative route.
Thanks again!
-Carmine
I recommend you take advantage of remerging stats of proc sql for problems like this. Proc sql is simply incredible with the provision of remerging stats.
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
;
run;
proc sql;
create table want as
select study_id,time,event, max(event) as max_event
from have
group by study_id
order by study_id ,time;
quit;
or a DOW approach:
data have;
input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
;
run;
data want2;
do until(last.study_id);
set have;
by study_id;
max_event=max(event,max_event);
end;
do until(last.study_id);
set have;
by study_id;
output;
end;
run;
Thank you! This works excellent as well! I like this approach!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
