BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Carmine_Rossi
Calcite | Level 5

Good day fellow SAS community forum users,

 

I have a question concerning data manipulation to create a new variable that would represent the maximum (or alternatively minimum) of a continuous variable, by study_id, without altering the existing data set.

 

Suppose I have the following data and I want to create the variable "max", which is the largest value for "event", by study ID. Also included is the SAS code to generate the sample data. How would I go about creating the maximum "event", without altering the data set?

 

study_id

time

event

max <- what I want to get

1

1

0

1

1

2

0

1

1

3

1

1

1

4

0

1

2

1

0

1

2

2

1

1

2

3

1

1

2

4

1

1

3

1

0

0

3

2

0

0

3

3

0

0

3

4

0

0

 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

run;

 

Thanks again! The help of the community has been invaluable for my project.

-Carmine

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

View solution in original post

6 REPLIES 6
Reeza
Super User

There are multiple approaches. 

I'm not sure what you mean by 'without altering the data set?' ? By default SAS doesn't change the original data unless you code it that way explicitly.

 

Here's a link that shows how to get the averages, the similar method works for MAX, just change MEAN to MAX.

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 



******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;

proc means data=sashelp.class noprint;
    output out=avg_values mean(height)=avg_height;
run;

data class_data;
    set sashelp.class;

    if _n_=1 then
        set avg_values;
run;

proc print data=class;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

 

ballardw
Super User

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

Carmine_Rossi
Calcite | Level 5

Thank you very much! This is a simple solution that works for what I am trying to do.

 

I still think in terms of STATA where we have egen(max/min), so when I described my problem I noted that I didn't want to the data set to be altered. 

 

I also thank the other posters for their advice. And I do agree that for multiple mins/maxs, that creating a new table and re-merging is an alternative route. 

 

Thanks again!

-Carmine

novinosrin
Tourmaline | Level 20

I recommend you take advantage of remerging stats of proc sql for problems like this. Proc sql is simply incredible with the provision of remerging stats. 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

 

 proc sql;

create table want as

select study_id,time,event, max(event) as max_event

from have

group by study_id

order by study_id ,time;

quit;

 

novinosrin
Tourmaline | Level 20

or a DOW approach:

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

data want2;

do until(last.study_id);

     set have;

     by study_id;

     max_event=max(event,max_event);

end;

do until(last.study_id);

     set have;

     by study_id;

     output;

end;

run;

Carmine_Rossi
Calcite | Level 5

Thank you! This works excellent as well! I like this approach!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 9978 views
  • 2 likes
  • 4 in conversation