BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Carmine_Rossi
Calcite | Level 5

Good day fellow SAS community forum users,

 

I have a question concerning data manipulation to create a new variable that would represent the maximum (or alternatively minimum) of a continuous variable, by study_id, without altering the existing data set.

 

Suppose I have the following data and I want to create the variable "max", which is the largest value for "event", by study ID. Also included is the SAS code to generate the sample data. How would I go about creating the maximum "event", without altering the data set?

 

study_id

time

event

max <- what I want to get

1

1

0

1

1

2

0

1

1

3

1

1

1

4

0

1

2

1

0

1

2

2

1

1

2

3

1

1

2

4

1

1

3

1

0

0

3

2

0

0

3

3

0

0

3

4

0

0

 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

run;

 

Thanks again! The help of the community has been invaluable for my project.

-Carmine

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

View solution in original post

6 REPLIES 6
Reeza
Super User

There are multiple approaches. 

I'm not sure what you mean by 'without altering the data set?' ? By default SAS doesn't change the original data unless you code it that way explicitly.

 

Here's a link that shows how to get the averages, the similar method works for MAX, just change MEAN to MAX.

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 



******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;

proc means data=sashelp.class noprint;
    output out=avg_values mean(height)=avg_height;
run;

data class_data;
    set sashelp.class;

    if _n_=1 then
        set avg_values;
run;

proc print data=class;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

 

ballardw
Super User

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

Carmine_Rossi
Calcite | Level 5

Thank you very much! This is a simple solution that works for what I am trying to do.

 

I still think in terms of STATA where we have egen(max/min), so when I described my problem I noted that I didn't want to the data set to be altered. 

 

I also thank the other posters for their advice. And I do agree that for multiple mins/maxs, that creating a new table and re-merging is an alternative route. 

 

Thanks again!

-Carmine

novinosrin
Tourmaline | Level 20

I recommend you take advantage of remerging stats of proc sql for problems like this. Proc sql is simply incredible with the provision of remerging stats. 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

 

 proc sql;

create table want as

select study_id,time,event, max(event) as max_event

from have

group by study_id

order by study_id ,time;

quit;

 

novinosrin
Tourmaline | Level 20

or a DOW approach:

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

data want2;

do until(last.study_id);

     set have;

     by study_id;

     max_event=max(event,max_event);

end;

do until(last.study_id);

     set have;

     by study_id;

     output;

end;

run;

Carmine_Rossi
Calcite | Level 5

Thank you! This works excellent as well! I like this approach!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 8083 views
  • 2 likes
  • 4 in conversation