DATA Step, Macro, Functions and more

Creating a variable to represent the maximum of value by group

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Creating a variable to represent the maximum of value by group

Good day fellow SAS community forum users,

 

I have a question concerning data manipulation to create a new variable that would represent the maximum (or alternatively minimum) of a continuous variable, by study_id, without altering the existing data set.

 

Suppose I have the following data and I want to create the variable "max", which is the largest value for "event", by study ID. Also included is the SAS code to generate the sample data. How would I go about creating the maximum "event", without altering the data set?

 

study_id

time

event

max <- what I want to get

1

1

0

1

1

2

0

1

1

3

1

1

1

4

0

1

2

1

0

1

2

2

1

1

2

3

1

1

2

4

1

1

3

1

0

0

3

2

0

0

3

3

0

0

3

4

0

0

 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

run;

 

Thanks again! The help of the community has been invaluable for my project.

-Carmine


Accepted Solutions
Solution
‎12-07-2017 01:26 PM
Super User
Posts: 13,293

Re: Creating a variable to represent the maximum of value by group

Posted in reply to Carmine_Rossi

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

View solution in original post


All Replies
Super User
Posts: 23,237

Re: Creating a variable to represent the maximum of value by group

Posted in reply to Carmine_Rossi

There are multiple approaches. 

I'm not sure what you mean by 'without altering the data set?' ? By default SAS doesn't change the original data unless you code it that way explicitly.

 

Here's a link that shows how to get the averages, the similar method works for MAX, just change MEAN to MAX.

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 



******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;

proc means data=sashelp.class noprint;
    output out=avg_values mean(height)=avg_height;
run;

data class_data;
    set sashelp.class;

    if _n_=1 then
        set avg_values;
run;

proc print data=class;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

 

Solution
‎12-07-2017 01:26 PM
Super User
Posts: 13,293

Re: Creating a variable to represent the maximum of value by group

Posted in reply to Carmine_Rossi

Here's a quicky data step solution.

data have;
  input study_id time event;
datalines;
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 1
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
run;
proc sort data=have;
 by study_id descending event;
run;

data want;
   set have;
   by study_id;
   retain maxevent;
   if first.study_id then maxevent = event;
run;

proc sort data=want;
  by study_id time;
run;

If you have multiple variables you want to find the maximum (or minimum) then a different summary and merge back solution would be better.

 

Occasional Contributor
Posts: 8

Re: Creating a variable to represent the maximum of value by group

Thank you very much! This is a simple solution that works for what I am trying to do.

 

I still think in terms of STATA where we have egen(max/min), so when I described my problem I noted that I didn't want to the data set to be altered. 

 

I also thank the other posters for their advice. And I do agree that for multiple mins/maxs, that creating a new table and re-merging is an alternative route. 

 

Thanks again!

-Carmine

PROC Star
Posts: 1,559

Re: Creating a variable to represent the maximum of value by group

[ Edited ]
Posted in reply to Carmine_Rossi

I recommend you take advantage of remerging stats of proc sql for problems like this. Proc sql is simply incredible with the provision of remerging stats. 

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

 

 proc sql;

create table want as

select study_id,time,event, max(event) as max_event

from have

group by study_id

order by study_id ,time;

quit;

 

PROC Star
Posts: 1,559

Re: Creating a variable to represent the maximum of value by group

Posted in reply to novinosrin

or a DOW approach:

 

data have;

  input study_id time event;

datalines;

1 1 0

1 2 0

1 3 1

1 4 0

2 1 0

2 2 1

2 3 1

2 4 1

3 1 0

3 2 0

3 3 0

3 4 0

;

run;

 

data want2;

do until(last.study_id);

     set have;

     by study_id;

     max_event=max(event,max_event);

end;

do until(last.study_id);

     set have;

     by study_id;

     output;

end;

run;

Occasional Contributor
Posts: 8

Re: Creating a variable to represent the maximum of value by group

Posted in reply to novinosrin

Thank you! This works excellent as well! I like this approach!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 248 views
  • 0 likes
  • 4 in conversation