SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to select the max value from a range of measurements over time

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

How to select the max value from a range of measurements over time

[ Edited ]

 

I have a dataset where four scores were measured over time (R_score, M_score, S_score, and P_score). I have successfully matched these measurements to the time periods during which the subjects (e.g. STUDYID 0001 and 0002) were in a particular environment (SCHOOL vs. HOME). When I say "time period", this refers to a period of time marked by the same dates and environment. In this case, the first subject 0001 has multiple (14) observations/measurements taken in the SCHOOL environment during the time period 02/06/2008-03/12/2010, followed by a period of time (03/13/2010-01/01/2015) in the HOME environment during which no measurements were taken and thus the R_score, M_score, S_score, and P_score variables are all missing (which is still useful data to retain for my purposes). Sample data is below:

 

data tempfile;

infile datalines truncover;

INPUT

environment$ studyID$4.@+1 StartDate mmddyy10.@+1 EndDate mmddyy10.@+1

daysbwn Movementcode Movecode short_inc count

ScoreDate mmddyy10.@+1 R_Score M_Score S_Score P_Score;

format Startdate EndDate ScoreDate yymmdd10.;

datalines;

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/06/2008 1 3 3 1

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/08/2008 1 3 3 1   

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/12/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 03/17/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 05/21/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/06/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/26/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/29/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/27/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/28/2009 1 4 4 3

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/26/2009 1 5 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 09/16/2009 1 4 5 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 12/14/2009 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/27/2010 1 4 4 2

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 08/01/2007 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 02/21/2008 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . . .

;

run;

proc print data=tempfile; run;

 

Now I would like to select out the highest value seen for each of the four different score variables during each time period. For example (using red text to indicate the measurements of interest):

 

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/06/2008 1 3 3 1

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/08/2008 1 3 3 1   

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/12/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 03/17/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 05/21/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/06/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/26/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/29/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/27/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/28/2009 1 4 4 3

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/26/2009 1 5 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 09/16/2009 1 4 5 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 12/14/2009 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/27/2010 1 4 4 2

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 08/01/2007 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 02/21/2008 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . . .

 

If only a single row of measurements or only missing values are available, they should be the values chosen and kept in the dataset. If multiple rows of data are available, then the highest value available for each of the score variables should be chosen to produce something that looks like this (the ScoreDate can be dropped):

 

environment$ studyID$4.@+1 StartDate mmddyy10.@+1 EndDate mmddyy10.@+daysbwn Movementcode Movecode short_inc count R_Score M_Score S_Score P_Score

 

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 1 5 5 3

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . .

 

Any help to do this properly would be very much appreciated! Thank you in advance.


Accepted Solutions
Solution
‎01-24-2017 05:23 PM
Valued Guide
Posts: 505

Re: How to select the max value from a range of measurements over time

This does not solve the problem but may get you on the right path

proc summary data=tempfile max nway;
class environment studyID StartDate;
id environment studyID StartDate;
var R_Score M_Score S_Score P_Score;
output out=max(drop=_type_ _freq_) max=;
run;quit;

proc print data=max; run;



Obs    ENVIRONMENT    STUDYID     STARTDATE    R_SCORE    M_SCORE    S_SCORE    P_SCORE

 1       HOME          0002      2007-01-01       .          .          .          .
 2       HOME          0002      2007-08-29       .          .          .          .
 3       HOME          0002      2008-03-06       .          .          .          .
 4       SCHOOL        0001      2008-02-06       1          5          5          2
 5       SCHOOL        0001      2010-03-13       .          .          .          .
 6       SCHOOL        0002      2007-08-01       2          3          3          2
 7       SCHOOL        0002      2008-02-21       2          2          2          1

You can match back using the compound key plus each max score, or you can just repeat the key with each max score.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: How to select the max value from a range of measurements over time

"Each time period" means what exactly?

Contributor
Posts: 51

Re: How to select the max value from a range of measurements over time

I just corrected an error and added an additional explanation to my original post in response to your post. Please let me know if you still have questions. Thank you.

Solution
‎01-24-2017 05:23 PM
Valued Guide
Posts: 505

Re: How to select the max value from a range of measurements over time

This does not solve the problem but may get you on the right path

proc summary data=tempfile max nway;
class environment studyID StartDate;
id environment studyID StartDate;
var R_Score M_Score S_Score P_Score;
output out=max(drop=_type_ _freq_) max=;
run;quit;

proc print data=max; run;



Obs    ENVIRONMENT    STUDYID     STARTDATE    R_SCORE    M_SCORE    S_SCORE    P_SCORE

 1       HOME          0002      2007-01-01       .          .          .          .
 2       HOME          0002      2007-08-29       .          .          .          .
 3       HOME          0002      2008-03-06       .          .          .          .
 4       SCHOOL        0001      2008-02-06       1          5          5          2
 5       SCHOOL        0001      2010-03-13       .          .          .          .
 6       SCHOOL        0002      2007-08-01       2          3          3          2
 7       SCHOOL        0002      2008-02-21       2          2          2          1

You can match back using the compound key plus each max score, or you can just repeat the key with each max score.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 272 views
  • 1 like
  • 3 in conversation