BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kels123
Quartz | Level 8

 

I have a dataset where four scores were measured over time (R_score, M_score, S_score, and P_score). I have successfully matched these measurements to the time periods during which the subjects (e.g. STUDYID 0001 and 0002) were in a particular environment (SCHOOL vs. HOME). When I say "time period", this refers to a period of time marked by the same dates and environment. In this case, the first subject 0001 has multiple (14) observations/measurements taken in the SCHOOL environment during the time period 02/06/2008-03/12/2010, followed by a period of time (03/13/2010-01/01/2015) in the HOME environment during which no measurements were taken and thus the R_score, M_score, S_score, and P_score variables are all missing (which is still useful data to retain for my purposes). Sample data is below:

 

data tempfile;

infile datalines truncover;

INPUT

environment$ studyID$4.@+1 StartDate mmddyy10.@+1 EndDate mmddyy10.@+1

daysbwn Movementcode Movecode short_inc count

ScoreDate mmddyy10.@+1 R_Score M_Score S_Score P_Score;

format Startdate EndDate ScoreDate yymmdd10.;

datalines;

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/06/2008 1 3 3 1

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/08/2008 1 3 3 1   

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/12/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 03/17/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 05/21/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/06/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/26/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/29/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/27/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/28/2009 1 4 4 3

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/26/2009 1 5 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 09/16/2009 1 4 5 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 12/14/2009 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/27/2010 1 4 4 2

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 08/01/2007 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 02/21/2008 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . . .

;

run;

proc print data=tempfile; run;

 

Now I would like to select out the highest value seen for each of the four different score variables during each time period. For example (using red text to indicate the measurements of interest):

 

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/06/2008 1 3 3 1

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/08/2008 1 3 3 1   

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 02/12/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 03/17/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 05/21/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/06/2008 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/26/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/29/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/27/2009 1 3 3 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 07/28/2009 1 4 4 3

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 08/26/2009 1 5 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 09/16/2009 1 4 5 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 12/14/2009 1 4 4 2

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 01/27/2010 1 4 4 2

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 . . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 08/01/2007 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 . . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 02/21/2008 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . . .

 

If only a single row of measurements or only missing values are available, they should be the values chosen and kept in the dataset. If multiple rows of data are available, then the highest value available for each of the score variables should be chosen to produce something that looks like this (the ScoreDate can be dropped):

 

environment$ studyID$4.@+1 StartDate mmddyy10.@+1 EndDate mmddyy10.@+daysbwn Movementcode Movecode short_inc count R_Score M_Score S_Score P_Score

 

SCHOOL 0001 02/06/2008 03/12/2010 765 1 1 0 2 1 5 5 3

HOME 0001 03/13/2010 01/01/2015 1755 3 2 0 3 . . . .

HOME 0002 01/01/2007 07/31/2007 211 . . 0 1 . . . .

SCHOOL 0002 08/01/2007 08/28/2007 27 1 1 1 2 2 3 3 2

HOME 0002 08/29/2007 02/20/2008 175 3 2 0 3 . . . .

SCHOOL 0002 02/21/2008 03/05/2008 13 1 1 1 4 2 2 2 1

HOME 0002 03/06/2008 01/01/2015 2492 3 2 0 5 . . . .

 

Any help to do this properly would be very much appreciated! Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
rogerjdeangelis
Barite | Level 11
This does not solve the problem but may get you on the right path

proc summary data=tempfile max nway;
class environment studyID StartDate;
id environment studyID StartDate;
var R_Score M_Score S_Score P_Score;
output out=max(drop=_type_ _freq_) max=;
run;quit;

proc print data=max; run;



Obs    ENVIRONMENT    STUDYID     STARTDATE    R_SCORE    M_SCORE    S_SCORE    P_SCORE

 1       HOME          0002      2007-01-01       .          .          .          .
 2       HOME          0002      2007-08-29       .          .          .          .
 3       HOME          0002      2008-03-06       .          .          .          .
 4       SCHOOL        0001      2008-02-06       1          5          5          2
 5       SCHOOL        0001      2010-03-13       .          .          .          .
 6       SCHOOL        0002      2007-08-01       2          3          3          2
 7       SCHOOL        0002      2008-02-21       2          2          2          1

You can match back using the compound key plus each max score, or you can just repeat the key with each max score.

View solution in original post

3 REPLIES 3
ballardw
Super User

"Each time period" means what exactly?

Kels123
Quartz | Level 8

I just corrected an error and added an additional explanation to my original post in response to your post. Please let me know if you still have questions. Thank you.

rogerjdeangelis
Barite | Level 11
This does not solve the problem but may get you on the right path

proc summary data=tempfile max nway;
class environment studyID StartDate;
id environment studyID StartDate;
var R_Score M_Score S_Score P_Score;
output out=max(drop=_type_ _freq_) max=;
run;quit;

proc print data=max; run;



Obs    ENVIRONMENT    STUDYID     STARTDATE    R_SCORE    M_SCORE    S_SCORE    P_SCORE

 1       HOME          0002      2007-01-01       .          .          .          .
 2       HOME          0002      2007-08-29       .          .          .          .
 3       HOME          0002      2008-03-06       .          .          .          .
 4       SCHOOL        0001      2008-02-06       1          5          5          2
 5       SCHOOL        0001      2010-03-13       .          .          .          .
 6       SCHOOL        0002      2007-08-01       2          3          3          2
 7       SCHOOL        0002      2008-02-21       2          2          2          1

You can match back using the compound key plus each max score, or you can just repeat the key with each max score.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 947 views
  • 1 like
  • 3 in conversation