BookmarkSubscribeRSS Feed
Kels123
Quartz | Level 8

Hello,

 

I have long, repeated data for my participants (sample data on first two participants shown below):

 

StudyID ScoreDate R_Score M_Score S_Score P_Score count 

0001 01/11/2007 1 3 3 3 1 

0001 02/01/2007 1 3 3 3 2 

0001 05/17/2007 1 3 3 3 3 

0001 10/26/2007 1 4 4 3 4 

0001 11/06/2007 1 5 5 3 5 

0001 11/07/2007 1 3 3 3 6

0001 11/21/2007 1 3 3 3 7

0001 12/31/2007 1 3 3 3 8

0001 03/18/2008 1 3 3 3 9 

0002 06/04/2012 2 3 3 2 1 

0002 08/08/2012 2 3 3 2 2 

0002 09/27/2012 2 3 3 2 3

0002 11/26/2012 2 3 3 2 4

0002 01/29/2013 2 3 3 2 5

0002 03/21/2013 2 3 3 2 6

0002 04/05/2013 2 3 3 2 7

 

I also have a second database of time periods of interest, specified by a start and end date, as follows (sample data for the first two participants):

StudyID StartDate EndDate 

0001 2007-01-01 2007-01-30 

0001 2007-01-31 2008-04-30 

0001 2008-05-01 2015-01-01 

0002 2007-01-01 2012-05-31 

0002 2012-06-01 2013-05-23 

0002 2013-05-24 2015-01-01

 

I need to combine these two databases by placing the score data into the correct time periods based on whether the ScoreDate (e.g. 05/17/2007) fits into the time period (e.g. 2007-01-31 to 2008-04-30). If the ScoreDate fits into a certain period, I need to put the ScoreDate and all three corresponding scores into that period. If there are multiple ScoreDates that belong in that time period, then I need all ScoreDates and corresponding scores to be entered (in wide format) into that period (preferably with some type of indicator, e.g. ScoreDate1 R_Score1 M_Score1 S_Score1 P_Score1 and ScoreDate2 R_Score2 M_Score1 S_Score2 P_Score2, so that I know which scores were measured on which date. I also need to be able to later compare the ScoreDate to the time period StartDate and EndDate, so I think ScoreDate needs to stay as a variable). If there are no ScoreDates for a certain period, then the period remains unchanged (with no additional variables).

 

I think this could be somewhat straightforward using PROC SQL, but I am still learning how to use it and I could really use some guidance. Thank you very much.

5 REPLIES 5
Sven111
Pyrite | Level 9

I'm not sure if I exactly understand your question, i can see it two ways and if it's one of the ways, this code may work:

PROC SQL;
    SELECT  tbl_2.StudyID
           ,tbl_2.StartDate
           ,tbl_2.EndDate
           ,tbl_1.ScoreDate 
           ,tbl_1.R_Score 
           ,tbl_1.M_Score 
           ,tbl_1.S_Score 
           ,tbl_1.P_Score 
           ,tbl_1.count 
    FROM            tbl_2
    INNER JOIN      tbl_1
                 ON tbl_2.StudyID    = tbl_1.StudyID
                AND tbl_2.StartDate <= tbl_1.ScoreDate
                AND tbl_2.EndDate   >= tbl_1.ScoreDate;                
;QUIT;

 

But if by wide you mean you want differing number of columns in the output dataset per row (or more accurately the same arbitrarily large number of columns, many of which will be empty) and only one row in the output dataset per row in tbl_2, then that is more difficult although still possible, in DBs like Oracle there are some easy ways to do this, but not in SAS via a single step that I'm aware of.

 

 

 

 

Kels123
Quartz | Level 8

The code you provided definitely got me closer to where I want to be. See below:

 

StudyID StartDate EndDate ScoreDate R_Score M_Score S_Score P_Score count

01 2007-01-01 2007-01-30 01/11/2007 1 3 3 3 1

01 2007-01-31 2008-04-30 02/01/2007 1 3 3 3 2

01 2007-01-31 2008-04-30 05/17/2007 1 3 3 3 3

01 2007-01-31 2008-04-30 10/26/2007 1 4 4 3 4

01 2007-01-31 2008-04-30 11/06/2007 1 5 5 3 5

01 2007-01-31 2008-04-30 11/07/2007 1 3 3 3 6

01 2007-01-31 2008-04-30 11/21/2007 1 3 3 3 7

01 2007-01-31 2008-04-30 12/31/2007 1 3 3 3 8

01 2007-01-31 2008-04-30 03/18/2008 1 3 3 3 9

02 2012-06-01 2013-05-23 06/04/2012 2 3 3 2 1

02 2012-06-01 2013-05-23 08/08/2012 2 3 3 2 2

02 2012-06-01 2013-05-23 09/27/2012 2 3 3 2 3

02 2012-06-01 2013-05-23 11/26/2012 2 3 3 2 4

02 2012-06-01 2013-05-23 01/29/2013 2 3 3 2 5

02 2012-06-01 2013-05-23 03/21/2013 2 3 3 2 6

02 2012-06-01 2013-05-23 04/05/2013 2 3 3 2 7

 

However, I think I was ultimately thinking of Option #2 (the same arbitrarily large number of columns, many of which will be empty). Do you have recommendations for next steps on how to get there from here?

 

Thank you!

Sven111
Pyrite | Level 9

I don't have any code handy that would do that, but my first thoughts would be to take the output of the code I sent earlier and process it through a data step where you output a separate dataset for each unique value from table 1 (studyID, startDate, EndDate).  Each file would only have one variable and you would sequentially output R_Score M_Score S_Score P_Score count to the file.  At that point you'll have X files (where X is the number of unique study periods in tbl 1).  You could then do a PROC TRANSPOSE on the file to get a single row instead of a huge number of rows.  I'm not sure about the best way to go about putting the X rows into 1 dataset at that point though.  It may be easiest to export to CSV at that point, combine the files there and then reimport into a dataset (assuming you're not ok with just a CSV as the result).

gergely_batho
SAS Employee

proc transpose is a good idea. I think you don't even need to create X data sets, just run it directly on the output Sven111 provided.

 

proc transpose data=  out=want ;

 by StudyID StartDate EndDate;

 id ScoreDate;

 var R_Score M_Score S_Score P_Score count;

run;

 

...or some variant of the above. Look into the doc of proc transpose.

 

Edit: The above is not a perfect solution, as it will create 5 rows for each by group: one for each variable (R_Score M_Score S_Score P_Score count). If you one to transpose the dataset only because of displaying purposes, you could first put all those 5 variables into a single character variable: vars_char=catx(' ',R_Score,M_Score,S_Score,P_Score,count)    and do the traspose after that.
If you need additional processing on the results... In that case I would not do the transose at all 🙂

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1189 views
  • 0 likes
  • 4 in conversation