Solved: Help Formatting Data

avalancheshane · Posted 08-09-2011 04:11 PM

I'm pretty much a beginner at SAS so I'm hoping this is the right place for this question and I'll do my best to explain what I'm trying to do.

I have a large dataset (5,582 rows) of patient data from a study involving depression treatment. The data is artificially generated for training purposes. A score from an assessment was collected once per week for a total of 9 weeks. The way the data is currently formatted looks like this:

Trt ID Sex Race Week Score

RxA 1003 F Caucasian 1 .

RxB 1004 M Caucasian 1 .

RxA 1007 F Caucasian 1 30

RxA 1007 F Caucasian 2 31

RxA 1007 F Caucasian 3 36

RxA 1007 F Caucasian 4 36

RxA 1007 F Caucasian 5 35

RxA 1007 F Caucasian 6 15

RxA 1007 F Caucasian 7 12

RxA 1007 F Caucasian 8 10

RxA 1007 F Caucasian 9 7

RxB 1008 M Caucasia 1 .

I'm trying to create a variable defined as the time to a response of interest. In this case I want to create a variable that will tell me when each patient has reached a score of 1/2 or less than the score they had at week 1. However I am a bit confused as to how to do this since each patient has a row of data for each week in the study. I created another variable called Responder which is assigned a value of 1 when a subject's score in a given week is 1/2 or less of their week 1 score.

Using that I tried writing the following:

select;

when (Responder=1) responsetime=week;

otherwise;

end;

This variable was assigned a value for EACH week that a subject's score was 1/2 or less of the week 1 score. However, what I am attempting to do is to create a variable that will only have ONE value for each patient and that will be equal to the first week that the subject meets the criteria of <= 1/2 of the week 1 score. In other words, I don't care if a patient was at 1/2 or below the week 1 score on the 5th, 7th, and 8th weeks; I'm only interested in the fact that the subject met the criteria on week 5 and I would like the responsetime variable to be equal to 5 for that patient.

I'm not sure how best to proceed and whether this issue would be best solved by trying to manipulate my data to wind up with one row for each patient or if there is a way to do that while producing this variable.

Ideally I'd like the end result to look like

Trt ID Response Time

RxA 1003 N/A

RxB 1004 N/A

RxA 1007 6

RxB 1008 N/A

etc.

If it wouldn't be too difficult I'd also be interesting in knowing how to find the same response time but using subjects who didn't have a score at Week 1 but did starting on Week 2!

Any help would be greatly appreciated!

art297 · Posted 08-09-2011 04:40 PM

Since you've already created the variable, in a file called "have", and if there is a maximum number of weeks one could have (in my example less than 99), then you COULD use something like:

data have;

informat trt $3.;

informat sex $1.;

informat race $9.;

input Trt ID Sex Race Week Score ResponseTime;

cards;

RxA 1003 F Caucasian 1 . .

RxB 1004 M Caucasian 1 . .

RxA 1007 F Caucasian 1 30 .

RxA 1007 F Caucasian 2 31 .

RxA 1007 F Caucasian 3 36 .

RxA 1007 F Caucasian 4 36 .

RxA 1007 F Caucasian 5 35 .

RxA 1007 F Caucasian 6 15 6

RxA 1007 F Caucasian 7 12 7

RxA 1007 F Caucasian 8 10 8

RxA 1007 F Caucasian 9 7 9

RxB 1008 M Caucasian 1 . .

;

proc format;

value rt

99="N/A"

;

run;

data want;

set have;

format ResponseTime rt.;

if missing(ResponseTime) then ResponseTime=99;

run;

proc sort data=want;

by id trt ResponseTime;

run;

proc sort data=want nodupkey;

by id trt;

run;

I should point out that using values like 99 to represent missing can be dangerous, as others who also have to access your code might not know that the value only represents missing .. not 99 weeks, but it can definitely simplify selecting the records you want in this case.

Art

View solution in original post

art297 · Posted 08-09-2011 04:40 PM

Since you've already created the variable, in a file called "have", and if there is a maximum number of weeks one could have (in my example less than 99), then you COULD use something like:

data have;

informat trt $3.;

informat sex $1.;

informat race $9.;

input Trt ID Sex Race Week Score ResponseTime;

cards;

RxA 1003 F Caucasian 1 . .

RxB 1004 M Caucasian 1 . .

RxA 1007 F Caucasian 1 30 .

RxA 1007 F Caucasian 2 31 .

RxA 1007 F Caucasian 3 36 .

RxA 1007 F Caucasian 4 36 .

RxA 1007 F Caucasian 5 35 .

RxA 1007 F Caucasian 6 15 6

RxA 1007 F Caucasian 7 12 7

RxA 1007 F Caucasian 8 10 8

RxA 1007 F Caucasian 9 7 9

RxB 1008 M Caucasian 1 . .

;

proc format;

value rt

99="N/A"

;

run;

data want;

set have;

format ResponseTime rt.;

if missing(ResponseTime) then ResponseTime=99;

run;

proc sort data=want;

by id trt ResponseTime;

run;

proc sort data=want nodupkey;

by id trt;

run;

I should point out that using values like 99 to represent missing can be dangerous, as others who also have to access your code might not know that the value only represents missing .. not 99 weeks, but it can definitely simplify selecting the records you want in this case.

Art

avalancheshane · Posted 08-09-2011 05:27 PM

Thank you very much! This was exactly what I was looking for! I was able to adapt this code to apply to my entire data set and now have the entries merged in the fashion I was looking for! Thank you again!

Help Formatting Data

Help Formatting Data

Help Formatting Data

Help Formatting Data

Registration is open

SAS Training: Just a Click Away