Help using Base SAS procedures

Help Formatting Data

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Help Formatting Data

I'm pretty much a beginner at SAS so I'm hoping this is the right place for this question and I'll do my best to explain what I'm trying to do.

I have a large dataset (5,582 rows) of patient data from a study involving depression treatment. The data is artificially generated for training purposes. A score from an assessment was collected once per week for a total of 9 weeks. The way the data is currently formatted looks like this:

Trt                  ID            Sex                 Race         Week               Score

RxA          1003               F               Caucasian       1                           .

RxB          1004               M               Caucasian      1                          .

RxA          1007               F               Caucasian       1                         30

RxA          1007               F               Caucasian       2                         31

RxA          1007               F               Caucasian       3                         36

RxA          1007               F               Caucasian       4                         36

RxA          1007               F               Caucasian       5                         35

RxA          1007               F               Caucasian       6                         15

RxA          1007               F               Caucasian       7                         12

RxA          1007               F               Caucasian       8                         10

RxA          1007               F               Caucasian       9                          7

RxB          1008               M              Caucasia         1                          .

I'm trying to create a variable defined as the time to a response of interest. In this case I want to create a variable that will tell me when each patient has reached a score of 1/2 or less than the score they had at week 1. However I am a bit confused as to how to do this since each patient has a row of data for each week in the study. I created another variable called Responder which is assigned a value of 1 when a subject's score in a given week is 1/2 or less of their week 1 score.

Using that I tried writing the following:

select;

when (Responder=1) responsetime=week;

otherwise;

end;

This variable was assigned a value for EACH week that a subject's score was 1/2 or less of the week 1 score. However, what I am attempting to do is to create a variable that will only have ONE value for each patient and that will be equal to the first week that the subject meets the criteria of <= 1/2 of the week 1 score. In other words, I don't care if a patient was at 1/2 or below the week 1 score on the 5th, 7th, and 8th weeks; I'm only interested in the fact that the subject met the criteria on week 5 and I would like the responsetime variable to be equal to 5 for that patient.

I'm not sure how best to proceed and whether this issue would be best solved by trying to manipulate my data to wind up with one row for each patient or if there is a way to do that while producing this variable.

Ideally I'd like the end result to look like

Trt          ID          Response Time

RxA     1003               N/A

RxB     1004               N/A

RxA     1007                6

RxB     1008               N/A

etc.

If it wouldn't be too difficult I'd also be interesting in knowing how to find the same response time but using subjects who didn't have a score at Week 1 but did starting on Week 2!

Any help would be greatly appreciated!    


Accepted Solutions
Solution
‎08-09-2011 04:40 PM
PROC Star
Posts: 7,363

Help Formatting Data

Since you've already created the variable, in a file called "have", and if there is a maximum number of weeks one could have (in my example less than 99), then you COULD use something like:

data have;

  informat trt $3.;

  informat sex $1.;

  informat race $9.;

  input Trt ID Sex Race Week Score ResponseTime;

  cards;

RxA 1003   F   Caucasian       1    . .

RxB 1004   M   Caucasian       1    . .

RxA 1007   F   Caucasian       1   30 .

RxA 1007   F   Caucasian       2   31 .

RxA 1007   F   Caucasian       3   36 .

RxA 1007   F   Caucasian       4   36 .

RxA 1007   F   Caucasian       5   35 .

RxA 1007   F   Caucasian       6   15 6

RxA 1007   F   Caucasian       7   12 7

RxA 1007   F   Caucasian       8   10 8

RxA 1007   F   Caucasian       9    7 9

RxB 1008   M   Caucasian       1    . .

;

proc format;

  value rt

  99="N/A"

;

run;

data want;

  set have;

  format ResponseTime rt.;

  if missing(ResponseTime) then ResponseTime=99;

run;

proc sort data=want;

  by id trt ResponseTime;

run;

proc sort data=want nodupkey;

  by id trt;

run;

I should point out that using values like 99 to represent missing can be dangerous, as others who also have to access your code might not know that the value only represents missing .. not 99 weeks, but it can definitely simplify selecting the records you want in this case.

Art

View solution in original post


All Replies
Solution
‎08-09-2011 04:40 PM
PROC Star
Posts: 7,363

Help Formatting Data

Since you've already created the variable, in a file called "have", and if there is a maximum number of weeks one could have (in my example less than 99), then you COULD use something like:

data have;

  informat trt $3.;

  informat sex $1.;

  informat race $9.;

  input Trt ID Sex Race Week Score ResponseTime;

  cards;

RxA 1003   F   Caucasian       1    . .

RxB 1004   M   Caucasian       1    . .

RxA 1007   F   Caucasian       1   30 .

RxA 1007   F   Caucasian       2   31 .

RxA 1007   F   Caucasian       3   36 .

RxA 1007   F   Caucasian       4   36 .

RxA 1007   F   Caucasian       5   35 .

RxA 1007   F   Caucasian       6   15 6

RxA 1007   F   Caucasian       7   12 7

RxA 1007   F   Caucasian       8   10 8

RxA 1007   F   Caucasian       9    7 9

RxB 1008   M   Caucasian       1    . .

;

proc format;

  value rt

  99="N/A"

;

run;

data want;

  set have;

  format ResponseTime rt.;

  if missing(ResponseTime) then ResponseTime=99;

run;

proc sort data=want;

  by id trt ResponseTime;

run;

proc sort data=want nodupkey;

  by id trt;

run;

I should point out that using values like 99 to represent missing can be dangerous, as others who also have to access your code might not know that the value only represents missing .. not 99 weeks, but it can definitely simplify selecting the records you want in this case.

Art

New Contributor
Posts: 2

Help Formatting Data

Thank you very much! This was exactly what I was looking for! I was able to adapt this code to apply to my entire data set and now have the entries merged in the fashion I was looking for! Thank you again!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 119 views
  • 0 likes
  • 2 in conversation