A longitudinal clinical trial was conducted to examine the effectiveness of an experimental treatment in preventing disease progression. Subjects identified at an early stage of the disease are entered into the study and randomized to receive either standard treatment (the control group) or the experimental treatment. Subjects are scheduled for regular follow-up visits at roughly 6-month intervals. The rate of disease progression is of primary interest.
Variables in the file are:
1. Patient id (1-50)
2. Treatment group ( 0 = control group, and 1 = experimental group)
3. Visit number, numbered consecutively for each subject
4. Time since last visit (months, with 0 at the first visit)
5. Stage of disease (0 = early stage, and 1 = after disease progression)
All subjects are in the early stage of disease at the first visit. Subjects remain in the study after disease progression, so there may be several records from a study patient after disease progression. The data are sorted by study visit, and then by patient id within study visit. There are no missing data.
1. Keeping data in a “one row for each visit” format (with multiple observations per patient) create a variable giving the number of months between the entry into the study for that subject and that study visit. (This will give the total number of months on study at each visit). PROC PRINT (e.g., the first 20 observations) a partial listing of the data to make sure that this new variable was created correctly.
2. For a study like this, we want to report the number of subjects in each group and the total number of visits per group. How many subjects and how many visits are there for each treatment group? (Hint: use PROC MEANS with two new created variables. The sum of 1s and 0s at the subject level is equal to the number of subjects. Also, what would the sum of 1s represent if the value, 1, is assigned for each observation?)
3. For analysis, we would now like a new data set with only one row per subject. As part of this data set, we want a variable that gives the length of time that the subject was in the study up to and including the visit when disease progression is first noted (stage=1 if observed). For patients whose disease did not progress during the study (all observations with stage=0), however, this would be their total time from visit 1 until the last visit for that patient. (Hint: to create this variable, you should use the variable created in part 1., and use the first. and last. commands. This is tricky and you may need to try this a few different ways before you find one that works. To be sure that you have it right, check your results on a few subjects who progress and a few who do not.) For the one observation for each patient, only keep the id number, treatment group, follow-up time, and whether disease has progressed or not.
My code for all three questions:
For question 1:
filename in "E:\longdata.dat";
data temp;
infile in;
input pt_id trt_grp visits time stage ;
run;
proc sort data = temp;by pt_id visits;run;
data temp1;
set temp;
by pt_id;
if first.pt_id then totmnth=0;
totmnth+time;
run;
proc print data=temp1 (obs=20);run;
For question 2:
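data temp1;
set temp;
by pt_id;
if first.pt_id then totmnth=0;
totmnth+time;
/* id flags the first record per patient, so sum(id) counts subjects */
if first.pt_id then id=1;
else id=0;
run;
/* sort a copy by treatment group so temp1 stays in pt_id order for question 3 */
proc sort data=temp1 out=temp1_trt;by trt_grp;run;
proc means data=temp1_trt noprint;
by trt_grp;
var id;
/* with no missing data, n counts records, i.e. visits */
output out=one sum=totalsubj n=totalvisits;
run;
proc print data=one;run;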
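As a cross-check on the PROC MEANS counts, the same two numbers can be computed directly from the visit-level data with PROC SQL. This is only a sketch for comparison, not the approach the hint asks for; it assumes the data set temp and the variable names pt_id and trt_grp from the input statement above, and the column names totalsubj and totalvisits are just labels chosen here.

```sas
/* Cross-check: subjects and visits per treatment group */
proc sql;
  select trt_grp,
         count(distinct pt_id) as totalsubj,   /* one count per patient */
         count(*)              as totalvisits  /* one count per visit record */
  from temp
  group by trt_grp;
quit;
```

If these counts differ from the PROC MEANS output, the first-record flag or the sort order is the first thing to check.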
For question 3:
data temp2;
set temp1 ;
by pt_id ;
retain lngth;
if first.pt_id then lngth=0;
lngth+totmnth;
if last.pt_id then output;
run;
For question 3, I think the code I have used is wrong, because it does not do what the question asks regarding the stage condition,
and I am not able to figure out how to fulfill the stage condition.
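To fulfill the stage condition in question 3, you can add a condition on stage to the if statement that controls the output. Two details matter: totmnth from part 1 is already the cumulative time on study, so it should be copied rather than summed again, and only the first visit with stage=1 should be written, because patients stay in the study after progression. For example, you could do something like this:
data temp2;
set temp1;
by pt_id;
retain done;
if first.pt_id then done=0;
/* write the first progression visit, or the last visit if there is none */
if not done and (stage=1 or last.pt_id) then do;
lngth=totmnth;
output;
done=1;
end;
keep pt_id trt_grp lngth stage;
run;
This writes one row per patient: the first visit at which stage=1, or the last visit if the disease never progressed. The done flag keeps later post-progression visits from being output again, and stage on the output row shows whether disease progression was observed.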
You could also add a variable to the output data set that indicates whether disease progression has been observed for each patient, called disease_progressed, set to 1 if the patient has reached stage 1 and 0 otherwise. The assignment has to come before the output statement, because output writes the record as it stands at that point. This could be done like this:
data temp2;
set temp1;
by pt_id;
retain done;
if first.pt_id then done=0;
if not done and (stage=1 or last.pt_id) then do;
lngth=totmnth;
disease_progressed = (stage=1);
output;
done=1;
end;
keep pt_id trt_grp lngth disease_progressed;
run;
This should produce the desired output data set with one row per patient (written at the first visit with stage=1, or at the last visit if the patient never progressed), containing the patient ID, treatment group, follow-up time, and a variable indicating whether disease progression has been observed. You can then use this data set for further analysis.
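The assignment asks you to check the result on a few subjects who progress and a few who do not. A rough sketch of such a check is below. It assumes temp1 and temp2 exist as created above; the patient ids 1, 2 and 3 are arbitrary placeholders, and rowcount is just a scratch data set name.

```sas
/* 1. Confirm temp2 has exactly one row per patient */
proc freq data=temp2 noprint;
  tables pt_id / out=rowcount;   /* rowcount holds pt_id, COUNT, PERCENT */
run;
proc print data=rowcount;
  where count > 1;               /* any rows listed here are duplicates */
run;

/* 2. Compare visit-level records with the summary row for a few patients */
proc print data=temp1;
  where pt_id in (1, 2, 3);
  var pt_id visits time totmnth stage;
run;
proc print data=temp2;
  where pt_id in (1, 2, 3);
run;
```

For a patient who progresses, lngth in temp2 should equal totmnth at the first visit with stage=1 in temp1. For a patient who never progresses, it should equal totmnth at the last visit.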