BookmarkSubscribeRSS Feed
TDC
Calcite | Level 5 TDC
Calcite | Level 5

Dear All,

 

I am working on a project for which I have to get mean levels of a disease marker (let's call it DM). The issue is that the mean must only include measures before the individual developed disease, and the time each person may have developed disease differs. So, for example, if ID=1 has 7 time points where DM was assessed and this person developed disease at time 5, I would need the mean of DM from time 1 to time 4. If ID=2 has DM measured from time 1 to 6 and the person developed disease at time 3, I'd need the mean DM from time 1 to time 2. If another person did not develop disease, I'd get the mean of all the available DM measures. I am really stuck on how to do this. Any help would be greatly appreciated!

 

Many thanks!

 

 

 

8 REPLIES 8
PGStats
Opal | Level 21

It all depends on your data structure. Please post an example.

PG
TDC
Calcite | Level 5 TDC
Calcite | Level 5

Dear PGStats,

 

I have a wide data set (an example is given below).

data test;

input id dm1-dm7 disease time;

cards;

1 35 38 40 43 53 55 55 1 7

2 25 29 20 . 25 30 28 0 .

3 38 40 40 50 48 30 35 1 5

4 44 43 41 39 38 . 37 1 3

5 28 25 30 28 30 27 31 0 .

6 38 40 45 . 50 29 28 1 5

;

run;

 

Above, ID 1 developed disease at the 7th time point, so I'd need to get the means of DM1-DM6.

ID 2 did not develop disease, so I'd need the means of all available DM values.

ID 3 developed disease at time 5 so I'd get the mean of DM from times 1 to 4, etc.

Thank you for any help!

TDC.

run;

Shmuel
Garnet | Level 18

In case there is only one record per ID (and MEAN means AVERAGE)

than you can do it by:

 

data test;

input id dm1-dm7 disease time;

  array dm {*} dm1-dm7;

  sum_dm = 0;

  if time > 1 then do;

    do i=1 to (time -1);

       sum_dm = sum(of sum_dm, dm(i));

    end;

    MEAN = sum_dm / (time -1);

  end;

  else MEAN = .;

cards;

1 35 38 40 43 53 55 55 1 7

2 25 29 20 . 25 30 28 0 .

3 38 40 40 50 48 30 35 1 5

4 44 43 41 39 38 . 37 1 3

5 28 25 30 28 30 27 31 0 .

6 38 40 45 . 50 29 28 1 5

;

run;

PGStats
Opal | Level 21

@Shmuel, your denominator will be wrong if missing dm values occur before time. If you want to calculate your own average, you must keep track of the count of non-missing dm values as well as the sum of dm values.

PG
ballardw
Super User

Something like this perhaps:


data want;
   set test;
   array dm dm1-dm7;
   if disease and not missing(time) then do;
      do i=1 to time;
         dmsum=sum(dmsum,dm[i]);
      end;
      dmmean= dmsum/time;
   end;
   drop i;
run;   
      
Reeza
Super User

Transpose and use proc means.

 

proc sort data=test;
by id disease time;
run;

proc transpose data=test out=long;
by id disease time;
var dm1-dm7;
run;

data long;
set long;
*calculate measurement time from variable name, generally bad practice to store data in column headers; dm_time=input(compress(_name_, , 'kd'), 8.); run; proc means data=long nway mean n std; class id disease ; var col1; where dm_time < time or time=.; run;
PGStats
Opal | Level 21

If you want to keep it wide:

 

data want;
set test;
array dm dm1-dm7;
array x{7};
do i = 1 to coalesce(time, dim(dm)+1) - 1;
    x{i} = dm{i};
    end;
meanDm = mean(of x{*});
drop i x:;
run;
PG
TDC
Calcite | Level 5 TDC
Calcite | Level 5

Thank you All so much!! I really appreciate your help!!

TDC.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1363 views
  • 1 like
  • 5 in conversation