New Contributor
Posts: 3

# Obtaining mean for variable number of variables per subject

Dear All,

I am working on a project for which I have to get mean levels of a disease marker (let's call it DM). The issue is that the mean must only include measures before the individual developed disease, and the time each person may have developed disease differs. So, for example, if ID=1 has 7 time points where DM was assessed and this person developed disease at time 5, I would need the mean of DM from time 1 to time 4. If ID=2 has DM measured from time 1 to 6 and the person developed disease at time 3, I'd need the mean DM from time 1 to time 2. If another person did not develop disease, I'd get the mean of all the available DM measures. I am really stuck on how to do this. Any help would be greatly appreciated!

Many thanks!

Posts: 5,616

## Re: Obtaining mean for variable number of variables per subject

It all depends on your data structure. Please post an example.

PG
New Contributor
Posts: 3

## Re: Obtaining mean for variable number of variables per subject

Dear PGStats,

I have a wide data set (an example is given below).

data test;

input id dm1-dm7 disease time;

cards;

1 35 38 40 43 53 55 55 1 7

2 25 29 20 . 25 30 28 0 .

3 38 40 40 50 48 30 35 1 5

4 44 43 41 39 38 . 37 1 3

5 28 25 30 28 30 27 31 0 .

6 38 40 45 . 50 29 28 1 5

;

run;

Above, ID 1 developed disease at the 7th time point, so I'd need to get the means of DM1-DM6.

ID 2 did not develop disease, so I'd need the means of all available DM values.

ID 3 developed disease at time 5 so I'd get the mean of DM from times 1 to 4, etc.

Thank you for any help!

TDC.

run;

Posts: 1,848

## Re: Obtaining mean for variable number of variables per subject

In case there is only one record per ID (and MEAN means AVERAGE)

than you can do it by:

data test;

input id dm1-dm7 disease time;

array dm {*} dm1-dm7;

sum_dm = 0;

if time > 1 then do;

do i=1 to (time -1);

sum_dm = sum(of sum_dm, dm(i));

end;

MEAN = sum_dm / (time -1);

end;

else MEAN = .;

cards;

1 35 38 40 43 53 55 55 1 7

2 25 29 20 . 25 30 28 0 .

3 38 40 40 50 48 30 35 1 5

4 44 43 41 39 38 . 37 1 3

5 28 25 30 28 30 27 31 0 .

6 38 40 45 . 50 29 28 1 5

;

run;

Posts: 5,616

## Re: Obtaining mean for variable number of variables per subject

@Shmuel, your denominator will be wrong if missing dm values occur before time. If you want to calculate your own average, you must keep track of the count of non-missing dm values as well as the sum of dm values.

PG
Super User
Posts: 13,898

## Re: Obtaining mean for variable number of variables per subject

Something like this perhaps:

``````
data want;
set test;
array dm dm1-dm7;
if disease and not missing(time) then do;
do i=1 to time;
dmsum=sum(dmsum,dm[i]);
end;
dmmean= dmsum/time;
end;
drop i;
run;

``````
Super User
Posts: 23,963

## Re: Obtaining mean for variable number of variables per subject

Transpose and use proc means.

``````proc sort data=test;
by id disease time;
run;

proc transpose data=test out=long;
by id disease time;
var dm1-dm7;
run;

data long;
set long;*calculate measurement time from variable name, generally bad practice to store data in column headers;
dm_time=input(compress(_name_, , 'kd'), 8.);

run;

proc means data=long nway mean n std;
class id disease ;
var col1;
where dm_time < time or time=.;
run;``````
Posts: 5,616

## Re: Obtaining mean for variable number of variables per subject

If you want to keep it wide:

``````data want;
set test;
array dm dm1-dm7;
array x{7};
do i = 1 to coalesce(time, dim(dm)+1) - 1;
x{i} = dm{i};
end;
meanDm = mean(of x{*});
drop i x:;
run;``````
PG
New Contributor
Posts: 3

## Re: Obtaining mean for variable number of variables per subject

Thank you All so much!! I really appreciate your help!!

TDC.

Discussion stats
• 8 replies
• 369 views
• 1 like
• 5 in conversation