Contributor
Posts: 68

# Imputing Missing Data Using the Mean

Hello,

I have to compute three summary scores (IPTFACTOR, DTFACTOR and CBTFACTOR) for each of N=61 participants. Each factor is the summary score of 5 or 6 variables. I have defined them as follows:

```sas
CBTFACTOR = SUM(PP_05, PP_07, PP_08, PP_11, PP_12, PP_14);
DTFACTOR  = SUM(PP_02, PP_06, PP_10, PP_13, PP_16);
IPTFACTOR = SUM(PP_01, PP_03, PP_04, PP_09, PP_15);
```

--

Here's a way of wording it.

Example: Looking at CBTFACTOR:

We should probably impute a score for the missing items on a scale (i.e., summary score) for participants who completed at least 75% of that scale's items. If they are missing more than that, our imputation may not be a very good estimate. For example, the CBT factor has 6 items, and a person is missing one of them. Let's say they scored 3, 3, 3, 4, 5, [missing/no response]. The average of the five observed items is 18/5 = 3.6, so they would get a 3.6 for PP_14. Their average score stays 3.6, and their total CBT factor score becomes 18 + 3.6 = 21.6.
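The arithmetic in this example can be checked with a small sketch. It assumes nothing beyond the six example values; the array name `item` and the variables `item_mean` and `total` are made up for illustration. It also shows why multiplying the observed mean by the number of items gives the same total as filling in the missing item and summing.

```sas
/* Sketch: one-row check of the worked example (names are hypothetical) */
data _null_;
  array item{6} (3 3 3 4 5 .);        /* five observed items, one missing */
  item_mean = mean(of item{*});       /* MEAN skips missing values: 18/5  */
  total     = item_mean * dim(item);  /* equals 18 + imputed 3.6          */
  put item_mean= total=;              /* writes both values to the log    */
run;
```

The log should show 3.6 for the mean and 21.6 for the total, matching the numbers above.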

So I tried writing this in SAS, for the CBT score:

```sas
IF n(PP_05, PP_07, PP_08, PP_11, PP_12, PP_14) > 4 then CBTFACTOR = MEAN(PP_05, PP_07, PP_08, PP_11, PP_12, PP_14)*6;
```

But now I want to tell SAS to set the missing entry, PP_14 (the last item in the CBTFACTOR definition above), to 3.6. How do I do this? I have the code above but am not sure what to do next. I hope I am being clear. If a participant is missing more than 25% of a scale's items, we won't impute. Any ideas on how to code this in SAS? I don't think it should be too difficult, but I'm not sure what to write after the line of code above.

Thanks,

Gina

Super User
Posts: 10,219

Check PROC STDIZE with its REPONLY and MISSING= options:

proc stdize reponly missing=

Contributor
Posts: 68

I just googled a few pages about PROC STDIZE. It seems like it could work, but how would I tell SAS to use the method above in the MISSING= option? I don't want it to standardize the variables, only to replace a missing item with the mean of that participant's observed items, and only if no more than 25% of the scale's items are missing.

```sas
PROC STDIZE data=h.ginaimputed REPONLY MISSING= ????????;
run;
```

thanks,

Gina
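For reference, here is a minimal sketch of a typical REPONLY call, assuming a dataset named `have` that holds the PP_ items (`have` and `stdize_out` are placeholder names, not taken from the posts above). REPONLY leaves observed values untouched and fills only the missing ones, but the replacement is a location measure computed per variable, that is, down each column across participants. It does not use a participant's own item mean and knows nothing about a 75% completeness rule, so by itself it probably does not implement the rule described in the question.

```sas
/* Sketch only: dataset names are placeholders */
proc stdize data=have out=stdize_out reponly method=mean;
  /* Each missing PP_ value is replaced by that variable's column mean */
  var PP_05 PP_07 PP_08 PP_11 PP_12 PP_14;
run;
```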

PROC Star
Posts: 7,686

Can't you get what you want with a fairly simple data step? E.g.:

```sas
/* Sample data: rows 3 and 4 contain missing items */
data have;
  input PP_01-PP_16;
  cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 2 3 4 5 6 . 8 9 10 11 . 13 14 15 16
1 2 3 4 3 6 3 3 9 10 4 5 13 . 15 16
;

data want;
  set have;
  array cbt{*} pp_05 pp_07 pp_08 pp_11 pp_12 pp_14;
  array dft{*} pp_02 pp_06 pp_10 pp_13 pp_16;
  array ipt{*} pp_01 pp_03 pp_04 pp_09 pp_15;

  /* Row mean of the observed CBT items (MEAN ignores missing values) */
  CBTmean = mean(of cbt{*});
  /* le 2 accepts 4 of 6 items (about 67%); use le 1 for a strict 75% rule */
  if nmiss(of cbt{*}) le 2 then do i = 1 to dim(cbt);
    if missing(cbt{i}) then cbt{i} = CBTmean;
  end;

  DFTmean = mean(of dft{*});
  if nmiss(of dft{*}) le 1 then do i = 1 to dim(dft);
    if missing(dft{i}) then dft{i} = DFTmean;
  end;

  IPTmean = mean(of ipt{*});
  if nmiss(of ipt{*}) le 1 then do i = 1 to dim(ipt);
    if missing(ipt{i}) then ipt{i} = IPTmean;
  end;

  /* SUM skips missing values, so rows that were not imputed get a partial sum */
  CBTFACTOR = SUM(PP_05, PP_07, PP_08, PP_11, PP_12, PP_14);
  DTFACTOR  = SUM(PP_02, PP_06, PP_10, PP_13, PP_16);
  IPTFACTOR = SUM(PP_01, PP_03, PP_04, PP_09, PP_15);
  drop i;
run;
```
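A possible variation, offered as a sketch rather than as part of the answer above: derive the cutoff from the 75% rule instead of hard-coding a count per scale. The macro `%impute_scale`, its `min_prop=` parameter, the helper variables `_i` and `_rowmean`, and the output dataset `want2` are all hypothetical names. Setting the factor to missing when too few items are present is also an assumption, since the thread does not say what should happen in that case.

```sas
/* Sketch: hypothetical macro, reuses dataset `have` from the example above */
%macro impute_scale(arr, factor, min_prop=0.75);
  /* Impute only when the share of observed items meets the threshold */
  if n(of &arr{*}) / dim(&arr) ge &min_prop then do;
    _rowmean = mean(of &arr{*});
    do _i = 1 to dim(&arr);
      if missing(&arr{_i}) then &arr{_i} = _rowmean;
    end;
    &factor = sum(of &arr{*});
  end;
  else &factor = .;   /* assumption: too few items, leave the score missing */
%mend impute_scale;

data want2;
  set have;
  array cbt{*} pp_05 pp_07 pp_08 pp_11 pp_12 pp_14;
  array dft{*} pp_02 pp_06 pp_10 pp_13 pp_16;
  array ipt{*} pp_01 pp_03 pp_04 pp_09 pp_15;
  %impute_scale(cbt, CBTFACTOR)
  %impute_scale(dft, DTFACTOR)
  %impute_scale(ipt, IPTFACTOR)
  drop _i _rowmean;
run;
```

With the four sample rows, the last row has CBT items 3, 3, 3, 4, 5 and a missing PP_14, so it reproduces the worked example from the question. The third row is missing two of the six CBT items (4/6 observed), so under a strict 75% threshold it would be left unimputed.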
