DATA Step, Macro, Functions and more

How to complete missing data with average

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 97
Accepted Solution

How to complete missing data with average

Hi, As follwing , I wanna  complete the missing data , but my data is too few, so I have to use average as my method. My code didnt work. Thank you for any suggestion.

data WORK.have;
  infile datalines dsd truncover;
  input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
  format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
  label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;
data want;
set have;
if g='.' then g1=mean(g);
else g1=g;
if meanest='.' then meanest1=mean(meanest);
else meanest1=meanest;
if g_indavg='.' then g_indavg1=mean(g_indavg);
else g_indavg1=g_indavg;
by oftic yymm_e;
run;

Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 17,851

Re: How to complete missing data with average

STDIZE as mentioned by @stat_sas is the simplest way:

 

*Create sample with missing data;
data missing;
	set sashelp.class;

	if mod(_n_, 2) eq 1 then
		call missing(of _numeric_);
run;

*Display for example purposes;
title 'Original data';
proc print ;
run;

*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;

*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;

*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,204

Re: How to complete missing data with average

Hi,

 

Please try proc stdize.

Contributor
Posts: 65

Re: How to complete missing data with average

data WORK.have;
  infile datalines dsd truncover;
  input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
  format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
  label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;

proc print data=have;
  title 'Have data set from datalines';
run;

proc sql;
  create table have_means as
    select *,
           mean(g) as mean_g,
           mean(meanest) as mean_meanest,
           mean(g_indavg) as mean_g_indavg
      from have;
quit;

data want(drop=mean_:);
set have_means;
if g=. then g1=mean_g;
else g1=g;
if meanest=. then meanest1=mean_meanest;
else meanest1=meanest;
if g_indavg=. then g_indavg1=mean_g_indavg;
else g_indavg1=g_indavg;
by oftic yymm_e;
run;

proc print data=want;
  title 'After using means for missing data';
run;
Solution
3 weeks ago
Super User
Posts: 17,851

Re: How to complete missing data with average

STDIZE as mentioned by @stat_sas is the simplest way:

 

*Create sample with missing data;
data missing;
	set sashelp.class;

	if mod(_n_, 2) eq 1 then
		call missing(of _numeric_);
run;

*Display for example purposes;
title 'Original data';
proc print ;
run;

*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;

*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;

*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;
Trusted Advisor
Posts: 1,619

Re: How to complete missing data with average

[ Edited ]

Hi, As follwing , I wanna  complete the missing data , but my data is too few, so I have to use average as my method.


As always, I am the voice of gloom and doom, but if you really have very little data and lots of missings, then of course you can do the SAS coding and make the mean appear in place of the missings, but this doesn't really get you anywhere in my opinion. The problem that you have very little data and lots of missings is not fixed, it still exists, and is simply masked by this method. At worst, in this situation, your replacing the many missings with the mean of a small number of data could be mis-leading. Impossible to know.

 

In other words, just because you CAN program something in SAS doesn't mean you should program it.

Frequent Contributor
Posts: 97

Re: How to complete missing data with average

Hi Paigemiller, Right, I get your idea, also I think it unreasonable to use average. But I failed to get missing data from orginal source. Any suggestion will be appreciated.

Trusted Advisor
Posts: 1,619

Re: How to complete missing data with average

Depending on how much missing data you have, this stops being a programming problem, and it essentially becomes a data quality problem.

 

If, for example, 90% of your data is missing, my advice would be to state that you have very poor data quality and you cannot come to valid conclusions from this data.

 

But only you know exactly what's in your data, and you are the one who has to decide whether the data quality is good enough to do anything or not.

Frequent Contributor
Posts: 97

Re: How to complete missing data with average

Actually, my data likes that and I am checking my code and datasets carefully. As you know, giving up is very diffcult to a researcher. Thanks a lot for your suggestion.

SAS Super FREQ
Posts: 3,482

Re: How to complete missing data with average

To expand on @PaigeMiller's comment, replacing missing values by the average results in variables that have smaller variances than they should. Consequently. inferences (standard errors, p-values) will be smaller than they should be, which means that you might draw invalid conclusions from the data. 

 

Replacing missing values with a nonmissing value is called "imputation." A statistically valid way to address missing data is through a process called multiple imputations, which is carried out in SAS by using PROC MI and PROC MIANALYZE. In addition to the documentation, you might want to browse SAS conference proceedings on the topic.  Try doing an internet search for 

    "proc mi" berglund site:lexjansen.com

or look at the book Multiple Imputation of Missing Data Using SAS

Frequent Contributor
Posts: 97

Re: How to complete missing data with average

Hi Rick, thank you so much,I will study the book.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 352 views
  • 3 likes
  • 6 in conversation