Hi, As follwing , I wanna complete the missing data , but my data is too few, so I have to use average as my method. My code didnt work. Thank you for any suggestion.
data WORK.have;
infile datalines dsd truncover;
input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;
data want;
set have;
if g='.' then g1=mean(g);
else g1=g;
if meanest='.' then meanest1=mean(meanest);
else meanest1=meanest;
if g_indavg='.' then g_indavg1=mean(g_indavg);
else g_indavg1=g_indavg;
by oftic yymm_e;
run;
STDIZE as mentioned by @stat_sas is the simplest way:
*Create sample with missing data;
data missing;
set sashelp.class;
if mod(_n_, 2) eq 1 then
call missing(of _numeric_);
run;
*Display for example purposes;
title 'Original data';
proc print ;
run;
*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;
*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;
*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;
Hi,
Please try proc stdize.
data WORK.have;
infile datalines dsd truncover;
input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;
proc print data=have;
title 'Have data set from datalines';
run;
proc sql;
create table have_means as
select *,
mean(g) as mean_g,
mean(meanest) as mean_meanest,
mean(g_indavg) as mean_g_indavg
from have;
quit;
data want(drop=mean_:);
set have_means;
if g=. then g1=mean_g;
else g1=g;
if meanest=. then meanest1=mean_meanest;
else meanest1=meanest;
if g_indavg=. then g_indavg1=mean_g_indavg;
else g_indavg1=g_indavg;
by oftic yymm_e;
run;
proc print data=want;
title 'After using means for missing data';
run;
STDIZE as mentioned by @stat_sas is the simplest way:
*Create sample with missing data;
data missing;
set sashelp.class;
if mod(_n_, 2) eq 1 then
call missing(of _numeric_);
run;
*Display for example purposes;
title 'Original data';
proc print ;
run;
*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;
*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;
*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;
Hi, As follwing , I wanna complete the missing data , but my data is too few, so I have to use average as my method.
As always, I am the voice of gloom and doom, but if you really have very little data and lots of missings, then of course you can do the SAS coding and make the mean appear in place of the missings, but this doesn't really get you anywhere in my opinion. The problem that you have very little data and lots of missings is not fixed, it still exists, and is simply masked by this method. At worst, in this situation, your replacing the many missings with the mean of a small number of data could be mis-leading. Impossible to know.
In other words, just because you CAN program something in SAS doesn't mean you should program it.
Hi Paigemiller, Right, I get your idea, also I think it unreasonable to use average. But I failed to get missing data from orginal source. Any suggestion will be appreciated.
Depending on how much missing data you have, this stops being a programming problem, and it essentially becomes a data quality problem.
If, for example, 90% of your data is missing, my advice would be to state that you have very poor data quality and you cannot come to valid conclusions from this data.
But only you know exactly what's in your data, and you are the one who has to decide whether the data quality is good enough to do anything or not.
Actually, my data likes that and I am checking my code and datasets carefully. As you know, giving up is very diffcult to a researcher. Thanks a lot for your suggestion.
To expand on @PaigeMiller's comment, replacing missing values by the average results in variables that have smaller variances than they should. Consequently. inferences (standard errors, p-values) will be smaller than they should be, which means that you might draw invalid conclusions from the data.
Replacing missing values with a nonmissing value is called "imputation." A statistically valid way to address missing data is through a process called multiple imputations, which is carried out in SAS by using PROC MI and PROC MIANALYZE. In addition to the documentation, you might want to browse SAS conference proceedings on the topic. Try doing an internet search for
"proc mi" berglund site:lexjansen.com
or look at the book Multiple Imputation of Missing Data Using SAS
Hi Rick, thank you so much,I will study the book.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.