BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lixuan
Obsidian | Level 7

Hi, As follwing , I wanna  complete the missing data , but my data is too few, so I have to use average as my method. My code didnt work. Thank you for any suggestion.

data WORK.have;
  infile datalines dsd truncover;
  input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
  format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
  label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;
data want;
set have;
if g='.' then g1=mean(g);
else g1=g;
if meanest='.' then meanest1=mean(meanest);
else meanest1=meanest;
if g_indavg='.' then g_indavg1=mean(g_indavg);
else g_indavg1=g_indavg;
by oftic yymm_e;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

STDIZE as mentioned by @stat_sas is the simplest way:

 

*Create sample with missing data;
data missing;
	set sashelp.class;

	if mod(_n_, 2) eq 1 then
		call missing(of _numeric_);
run;

*Display for example purposes;
title 'Original data';
proc print ;
run;

*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;

*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;

*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;

View solution in original post

9 REPLIES 9
stat_sas
Ammonite | Level 13

Hi,

 

Please try proc stdize.

SuzanneDorinski
Lapis Lazuli | Level 10
data WORK.have;
  infile datalines dsd truncover;
  input oftic:$6. yymm:YYMMN6. yymm_e:YYMMN6. date:YYMMN6. g:32. meanest:32. g_indavg:32.;
  format yymm YYMMN6. yymm_e YYMMN6. date YYMMN6.;
  label oftic="Official Ticker Symbol" g="g" meanest="Mean Estimate";
datalines4;
ABC,200403,200207,200212,,,
ABC,200403,200207,200303,,,
ABC,200403,200207,200306,,,
ABC,200403,200207,200309,,,
ABC,200403,200207,200312,,,
ABC,200403,200207,200403,15.21,1.03,15.21
ABC,200403,200207,200406,,,
ABC,200403,200207,200409,,,
ABC,200403,200207,200412,,,
ABC,200403,200207,200503,,,
ABC,200403,200207,200506,,,
ABC,200403,200207,200509,,,
ABC,200403,200301,200306,,,
ABC,200403,200301,200309,,,
ABC,200403,200301,200312,,,
ABC,200403,200301,200403,15.21,1.03,15.21
ABC,200403,200301,200406,,,
ABC,200403,200301,200409,,,
ABC,200403,200301,200412,,,
ABC,200403,200301,200503,,,
ABC,200403,200301,200506,,,
ABC,200403,200301,200509,,,
ABC,200403,200301,200512,,,
ABC,200403,200301,200603,,,
;;;;

proc print data=have;
  title 'Have data set from datalines';
run;

proc sql;
  create table have_means as
    select *,
           mean(g) as mean_g,
           mean(meanest) as mean_meanest,
           mean(g_indavg) as mean_g_indavg
      from have;
quit;

data want(drop=mean_:);
set have_means;
if g=. then g1=mean_g;
else g1=g;
if meanest=. then meanest1=mean_meanest;
else meanest1=meanest;
if g_indavg=. then g_indavg1=mean_g_indavg;
else g_indavg1=g_indavg;
by oftic yymm_e;
run;

proc print data=want;
  title 'After using means for missing data';
run;
Reeza
Super User

STDIZE as mentioned by @stat_sas is the simplest way:

 

*Create sample with missing data;
data missing;
	set sashelp.class;

	if mod(_n_, 2) eq 1 then
		call missing(of _numeric_);
run;

*Display for example purposes;
title 'Original data';
proc print ;
run;

*Show mean values to check results;
title 'Mean results to check output' proc means data=missing mean;
run;

*Replace missing with mean, for ALL NUMERIC variables. If you want specific variables
add them via a VAR statement;
proc stdize out=class_mean reponly method=mean;
run;

*Diplay results for example;
title 'Final - Missing replaced with Mean';
proc print data=class_mean;
run;
PaigeMiller
Diamond | Level 26

Hi, As follwing , I wanna  complete the missing data , but my data is too few, so I have to use average as my method.


As always, I am the voice of gloom and doom, but if you really have very little data and lots of missings, then of course you can do the SAS coding and make the mean appear in place of the missings, but this doesn't really get you anywhere in my opinion. The problem that you have very little data and lots of missings is not fixed, it still exists, and is simply masked by this method. At worst, in this situation, your replacing the many missings with the mean of a small number of data could be mis-leading. Impossible to know.

 

In other words, just because you CAN program something in SAS doesn't mean you should program it.

--
Paige Miller
lixuan
Obsidian | Level 7

Hi Paigemiller, Right, I get your idea, also I think it unreasonable to use average. But I failed to get missing data from orginal source. Any suggestion will be appreciated.

PaigeMiller
Diamond | Level 26

Depending on how much missing data you have, this stops being a programming problem, and it essentially becomes a data quality problem.

 

If, for example, 90% of your data is missing, my advice would be to state that you have very poor data quality and you cannot come to valid conclusions from this data.

 

But only you know exactly what's in your data, and you are the one who has to decide whether the data quality is good enough to do anything or not.

--
Paige Miller
lixuan
Obsidian | Level 7

Actually, my data likes that and I am checking my code and datasets carefully. As you know, giving up is very diffcult to a researcher. Thanks a lot for your suggestion.

Rick_SAS
SAS Super FREQ

To expand on @PaigeMiller's comment, replacing missing values by the average results in variables that have smaller variances than they should. Consequently. inferences (standard errors, p-values) will be smaller than they should be, which means that you might draw invalid conclusions from the data. 

 

Replacing missing values with a nonmissing value is called "imputation." A statistically valid way to address missing data is through a process called multiple imputations, which is carried out in SAS by using PROC MI and PROC MIANALYZE. In addition to the documentation, you might want to browse SAS conference proceedings on the topic.  Try doing an internet search for 

    "proc mi" berglund site:lexjansen.com

or look at the book Multiple Imputation of Missing Data Using SAS

lixuan
Obsidian | Level 7

Hi Rick, thank you so much,I will study the book.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1224 views
  • 3 likes
  • 6 in conversation