I have 2 data sets one with mean value for each variable(proc means output) and the other with values. I want to replace the missing values with mean value for each particular variable in the dataset.
Any help please.
I tried call symput but does it work for multiple variables?
On Tue, 16 Apr 2013 17:45:04 +0100, Ian
<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:
>How about using PROC STDIZE?
proc stdize data = old
out = new
missing = mean
reponly;
var _numeric_;
run;
>You can also impute with the median or read values from a second data set.
Hi..
Not sure how you going to accomplish this with Call Symput..If possible can you provide your code which you have tried???
Other then that coalesce function will be helpful for your requirement.
Regards.
Sanjeev.K
hi ,
try this...but............please atleast post your working / sample datasets.....
data samp;
infile datalines missover;
input id sal ;
if missing(sal) then sal=symget("avgsal");
datalines;
001 5000
002 7000
003
004 7832
005
;
run;
proc sql noprint ;
select avg(sal) into : avgsal from samp ;
run;
Regards
ALLU
PROC STDIZE, for 1 or more variables in the VAR statement and BY groups if you like.
1. read in the data structure of your data.
make an array of all the numeric variables
2. read in the single row of the Means data set
whose variable have been autonamed
so that Mean is the prefix of each variable.
The issues are getting the names of the variables aligned with their means.
so that the array references are correct.
alternately you can use sql to extract the list of variable
and feed that list to Summary
which you can then use in the array statements.
DATA Update;*todo: (drop = Mean:);
*get the data structure for the array Valu;
if 0 then set Library.Mydata;
array Valu(*) _numeric_;
*read the Means data set -- one row -- into the set of values for updating;
set Means;
array Mean(*);
do until(EndoFile);
set Library.MyData end = EndoFile;
do i = 1 to dim(Valu);
if Valu(I) eq . then Valu(I) = Mean(I);
end;
output;
end;
stop;
run;
PROC SQL; describe table &syslast;
quit;
Ron Fehd matching maven
I have posted a general solution on SAS community wiki:
http://www.sascommunity.org/wiki/Replacing_Missing_with_Mean
On Tue, 16 Apr 2013 17:45:04 +0100, Ian
<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:
>How about using PROC STDIZE?
proc stdize data = old
out = new
missing = mean
reponly;
var _numeric_;
run;
>You can also impute with the median or read values from a second data set.
This works Ron.
Thank:-)
you can merge this 2 datasets with by variables which you have used in proc means to create multiple means values. then you can assign the mean values to missing values if needed.
I think i will give clear idea.
First dataset with means.
mean_score mean_forum mean_subs
23.45 43.45 12.22
Second data set with original values.
Score forum subs
12 23 .
145 12 87
. 12 98
15 . 67
Now i want to replace the missing values in second data set with the mean value.
For example in score if any value is missing it should be replaced with mean_score.
take all mean values to respective Macro variables. Then in the next step use symget or simple if condition to replace the values.
Data _null_;
set mean_data;
call symput("Mscore", strip(put(mean_score)));
call symput("Mforum", strip(put(mean_forum)));
call symput("Msubs", strip(put(mean_subs)));
run;
data hh;
set values;
if score=. then score=&mean_score;
else score=score;
....
....
....
run;
do the same for other variables, Hope this helps
It is easy :
data want;
if _n_ eq 1 then set mean_data;
set have ;
score=coalesce( score ,mean_score) ;
forum=coalesce( forum,mean_forum);
...........
run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.