turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- I have 2 data sets one with mean value for each va...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-15-2013 07:47 AM

I have 2 data sets one with mean value for each variable(proc means output) and the other with values. I want to replace the missing values with mean value for each particular variable in the dataset.

Any help please.

I tried call symput but does it work for multiple variables?

Accepted Solutions

Solution

04-16-2013
02:42 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ron_Fehd_macro_maven

04-16-2013 02:42 PM

On Tue, 16 Apr 2013 17:45:04 +0100, Ian

<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:

>How about using PROC STDIZE?

proc stdize data = old

out = new

missing = mean

reponly;

var _numeric_;

run;

>You can also impute with the median or read values from a second data set.

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-15-2013 08:10 AM

Hi..

Not sure how you going to accomplish this with Call Symput..If possible can you provide your code which you have tried???

Other then that **coalesce function ** will be helpful for your requirement.

Regards.

Sanjeev.K

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-15-2013 01:07 PM

hi ,

try this...but............please atleast post your working / sample datasets.....

data samp;

infile datalines missover;

input id sal ;

if missing(sal) then sal=symget("avgsal");

datalines;

001 5000

002 7000

003

004 7832

005

;

run;

proc sql noprint ;

select avg(sal) into : avgsal from samp ;

run;

Regards

ALLU

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-15-2013 01:20 PM

PROC STDIZE, for 1 or more variables in the VAR statement and BY groups if you like.

infile datalines missover;

input id:$3. sal ;

datalines;

001 5000

002 7000

003

004 7832

005

;;;;

var sal;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-15-2013 01:29 PM

1. read in the data structure of your data.

make an array of all the numeric variables

2. read in the single row of the Means data set

whose variable have been autonamed

so that Mean is the prefix of each variable.

The issues are getting the names of the variables aligned with their means.

so that the array references are correct.

alternately you can use sql to extract the list of variable

and feed that list to Summary

which you can then use in the array statements.

DATA Update;*todo: (drop = Mean;

*get the data structure for the array Valu;

if 0 then set Library.Mydata;

array Valu(*) _numeric_;

*read the Means data set -- one row -- into the set of values for updating;

set Means;

array Mean(*);

do until(EndoFile);

set Library.MyData end = EndoFile;

do i = 1 to dim(Valu);

if Valu(I) eq . then Valu(I) = Mean(I);

end;

output;

end;

stop;

run;

PROC SQL; describe table &syslast;

quit;

Ron Fehd matching maven

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ron_Fehd_macro_maven

04-16-2013 06:45 AM

I have posted a general solution on SAS community wiki:

http://www.sascommunity.org/wiki/Replacing_Missing_with_Mean

Solution

04-16-2013
02:42 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ron_Fehd_macro_maven

04-16-2013 02:42 PM

On Tue, 16 Apr 2013 17:45:04 +0100, Ian

<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:

>How about using PROC STDIZE?

proc stdize data = old

out = new

missing = mean

reponly;

var _numeric_;

run;

>You can also impute with the median or read values from a second data set.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ron_Fehd_macro_maven

04-18-2013 02:12 AM

This works Ron.

Thank:-)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-16-2013 02:00 AM

you can merge this 2 datasets with by variables which you have used in proc means to create multiple means values. then you can assign the mean values to missing values if needed.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-16-2013 02:29 AM

I think i will give clear idea.

First dataset with means.

mean_score mean_forum mean_subs

23.45 43.45 12.22

Second data set with original values.

Score forum subs

12 23 .

145 12 87

. 12 98

15 . 67

Now i want to replace the missing values in second data set with the mean value.

For example in score if any value is missing it should be replaced with mean_score.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-16-2013 02:46 AM

take all mean values to respective Macro variables. Then in the next step use symget or simple if condition to replace the values.

Data _null_;

set mean_data;

call symput("Mscore", strip(put(mean_score)));

call symput("Mforum", strip(put(mean_forum)));

call symput("Msubs", strip(put(mean_subs)));

run;

data hh;

set values;

if score=. then score=&mean_score;

else score=score;

....

....

....

run;

do the same for other variables, Hope this helps

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to venkatard

04-20-2013 05:40 AM

It is easy :

data want;

if _n_ eq 1 then set mean_data;

set have ;

score=coalesce( score ,mean_score) ;

forum=coalesce( forum,mean_forum);

...........

run;

Ksharp