DATA Step, Macro, Functions and more

I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

I have 2 data sets one with mean value for each variable(proc means output) and the other with values

I have 2 data sets one with mean value for each variable(proc means output) and the other with values. I want to replace the missing values with mean value for each particular variable in the dataset.

Any help please.

I tried call symput but does it work for multiple variables?


Accepted Solutions
Solution
‎04-16-2013 02:42 PM
Regular Contributor
Posts: 227

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to Ron_Fehd_macro_maven

On Tue, 16 Apr 2013 17:45:04 +0100, Ian

<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:

>How about using PROC STDIZE?

proc stdize data    = old 

            out     = new

            missing = mean

            reponly;

            var       _numeric_;

run;

>You can also impute with the median or read values from a second data set.

View solution in original post


All Replies
Super Contributor
Posts: 276

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

Hi..

Not sure how you going to accomplish this with Call Symput..If possible can you provide your code which you have tried???

Other then that coalesce function will be helpful for your requirement.

Regards.

Sanjeev.K

Frequent Contributor
Posts: 97

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

hi ,

try this...but............please atleast post your working / sample datasets.....

data samp;
infile datalines missover;
input id sal ;
if missing(sal) then  sal=symget("avgsal");
datalines;
001 5000
002 7000
003
004 7832
005
;
run;

proc sql noprint ;
select avg(sal)  into : avgsal  from samp ;
run;

Regards

ALLU

Respected Advisor
Posts: 3,799

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

PROC STDIZE, for 1 or more variables in the VAR statement and BY groups if you like.

data samp;
   infile datalines missover;
  
input id:$3. sal ;
   datalines;
001 5000
002 7000
003
004 7832
005
;;;;
   run;
proc stdize reponly missing=mean out=poked;
   var sal;
   run;
Regular Contributor
Posts: 227

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

1. read in the data structure of your data.

make an array of all the numeric variables

2. read in the single row of the Means data set

whose variable have been autonamed

so that Mean is the prefix of each variable.

The issues are getting the names of the variables aligned with their means.

so that the array references are correct.

alternately you can use sql to extract the list of variable

and feed that list to Summary

which you can then use in the array statements.

DATA Update;*todo: (drop = MeanSmiley Happy;

*get the data structure for the array Valu;

if 0 then set Library.Mydata;

array Valu(*) _numeric_;

*read the Means data set -- one row -- into the set of values for updating;

set Means;

array Mean(*);

do until(EndoFile);

   set Library.MyData end = EndoFile;

   do i = 1 to dim(Valu);

      if Valu(I) eq . then Valu(I) = Mean(I);

      end;

   output;

   end;

stop;

run;

PROC SQL; describe table &syslast;

           quit;

Ron Fehd  matching maven

Regular Contributor
Posts: 227

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to Ron_Fehd_macro_maven

I have posted a general solution on SAS community wiki:

http://www.sascommunity.org/wiki/Replacing_Missing_with_Mean

Solution
‎04-16-2013 02:42 PM
Regular Contributor
Posts: 227

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to Ron_Fehd_macro_maven

On Tue, 16 Apr 2013 17:45:04 +0100, Ian

<ian.wakeling@HANANI.QISTATS.CO.UK> wrote:

>How about using PROC STDIZE?

proc stdize data    = old 

            out     = new

            missing = mean

            reponly;

            var       _numeric_;

run;

>You can also impute with the median or read values from a second data set.

Contributor
Posts: 63

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to Ron_Fehd_macro_maven

This works Ron.

Thank:-)

Contributor
Posts: 60

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

you can merge this 2 datasets with by variables which you have used in proc means to create multiple means values. then you can assign the mean values to missing values if needed.

Contributor
Posts: 63

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

I think i will give clear idea.

First dataset with means.

mean_score   mean_forum   mean_subs

  23.45               43.45               12.22

Second data set with original values.

Score    forum    subs

12            23       .

145           12      87  

.               12      98

15              .       67

Now i want to replace the missing values in second data set with the mean value.

For example in score if any value is missing it should be replaced with mean_score.

Contributor
Posts: 60

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

take all mean values to respective Macro variables. Then in the next step use symget or simple if condition to replace the values.

Data _null_;

     set mean_data;

     call symput("Mscore", strip(put(mean_score)));

     call symput("Mforum", strip(put(mean_forum)));

     call symput("Msubs", strip(put(mean_subs)));

run;

data hh;

     set values;

     if score=. then score=&mean_score;

     else score=score;

     ....

     ....

     ....

run;

do the same for other variables, Hope this helps

Super User
Posts: 10,023

Re: I have 2 data sets one with mean value for each variable(proc means output) and the other with values

Posted in reply to venkatard

It is easy :

data want;

if _n_ eq 1 then set mean_data;

set have ;

score=coalesce( score ,mean_score) ;

forum=coalesce( forum,mean_forum);

...........

run;

Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 383 views
  • 4 likes
  • 7 in conversation