I merged 2 data sets, so that it adds a column to the other set by doing the following:
/* read datasets */
libname mysas "Path to file folder";
data mysas.Surveys;
set mysas.surveyresults;
run;
data mysas.APIdata;
set mysas.api;
run;
libname mysas "Path to file folder";
data surveys2;
set mysas.surveyresults;
if not missing(q1); /* Remove rows with missing inputs */
run;
data mergeddat;
merge surveys2 (in=inS) mysas.api (in=inA);
by SchoolName;
if inS; /* Keep only schools from the survey results */
run;
proc sort data=mergeddat;
by SchoolName;
run;
/* Verify added column */
proc print data=mergeddat (obs=5);
run;
/* Verify row count */
proc contents data=mergeddat;
run;
In this merged data, there is a column with missing values. For those values, I want to compute the mean of that column, and then impute that mean for the missing values. I'm trying to compute the mean and make it a global macro variable:
/* Compute the mean of q2 and create a global macro variable */ proc means data=mysas.Surveys noprint; var q2; output out=q2_mean_data mean=mean_q2; run; /* Extract the computed mean into a macro variable */ data _null_; set q2_mean_data; call symputx('q2_mean', mean_q2); run; /* Impute missing values in q2 using the computed mean */ data mysas.Surveys_Imputed; set mysas.Surveys; if missing(q2) then q2 = &q2_mean.; run; /* Step 3: Display the data with imputed values */ proc print data=mysas.Surveys_Imputed (obs=10); /* Display first 10 rows */ run;
However, when I try to get a visual, nothing is outputting when I do proc print. I feel like my logic and approach should accomplish this. What am I doing wrong and how can I fix this?
data have;
set sashelp.class;
if _n_ in (4:8) then call missing(height,weight);
run;
proc stdize data=have out=want missing=mean reponly;
var height weight;
run;
Alternative to @Ksharp code.
I am using the
if _N_=1 then set
technique.
(as others told you before no macro variables are needed for this)
/* Alternative to Ksharp's code */
data have;
set sashelp.class;
if _n_ in (4:8) then call missing(height,weight);
run;
*proc stdize data=have out=want missing=mean reponly;
*var height weight;
*run;
proc means data=have noprint;
var height weight;
output out=have_means(drop=_:) mean= / autoname;
run;
data want(drop=height_mean weight_mean);
if _N_=1 then set have_means;
set have;
if height=. then height=height_mean;
if weight=. then weight=weight_mean;
run;
/* end of program */
Koen
I point out that for SASHELP.CLASS, there are a very small number of variables. If the real data set has lots of variables, and they each need to have their own mean imputed when a value is missing, then the approach from @sbxkoenk is a lot of typing, but PROC STDIZE is much less typing.
Fixing what I believe is a typo in @Ksharp 's program
proc stdize data=have out=want missing=mean reponly;
should say
proc stdize data=have out=want method=mean reponly;
Please post the complete log, so that we see what happened actually.
In this merged data, there is a column with missing values. For those values, I want to compute the mean of that column, and then impute that mean for the missing values. I'm trying to compute the mean and make it a global macro variable:
There is no need for a macro variable here. SAS has many PROCs to do what are standard statistical methods, which you should investigate (or ask us about) before you decide to create your own code to do a standard statistical method, and before you decide to create macro variables. As @Ksharp points out, PROC STDIZE will do this for you.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.