Seanna
Calcite | Level 5

Hi,

I am trying to get predicted means, but the version of SAS that I have will not let me do that through PROC SURVEYREG. I know that I can do it through PROC GLM, but I also need to account for clustering. I tried the method described here:

http://support.sas.com/kb/24/497.html

But is there an alternative way to get predicted means?

Thank you.

13 REPLIES
PGStats
Opal | Level 21

What happened when you tried to apply the method posted in the KB?

PG

Seanna
Calcite | Level 5

Doing it that way has worked, but I am doing many iterations and was hoping that I could find a way that I could do through a macro. Given that each of my trials has different conditions, I have to reenter the coefficients in proc surveyreg for each of them. I am hoping that there is a one-step alternative.

PGStats
Opal | Level 21

If you post sufficient detail, there might be a way to do exactly that: capturing the output from GLM (through ODS tables) and transferring it via macro variables to PROC SURVEYREG. I have not done this, but it is the sort of thing that can often be done.

PG

Seanna
Calcite | Level 5

Great, thanks! My only other concern is that one of my class variables is closer to a (0.75, 0.25) ratio, and as far as I know GLM automatically outputs the coefficients as (0.5, 0.5). So, when I enter the coefficients into SURVEYREG, I change them manually. Do you think there is a way to get around this as well?

I'm sorry if there is a straightforward workaround; I am new to this. Thank you for your help.

PGStats
Opal | Level 21

Does that (.75-.25) ratio reflect the prevalence of the class in your observations? Could you post an example of the GLM call and the final SURVEYREG that you needed? That would help me understand. - PG

Seanna
Calcite | Level 5

The (.75, .25) ratio refers to a weekend variable in a population that was oversampled for weekends, where whether the sample was taken on a weekday or weekend is something that we want to adjust for.

My proc glm statement is:

ods select LsMeanCoef;
proc glm data=soda.info;
where cals ne . & income ne . & weekend ne . & inschool ne . & school=0;
class income weekend gender;
model cals= weekend gender age income;
weight 6yr;
lsmeans income /e;
run;
quit;

The SURVEYREG step is below. I've entered the coefficients from the GLM output, except for weekend, for which I'm using 2/7 and 5/7:

proc surveyreg data=soda.info;
where cals ne . & weekend ne . & income ne . & inschool ne . & school=0;
stratum stra;
cluster psu;
class income weekend gender;
model cals= weekend gender age income;
weight 6yr;
estimate 'lsmeans for school'
    intercept 1 weekend 0.2857 0.7143 gender 0.5 0.5 age 9.34660158
    inschool 1 0;
estimate 'lsmeans for summer'
    intercept 1 weekend 0.2857 0.7143 gender 0.5 0.5 age 9.34660158
    inschool 0 1;
run;

PGStats
Opal | Level 21

Great! I still have one question: The estimate statements involve class inschool which is not in the model but do not mention variable income from the model... I don't understand that aspect of the example.

I must leave now. I will be back tomorrow, in about 19 hours.

PG

Seanna
Calcite | Level 5

I'm sorry, that was a typo. Here is the corrected code:

ods select LsMeanCoef;
proc glm data=soda.info;
where cals ne . & income ne . & weekend ne . & inschool ne . & school=0;
class income weekend gender;
model cals= weekend gender age income;
weight 6yr;
lsmeans income /e;
run;
quit;

proc surveyreg data=soda.info;
where cals ne . & weekend ne . & income ne . & inschool ne . & school=0;
stratum stra;
cluster psu;
class income weekend gender;
model cals= weekend gender age income;
weight 6yr;
estimate 'lsmeans for low income'
    intercept 1 weekend 0.2857 0.7143 gender 0.5 0.5 age 9.34660158
    income 1 0;
estimate 'lsmeans for high income'
    intercept 1 weekend 0.2857 0.7143 gender 0.5 0.5 age 9.34660158
    income 0 1;
run;

Thank you again for your help!

PGStats
Opal | Level 21

Hello Seanna, here is what I can propose:

First, run the following macro definition:

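%macro genEstStmt( lm_, mlsmf);
/* lm_   : LsMeanCoef dataset saved from PROC GLM with ODS OUTPUT */
/* mlsmf : dataset of replacement factors (param, level, factor)  */
%global &lm_.;
/* Split each parameter label into its name and level, keeping the original order */
data &lm_; set &lm_; order=_n_; param=scan(parameter,1); level=scan(parameter,2); run;

/* One output row per parameter and LS-mean */
proc transpose data=&lm_ out=_&lm_.; var row:; by effect order param level notsorted; run;

/* Replace the GLM coefficient wherever a replacement factor is supplied */
proc sql;
update _&lm_. as L
set col1=coalesce((select factor from &mlsmf. as M where L.param=M.param and L.level=M.level), col1);
quit;

proc sort data=_&lm_.; by effect _NAME_ order; run;

/* Concatenate the ESTIMATE statements and store them in a global macro */
/* variable with the same name as the dataset (at most 400 characters)  */
data _null_;
length effectList $400;
retain effectList;
set _&lm_. end=lastObs;
by _LABEL_ param notsorted;
if first._LABEL_ then
effectList = cats(effectList, "ESTIMATE '", effect, " = ", _LABEL_,"'");
if first.param then effectList=catx(" ", effectList, param);
effectList = catx(" ", effectList, col1);
if last._LABEL_ then effectList=cats(effectList,";");
if lastObs then call symput("&lm_.", trim(effectList));
run;

/* Remove the temporary transposed table */
proc sql; drop table _&lm_.; quit;

%mend;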

Then, create a small dataset describing the factors that you want to change, such as:

data myLsMeansFactors;
input param $ level $ factor;
datalines;
weekend 0 0.2857
weekend 1 0.7143
;

WARNING: Verify that the levels match the factors properly - they may be inverted...
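
The factors can also be computed rather than typed, which avoids rounding 2/7 and 5/7 to four decimals. This is only a minimal sketch, assuming the same level coding (0 and 1) as the dataset above; the character lengths are arbitrary choices, and the same caution about inverted levels applies.

```sas
/* Same dataset as above, with the factors stored at full precision.   */
/* Assumption: level "0" gets 2/7 and level "1" gets 5/7, as above;    */
/* swap them if your levels are coded the other way round.             */
data myLsMeansFactors;
length param $ 32 level $ 32;
param = "weekend";
level = "0"; factor = 2/7; output;
level = "1"; factor = 5/7; output;
run;
```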

You only need to do these steps once per SAS session.

Then insert the following statement inside the PROC GLM step:

ods output LsMeanCoef=myIncomeAnalysis;

and, before the PROC SURVEYREG step, add the macro call:

%genEstStmt(myIncomeAnalysis, myLsMeansFactors);

Executing the macro will create a macro variable, with the same name as your LsMeanCoef dataset, containing the required ESTIMATE statements.

All you need to do then is refer to the macro variable in place of your ESTIMATE statements.

Here is how the whole thing looked in my tests:


%macro genEstStmt( lm_, mlsmf);
%global &lm_.;
data &lm_; set &lm_; order=_n_; param=scan(parameter,1); level=scan(parameter,2); run;

proc transpose data=&lm_ out=_&lm_.; var row:; by effect order param level notsorted; run;

proc sql;
update _&lm_. as L
set col1=coalesce((select factor from &mlsmf. as M where L.param=M.param and L.level=M.level), col1);
quit;

proc sort data=_&lm_.; by effect _NAME_ order; run;

data _null_;
length effectList $400;
retain effectList;
set _&lm_. end=lastObs;
by _LABEL_ param notsorted;
if first._LABEL_ then
effectList = cats(effectList, "ESTIMATE '", effect, " = ", _LABEL_,"'");
if first.param then effectList=catx(" ", effectList, param);
effectList = catx(" ", effectList, col1);
if last._LABEL_ then effectList=cats(effectList,";");
if lastObs then call symput("&lm_.", trim(effectList));
run;

proc sql; drop table _&lm_.; quit;

%mend;

data myLsMeansFactors;
input param $ level $ factor;
datalines;
weekend 0 0.2857
weekend 1 0.7143
;

proc glm data=test;
ods select LsMeanCoef;
where cals ne . &  income ne . & weekend ne . & inschool ne .;
class income weekend gender;
model cals= weekend gender age income;
*weight 6yr;
lsmeans income /e;
ods output LsMeanCoef=myIncomeAnalysis;
run;
quit;

%genEstStmt(myIncomeAnalysis,myLsMeansFactors);

proc surveyreg data=test;
where cals ne . & weekend ne . & income ne . & inschool ne .;
*stratum stra;
*cluster psu;
class income weekend gender;
model cals= weekend gender age income;
*weight 6yr;
&myIncomeAnalysis;
run;
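
Since the goal was to repeat this over many trials, the three steps could probably be wrapped in one macro and called once per condition. What follows is an untested sketch, not part of the program above: the macro name runTrial, its parameters (data, cond, weight, coefDs) and the weight variable name myWeight are all hypothetical, while stra and psu are taken from the original SURVEYREG step. It assumes genEstStmt and myLsMeansFactors have already been created, and that coefDs is a one-level dataset name.

```sas
/* Hypothetical wrapper: runs GLM, builds the ESTIMATE statements, then */
/* runs SURVEYREG for one trial. Names and parameters are assumptions.  */
%macro runTrial(data=, cond=, weight=, coefDs=);

proc glm data=&data.;
ods select LsMeanCoef;
where cals ne . & income ne . & weekend ne . & inschool ne . & (&cond.);
class income weekend gender;
model cals= weekend gender age income;
weight &weight.;
lsmeans income /e;
ods output LsMeanCoef=&coefDs.;
run;
quit;

/* Creates a global macro variable with the same name as &coefDs. */
%genEstStmt(&coefDs., myLsMeansFactors);

proc surveyreg data=&data.;
where cals ne . & income ne . & weekend ne . & inschool ne . & (&cond.);
stratum stra;
cluster psu;
class income weekend gender;
model cals= weekend gender age income;
weight &weight.;
/* &&&coefDs. resolves to the value of the macro variable named by &coefDs. */
&&&coefDs.;
run;

%mend runTrial;

/* Example call for one trial; repeat with other conditions as needed. */
%runTrial(data=soda.info, cond=%str(school=0), weight=myWeight, coefDs=trialSchool0);
```

Each call should use a different coefDs name if the ESTIMATE statements from earlier trials need to stay available for comparison.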

Give it a try. Tell me if it works for you.

PG

Seanna
Calcite | Level 5

Thank you so much! I will try it in the morning and let you know how it goes. Thank you again.

Seanna
Calcite | Level 5

It's working! However, I am getting slight differences (about 1 or 2 percent of the total value) between the estimates from my original code and those from the new code. The difference between the two estimates for the income variable is the same with both versions. Is this probably just a rounding issue at some point? As long as the difference is the same it is fine for my purposes; I'm just curious.

Thank you again, I really appreciate all of your help!

PGStats
Opal | Level 21

Seanna, if you frame proc surveyreg like this:

options symbolgen;
proc surveyreg...
...
run;
options nosymbolgen;

you will be able to see in the LOG the ESTIMATE statements and compare them with your original program.
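
Another option, if the SYMBOLGEN output is hard to read, is to print the macro variable directly. This is a minimal sketch, assuming the macro variable is named myIncomeAnalysis as in the example above. %SUPERQ is used so that the semicolons inside the value do not end the %PUT statement early.

```sas
/* Write the generated ESTIMATE statements to the log without running a procedure. */
/* %superq takes the macro variable name without the ampersand.                    */
%put %superq(myIncomeAnalysis);
```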

PG

Seanna
Calcite | Level 5

Ok, I fixed it. Thank you again so much, you're a lifesaver!
