Hi,
I'm running a logistic regression in PROC GENMOD for proportion data. I am regressing the share of white students in a school on whether the school is in a city or a suburb. How do I calculate the marginal effects for the city variable? I can't use the Margins macro because I'm using the events/trials syntax for proportion data.
proc genmod data=dat;
class city;
model n_white/total_students = city/dist=bin link=logit;
run;
n_white = number of white students
total_students = number of students in the school
city = indicator for whether a school is in a city or suburb
I'm not sure what you mean by "marginal" in this situation where you have one X variable. Could you explain further?
Hi,
There are a number of other covariates also in the model, sorry! For example, income inequality ("gini", continuous), charter school ("charter", categorical), urbanicity ("urban", categorical), among many others.
The updated code could look like:
proc genmod data=dat;
  /* charter and urban are categorical, so they are listed in CLASS as well */
  class city charter urban;
  model n_white/total_students = city gini charter urban / dist=bin link=logit;
run;
Did you check the ESTIMATE and SLICE statements?
Or you could use the EFFECTPLOT statement to visualize this marginal effect.
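A minimal sketch of what the ESTIMATE route might look like for the model in this thread. It assumes city has exactly two levels and that charter and urban are declared in CLASS; the label text is made up for illustration. LSMEANS is not mentioned above but is added here because it reports predicted probabilities. Note that LSMEANS evaluates the prediction at the covariate means, so these are margins at the mean rather than average marginal effects.

```sas
proc genmod data=dat;
  class city charter urban;
  model n_white/total_students = city gini charter urban / dist=bin link=logit;
  /* Log-odds contrast between the two city levels; EXP also reports the odds ratio */
  estimate 'city level 1 vs level 2' city 1 -1 / exp;
  /* Predicted probability for each city level at the covariate means (ILINK),
     plus the difference on the logit scale (DIFF) with confidence limits (CL) */
  lsmeans city / ilink diff cl;
run;
```

The contrast coefficients follow the sorted order of the city levels, so it is worth checking the Class Level Information table to see which level comes first.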
You can use the Margins macro. Just modify your aggregated (events/trials) data so that each original row becomes two observations: one carrying the count of events and one carrying the count of nonevents. Then, as mentioned in the Limitations section of the macro documentation, use the FREQ= option in the macro. For example:
data drug;
input drug$ x r n @@;
y=1; f=r; output;
y=0; f=n-r; output;
datalines;
A .1 1 10 A .23 2 12 A .67 1 9
B .2 3 13 B .3 4 15 B .45 5 16 B .78 5 13
C .04 0 10 C .15 0 11 C .56 1 12 C .7 2 12
D .34 5 10 D .6 5 9 D .7 8 10
E .2 12 20 E .34 15 20 E .56 13 15 E .8 17 20
;
%Margins(data = drug,
class = drug,
response = y,
roptions = event='1',
freq = f,
dist = binomial,
model = drug x,
margins = drug,
options = diff cl )
Interesting, I tried your method but got the following error:
ERROR: A cluster has been detected with the frequency counts specified by the FREQ statement unequal within the cluster. The frequency counts must be equal within clusters.
That sounds like a message you could get if you use the GEESUBJECT= option. It would help if you showed your DATA step code and your macro call.
Hi,
My data have 18,000+ observations. Should I attach the CSV here? I use the GEESUBJECT= option because schools are nested within cities and suburbs.
In that case, you will need to fully expand your aggregated data so that one observation represents a single individual and not use the FREQ= option. For example:
data drug;
input drug$ x r n @@;
do i=1 to r; y=1; output; end;
do i=1 to n-r; y=0; output; end;
datalines;
A .1 1 10 A .23 2 12 A .67 1 9
B .2 3 13 B .3 4 15 B .45 5 16 B .78 5 13
C .04 0 10 C .15 0 11 C .56 1 12 C .7 2 12
D .34 5 10 D .6 5 9 D .7 8 10
E .2 12 20 E .34 15 20 E .56 13 15 E .8 17 20
;
I am hesitant to do it this way because my unit of analysis is the aggregated data (schools), not the individual student data. Is there a way to calculate the average marginal effect for PROC GENMOD without using the Margins macro?
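One possible way to get a school-level average marginal effect by hand is sketched below. It is an illustration under several assumptions, not a replacement for the macro: city is assumed to be numeric with 1 = city and 0 = suburb, school_id and scenario are hypothetical helper variables, and no REPEATED statement is used. The idea is to append two counterfactual copies of every school with a missing response, so that those rows are left out of the fit but still receive predicted values, and then average the difference in predicted probabilities across schools.

```sas
data stacked;
  set dat;
  length scenario $8;
  school_id = _n_;                      /* hypothetical row identifier */
  scenario = 'observed'; output;        /* original row, used to fit the model */
  n_white = .;                          /* missing response: scored but not fitted */
  scenario = 'city';   city = 1; output;
  scenario = 'suburb'; city = 0; output;
run;

proc genmod data=stacked;
  class city charter urban;
  model n_white/total_students = city gini charter urban / dist=bin link=logit;
  output out=pred pred=p_hat;           /* predicted proportion for every row */
run;

proc sql;
  /* Unweighted mean over schools of P(city) - P(suburb) */
  select mean(c.p_hat - s.p_hat) as ame_city
  from pred(where=(scenario='city')) as c
       inner join
       pred(where=(scenario='suburb')) as s
       on c.school_id = s.school_id;
quit;
```

This gives a point estimate only. It does not provide a standard error, which the Margins macro derives with the delta method. Because each school counts once in the average, the school stays the unit of analysis; weighting the mean by total_students would instead give the student-level average.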