Re: Poisson regression estimate only for a subset of variable

raheleh22 · Posted 03-29-2023 01:41 PM

I have a data set in which my outcome variable is a count, and it is not over-dispersed, so I am using PROC GENMOD with dist=poisson and link=log. This is my model:

model counts= theme1 theme2 theme3 theme4 overall / dist=poisson link=log;
run;

The counts in my Excel file are per county, and I wonder how I should get estimates from my model per county.

I have tried IF and WHERE statements for the county variable, but neither of them worked.

Any advice is appreciated.

Thanks

StatDave · Posted 03-29-2023 02:05 PM

It's not clear what your data set is like and what it is that you want. If you want a separate model for each county and if you have multiple observations for each county with a count and predictor values in each observation, then you can just add a BY COUNTY; statement in your GENMOD step (after sorting by COUNTY). Or, if you want a single model that adjusts for COUNTY differences, then maybe you need to put COUNTY in the CLASS and MODEL statements, possibly including interactions of COUNTY with your predictors if needed.
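The two options described above could be sketched roughly as follows. This is only an illustration: the data set name MYDATA is hypothetical, and both steps assume several observations per county, which has not been confirmed at this point in the thread.

```sas
/* Sketch only: MYDATA is a hypothetical data set with several rows per county */
proc sort data=mydata;
  by county;
run;

/* Option 1: a separate Poisson model fitted within each county */
proc genmod data=mydata;
  by county;
  model counts = theme1 theme2 theme3 theme4 overall / dist=poisson link=log;
run;

/* Option 2: one model that adjusts for county differences */
proc genmod data=mydata;
  class county;
  model counts = county theme1 theme2 theme3 theme4 overall / dist=poisson link=log;
run;
```

Option 1 gives one set of theme estimates per county. Option 2 gives a single set of theme estimates plus county effects, unless interactions with COUNTY are added.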

raheleh22 · Posted 03-29-2023 02:12 PM

I have shown my dataset below and this is the model I used:

data time1;
input county counts theme1 theme2 theme3 theme4 overall;
ln= log(counts);
datalines;

proc genmod data=time1;
class county;
model counts= theme1 theme2 theme3 theme4 Overall county/dist=poisson link=log offset=ln;
run;

After running this model I am getting an estimate of 0 for every county. I am not sure which step I am doing wrong.

I want to get estimates for each theme per county.

So this is how my dataset looks:

COUNTY	Counts	Theme1	Theme2	Theme3	Theme4	Overall
1	35	1	0.9286	0.8571	0.6429	1
2	112	0.2143	0.7143	0.5714	0.5714	0.4286
3	61	0.3571	0	0.2857	0.9286	0.2857
4	84	0.5714	0.5714	0.0714	0.3571	0.5
5	30	0.6429	0.3571	0.4286	0.7857	0.7143
6	8	0.1429	0.4286	0.3571	0.2143	0.2143
7	30	0.5	0.2857	0.4286	0.5	0.6429
8	9227	0	0.0714	0.7143	0.2857	0.0714
9	114	0.6429	0.5714	0.1429	0.4286	0.5714
10	93	0.9286	1	0.7857	0.8571	0.9286
11	2070	0.2143	0.1429	0.5714	0.6429	0.3571
12	1109	0.4286	0.2143	0.2143	0.0714	0.1429
13	28	0.7857	0.8571	1	0.1429	0.7857
14	99	0.0714	0.4286	0	0	0
15	65	0.8571	0.7857	0.9286	1	0.8571

StatDave · Posted 03-29-2023 03:30 PM

You shouldn't use the log of your response variable as an offset. The offset is just another predictor in the model with its parameter restricted to equal 1. As a result, you effectively are modeling a constant rate of 1. So, just remove OFFSET=LN from your MODEL statement.
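A short derivation may make the zero estimates easier to see. The notation is introduced here for illustration only: y_i is the observed count, mu_i its Poisson mean, x_i the predictors, and beta_0, beta the parameters.

```latex
% Model with offset = log(y_i): the offset enters with a fixed coefficient of 1
\log \mu_i = \log y_i + \beta_0 + x_i^{\top}\beta
\quad\Longrightarrow\quad
\mu_i = y_i \exp\!\left(\beta_0 + x_i^{\top}\beta\right)

% Poisson log-likelihood and its derivative with respect to each mean
\ell = \sum_i \left( y_i \log \mu_i - \mu_i - \log y_i! \right),
\qquad
\frac{\partial \ell}{\partial \mu_i} = \frac{y_i}{\mu_i} - 1 = 0
\iff \mu_i = y_i

% Setting beta_0 = 0 and beta = 0 gives mu_i = y_i for every observation,
% so the all-zero parameter vector already maximizes the likelihood.
```

Every count in the posted data is positive, so the fit with all parameters at zero reproduces the data exactly, which matches the output described above.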

StatDave · Posted 03-29-2023 03:36 PM

But you won't be able to estimate a parameter for every county AND estimate the parameters of the predictors, since that results in trying to estimate more parameters than there are observations. The only way you could estimate the parameters for the predictors separately for each county is by having a set of observations for every county. You can't do it with only one observation for each county.
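As a rough count for the data posted above: there are 15 observations, while the model asks for an intercept, 5 predictor slopes, and 14 non-reference county effects, which is 20 parameters. A quick way to confirm how many observations each county contributes is a frequency table. This is a minimal sketch that assumes the TIME1 data set from the earlier post has been read in successfully.

```sas
/* Count the rows available for each county in TIME1 */
proc freq data=time1;
  tables county / nocum nopercent;
run;
```

If every county shows a frequency of 1, county-specific estimates of the theme effects cannot be obtained from this data set.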

raheleh22 · Posted 03-29-2023 03:46 PM

That is helpful. Now I am rearranging my dataset so that I have separate counts for each county. The new dataset covers only one county and includes counts, theme1, theme2, theme3, theme4, and overall (all of these are continuous) and year (categorical). This is my new model:

proc genmod data=county1;
class year(ref='1');
model counts= theme1 theme2 theme3 theme4 overall year/dist=poisson link=log;
run;

Still, the output gives the estimates for the year categories and the estimates for the themes separately.

So how can I get the estimates of the themes by year category?

Thanks a lot,

StatDave · Posted 03-29-2023 04:57 PM

To do that you would have to include interactions between YEAR and the other predictors such as
model counts = year theme1*year theme2*year theme3*year theme4*year / dist=poisson;
But you will have the same problem if the number of parameters exceeds the number of observations. For instance, if each observation in your new data set is for a separate year, then you essentially have the same problem as before.
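A reduced version of that model, written as a complete step, might look like the sketch below. It reuses the COUNTY1 data set and the reference level from the earlier post, keeps only one theme to hold the parameter count down, and assumes there are several observations within each year, which the thread has not confirmed.

```sas
/* Sketch only: year-specific slopes for THEME1 in the single-county data */
proc genmod data=county1;
  class year(ref='1');
  /* No THEME1 main effect, so THEME1*YEAR gives one slope per year level */
  model counts = year theme1*year / dist=poisson link=log;
run;
```

With K year levels, this sketch needs K parameters for the intercept and year effects plus K slopes for THEME1. The four-theme model above needs 5K parameters in total, so the data set would need more than 5K observations.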
