BookmarkSubscribeRSS Feed
herokid123
Calcite | Level 5

Hello,

 

My name is Brandon and this is my first time posting on here. I am currently trying to overall determine if increased temperatures are leading to increased hospital admissions. I am running my code using SAS software version 9.4 (TS1M4). 

Attached will be the file I am working with, but a little more about what my objective is and my background.

 

I am trying to create a logistic model which will show the equation for modeling temperatures (the predictor variable) to the number of total hospital admissions (the response variable). Because I have daily counts in my data, years 2009-2015 for every day of the year with such a large sample size, I am assuming that my distribution is poisson (large n, small p). I don't have much background in coding, and this is my first time using SAS, but I know that I am trying to run this using the categorical variable, county (which I believe means it will be reported as a class variable), and using different months. 

I would love assistance in getting the correct output and code so that I can determine if increased temperatures, using average temperatures and max temperatures, are leading to increased hospital admissions in total. Then from there I believe I can investigate if the effect modification from each age and two particular diseases if I have the code for one model. (I believe it is just a matter of changing what the predictor and response variables).

 

Anyways, here is some of the code I have used thus far, but I get really large scaling estimates so I am not sure if I am running this procedure correctly or not. In addition, I get extremely large Wald Chi-Square values (ranging from 0.11 to 5,000,000).

========================================================================================================

 

proc import datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = churchill;
      sheet = "Churchill";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = clark;
      sheet = "Clark";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = douglas;
      sheet = "Douglas";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = elko;
      sheet = "Elko";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = esmeralda;
      sheet = "Esmeralda";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = eureka;
      sheet = "Eureka";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = humboldt;
      sheet = "Humboldt";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = lander;
      sheet = "Lander";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = lincoln;
      sheet = "Lincoln";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = lyon;
      sheet = "Lyon";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = mineral;
      sheet = "Mineral";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = nye;
      sheet = "Nye";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = pershing;
      sheet = "Pershing";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = storey;
      sheet = "Storey";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = washoe;
      sheet = "Washoe";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = white_pine;
      sheet = "White_Pine";
run;
 
proc import  datafile = '\\files\users\bwoudhuysen\Desktop\EDdata.XLSX'
      dbms = XLSX
      replace
      out = carson;
      sheet = "Carson";
run;
 
data all; set churchill clark douglas elko esmeralda eureka humboldt lander lincoln lyon mineral nye pershing storey washoe white_pine carson; 
run;
 
/* Add the appropriate variable to the data set */
data all;
set all;
lnPop = log(PopulationSize);
month=month(date);               /* If you want to analyze by month . . . */
year=year(date);               /*  . . . or year . . . */
monyr=year*100 + month;        /* . . . or month/year . . . */
if 4<=month<=8 then AprAug=1;  /* . . . or something custom. */
else                AprAug=0;
run;
 
 
/* Taking the month variable as an example, you can add it as a
   categorical covariate by adding it to both the CLASS and MODEL Statements */
 
proc genmod data=all descending;
   class county(param=ref ref ='Clark') month(param=ref ref = '3');
   model TotalAdmissions = avgtemp county month / dist = poisson TYPE3
     link = log
   offset=lnPop
   PSCALE
/* Or maybe use DSCALE? */
;
run;
 
/* You could also perform completely separate regressions for each period
   by putting the variable in the BY Statement.  However, to do that, the
   data set must first be sorted.
*/
 
proc sort data=all;
by month;
 
proc genmod data=all;
   by month;
 

Capture.PNG
12 REPLIES 12
JacobSimonsen
Barite | Level 11

Using Poisson regression on such count data is ok, but of some other reason than what you suggest. If you want to estimate the rate of event, then the likelihood function is the same as if the count data were Poisson distributed. This works even if n is not big and p (or the rate) is not small.

Secondly, it may be wise to aggregate your data before using proc genmod. Genmod is not smart enough to aggregate the data before it starts estimation, so if you have many records is faster to do the aggregate first. Just use proc means (with nway option) and summarize days and counts. Then put log(total_days) into the offset.

herokid123
Calcite | Level 5

This makes sense to aggregate first.

Excuse me for being new, but the what would the code for this look like?

 

I am guessing,

==================================================================================
data all; set churchill clark douglas elko esmeralda eureka humboldt lander lincoln lyon mineral nye pershing storey washoe white_pine carson;
run;

 

 

proc means data = all nway;
run;


proc sort data=all;
by month;


/* Add the appropriate variable to the data set */
data all;
set all;
TotalDays = log(43452);
month=month(date); /* If you want to analyze by month . . . */
year=year(date); /* . . . or year . . . */
monyr=year*100 + month; /* . . . or month/year . . . */
if 4<=month<=8 then AprAug=1; /* . . . or something custom. */
else AprAug=0;
run;

 

proc genmod data=all descending;
class county(param=ref ref ='Clark') month(param=ref ref = '3');
model TotalAdmissions = avgtemp county month / dist = poisson TYPE3
link = log
offset=TotalDays
DSCALE
;
run;

=====================================================================================

Unfortunately, I am still getting the same scaling issue with my models.

 

I am unsure if I am just doing this incorrectly or what, but thank you so much for your thoughts.

JacobSimonsen
Barite | Level 11

You can aggregate your data with proc means, and then run the GENMOD procedure:

proc means data=all noprint nway;
  var populationsize totaladmissions;
  class county month avgtemp ;
  output out=summary sum=populationsize totaladmissions;
run;

data summary;
  set summary;
lnpop=log(populationsize);
run;


proc genmod data=summary descending;
   class county(param=ref ref ='Clark') month(param=ref ref = '3');
   model TotalAdmissions = avgtemp county month / dist = poisson TYPE3
     link = log
   offset=lnPop;
run;
herokid123
Calcite | Level 5

Wouldn't I still need to use a scaling parameter? In this case, I applied the 

DSCALE  (after the offset or I could use PSCALE)

option and I get reasonable chi-square values, but my scaling estimate is 46.2103, but isn't this the case of overdispersion?

 

I feel like we are close I am just worried that the scale means there is an existence of overdispersion.

 

JacobSimonsen
Barite | Level 11

You shouldn't use the scale options in such model, not even if you observe a high degree of overdispersion. This is because "overdispersion" is a measure of model fit, which is a check of if data has the Poisson-distribution. But, in this case, Poisson regression is used just because the likelihood function to estimate the rate of events is similar to the likelihood function if you regard data as Poisson distributed. You can therefore here use poisson regression to make interference (estimation and hypothesis testing), but not for something like checking distribution of data. An other way to see that it would be wrong is that the overdispersion statistic would rely much on how much you aggregate data before estimation of parameters, which then would lead to that the degree of aggregation would affect standard errors and confidence intervals - and that would not make sense.

 

If you actually had Poisson distributed count data then it would make sense to check for overdispersion and other kind of model validation.

herokid123
Calcite | Level 5

Okay. I think I am understanding.

 

 

So for interpretation purposes given the following output (in the attachments)

would I just say that all the counties were significant in having an effect for modeling temperature and hospital admissions.

More notably, the overall effect is given by

Y = -9.9652 + 0.0385X

where Y is equal to the total number of hospital admissions, and X is the temperatue.

Thus, with increase in temperature there is a positive association with the outcome of total hospital admissions?

 

 

JacobSimonsen
Barite | Level 11

All counties (apart from the reference counti) are significant from the reference counti. Further, Counti is overall significant (the type 3 test).

 

With the equation "Y = -9.9652 + 0.0385X" you estimate the log(rate)  in the reference counti, x is avg-temperature. You should add the estimate of the counti difference if you want the estimate the log(rate) in some other counti.

 

Yes, avg temperature seem to be significant associated with increase in rate.

herokid123
Calcite | Level 5

Oh okay.

 

  So the log(Totaladmission) = -9.9652 + 0.0385(avg temperature), but what do you mean estimate for county difference?

Like specify what each county is predicting for it? 

 

And also thank you so much for everything you have helped me with so far! It has been super helpful.

JacobSimonsen
Barite | Level 11

In the class statement you specified a reference level (or otherwise per default it would select one itself). Íf you dont include counti in the equation for your estimation then it is the estimate for the reference level. The estimates for the other counties than the reference level is the difference relative to the reference level. So, if you would estimate the level for some other level than the reference level then you should include the estimated difference in the equation.

herokid123
Calcite | Level 5

Hi Jacob,

 

    So I took a step back and want to know what and why I am aggregating. It looks like for each county I combined all Januaries for the same average monthly temperature, and so no longer have a daily dataset. I have a monthly dataset and perhaps can answer the question if average monthly temperature is a predictor, but not on daily emergency visits.

 

Because of the huge scaling I am worried that poisson is now  apoor choice (with scaling ~45). So maybe I am just not sure what is up here now, and confusing myself.

herokid123
Calcite | Level 5

Again, I know you said scaling isn't an issue, but I am just worried that it is a potential issue with stating that my poisson regression is not a good model, but I don't know what would be a better model.

 

 

Thanks,

JacobSimonsen
Barite | Level 11

Think about if assumptions for Poisson regression is fulfilled. That is, persons should be independent from each other, and the rate should be piecewise constant. Further, for log-linear Poisson regression (the one with log as link-function), you also have to assume the multiplicative structure of the rate of events. But, you can not use deviance to justify Poisson regression as "Poisson regression"  in this case is not used to fit Poisson distributed data, but only as a trick to maximize the right likelihood function.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1804 views
  • 2 likes
  • 2 in conversation