BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bmac1
Obsidian | Level 7

Hello SAS users:

 

I am new to this forum and have a couple related questions about the confidence intervals and predicted values produced by Proc GAM. 

 

I am trying to derive these values for a time series analysis of yearly sea surface temperature (SST) data.

Question 1:

 

The confidence limits for the predicted SST in any given year should be given by SST + uclm_year and SST + lclm_year, where the uclm and lclm variables are the SAS output variables produced in the Output option of Proc GAM, and SST is the long-term mean of the original time series.  However, when I plot these confidence intervals they are not symmetric about the predicted fit, and indeed sometimes become very close to the fitted trend values.  These seems to be an error somewhere, but I don’t know how to fix it. 

 

Question 2:  As part of the output, SAS gives P_SST, which is the predicted (fitted) annual SST for each year in the time series.  I understand that P_SST should be equal to the sum of the long-term mean SST and the annual “year effect” given by P_year in the Output from Proc GAM.  However, when I do the addition manually to check the SAS output, the summation is not equal to P_SST produced by Proc GAM.  Why does it differ?

 

The code and data I have used are below.  I am using SAS version 9.4.

 

 

 

proc gam data = analysis;

  model sst = loess(year, DF = 8);

  output out = fit pred uclm lclm adiag std resid;

run;

 

year       sst

1870       9.8

1871       10.4

1872       10

1873       10

1874       9.8

1875       10.4

1876       9.6

1877       10.1

1878       10.5

1879       10

1880       10.7

1881       9.4

1882       8.7

1883       9.3

1884       9.2

1885       9.7

1886       9.4

1887       10.2

1888       10.6

1889       9.7

1890       9.6

1891       9.9

1892       9.9

1893       10.4

1894       9.4

1895       10.1

1896       9.5

1897       9.7

1898       9.7

1899       10.6

1900       10.4

1901       9.7

1902       10.3

1903       9.6

1904       9.7

1905       9.5

1906       9.4

1907       9.6

1908       9.3

1909       9.8

1910       9.6

1911       9.3

1912       9.1

1913       9.5

1914       9

1915       10.1

1916       10

1917       10

1918       8.9

1919       9

1920       8.8

1921       8.8

1922       9

1923       9.3

1924       9.5

1925       8.6

1926       9.5

1927       10.5

1928       10.3

1929       10.3

1930       9.4

1931       10.2

1932       10.1

1933       10.1

1934       9.4

1935       9.7

1936       10

1937       9.3

1938       9.2

1939       10.1

1940       9.9

1941       10.1

1942       9.7

1943       9.7

1944       9.5

1945       9.7

1946       9.5

1947       9.7

1948       10.2

1949       9.4

1950       10.3

1951       10

1952       10.2

1953       10.3

1954       10.1

1955       9.5

1956       9.9

1957       10.4

1958       11

1959       9.9

1960       10.6

1961       9.6

1962       9.7

1963       9

1964       9.7

1965       10.3

1966       10.4

1967       9.6

1968       10.2

1969       8.9

1970       8.7

1971       9.2

1972       8.6

1973       8.9

1974       9.6

1975       9.1

1976       8.7

1977       9.8

1978       9.6

1979       9.4

1980       10.2

1981       9.3

1982       9.1

1983       7.8

1984       8.8

1985       9.6

1986       9.5

1987       10.3

1988       9.6

1989       8.8

1990       9.5

1991       9.8

1992       9

1993       9.2

1994       9.5

1995       9.5

1996       9.5

1997       10.5

1998       10.7

1999       10.2

2000       10

2001       10.4

2002       9.8

2003       11.2

2004       10.5

2005       10.3

2006       10.3

2007       10.7

2008       10.7

2009       10.4

2010       11.6

2011       10.3

2012       11.6

2013       9.8

2014       10.7

2015       9.6

2016       10.9

2017       10.2

2018       9.2

2019       11.3

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Yes. The deviance is not reported in the output because it is not used for model fitting, in contrast to PROC GAM.

 

If you need the deviance, you can use the OUTPUT statement to request predicted means then compute deviance by using the formula provided in the GENMOD doc

 

proc gampl data=test seed=123;
model y=spline(x1) spline(x2);
output out=myout pred=mu;
id y;
run;

/* now use data step to compute deviance using myout */

For the deviance of the NULL model, you can use PROC GENMOD to run an intercept-only model. The Deviance statistic is in the “ModelFit” table:

 

proc genmod data=test;
model y=;
run;

View solution in original post

8 REPLIES 8
PGStats
Opal | Level 21

Look at

 

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_gam_details18.htm&docsetVersion=1... 

 

The confidence intervals for LCLM and UCLM are not mathematically the same as the plotted intervals.

PG
bmac1
Obsidian | Level 7

Thank you PGStats.

 

This is helpful, but I have difficulties to see how this solves my problem.  That is, I am still not able to get the confidence intervals plotted correctly.

 

Should not P_y = overall time series mean + P_x?  For the notation of my variables, this would be

P_sst = overall mean + P_year.

 

In my case, this does not work because when I manually do the addition of mean + P_year, it does not equal P_sst, as seen in the top panel of the attached figure.  I thought it should, and it has worked like this for me in the past with earlier versions of SAS.  I think this issue, if resolved, could help solve the other question in my original query.

 

Also, the P_y should lie in the middle of the upper and lower 95% CL, as given by mean + uclm_x and mean + lclm_x. But it does not, as seen in the lower panel of the attached figure file.

 

Am I misinterpreting the meaning of P_y and P_x, and if so, how should they be interpreted?

Rick_SAS
SAS Super FREQ

A few comments:

1. PROC GAM is an old procedure and its main purpose is understanding how a TRANSFORMATION of the explanatory variable can be used to predict the response. If you use 

proc gam data = analysis plots=component(clm);

...

you will see the transformation for the YEAR variable. 

 

2. The STD, UCLM, and LCLM options on the OUTPUT statement are NOT for the prediction limits for the response. They are associated with the transformation of the explanatory variable.

 

3. The GAM doc has an example that explains the difference between GAM and LOESS in terms of smoothing data.

 

4. IMHO, if you want prediction limits, you should use PROC LOESS or PROC GAMPL, which is a newer and more efficient version of GAM that is intended for predictive models. The following example creates a smoother with prediction limits for your data:

 

proc gampl data = analysis plots=components;
  model sst = spline(year / DF = 8);
  output out = fit pred upper lower;
run;

data All;
   merge analysis fit;
run;

proc sgplot data=All noautolegend;
   band x=year lower=Lower upper=Upper;
   scatter x=year y=sst;
   series x=year y=Pred;
run;
bmac1
Obsidian | Level 7

Dear Rick_SAS,


Many thanks for this reply – it was really helpful and I have done several analyses in the past few weeks using Proc GAMPL.


However I also need an estimate of overall explained deviance, and cannot find it in the Proc GAMPL output (Proc GAM provides it).

 

I also need an estimate of the null model deviance.

 

How can I find or derive these deviance estimates from the output provided by the procedure?


Thanks again for any suggestions.
Best regards
bmac1

Rick_SAS
SAS Super FREQ

Yes. The deviance is not reported in the output because it is not used for model fitting, in contrast to PROC GAM.

 

If you need the deviance, you can use the OUTPUT statement to request predicted means then compute deviance by using the formula provided in the GENMOD doc

 

proc gampl data=test seed=123;
model y=spline(x1) spline(x2);
output out=myout pred=mu;
id y;
run;

/* now use data step to compute deviance using myout */

For the deviance of the NULL model, you can use PROC GENMOD to run an intercept-only model. The Deviance statistic is in the “ModelFit” table:

 

proc genmod data=test;
model y=;
run;
bmac1
Obsidian | Level 7

HI Rick_SAS,

 

Thanks again. This helped a lot!  I ran some a few test examples today and have been able to get the deviances and AIC's for fitted models and the null models using your suggestions.  

 

Best regards

bmac1

StatDave
SAS Super FREQ

If your concern about deviance values from GAMPL is because you want a statistic providing an overall assessment of the fit of the model, then you can get an R-square value using the RsquareV macro. See Example 2 in the Results tab of the macro documentation which shows using the macro to assess the fit of a spline model fit in PROC GAMPL.

bmac1
Obsidian | Level 7

Hi, 

 

Thank you very much.  Yes, this is what I had in mind and I was not aware of this macro option.  It looks like a good way to get an overall fitness of the models.  I will probably however use the deviances and AIC outputs that I have been able to get thanks to the post by Rick_SAS.  I can see that the principles are somewhat similar so I have made a note of the documentation in case I decide to use this method.  

 

Best regards

bmac1

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 921 views
  • 2 likes
  • 4 in conversation