Heejeong
Obsidian | Level 7

Hello,

 

I am running a simple PROC GLM predicting Norepinephrine levels (Norepinephrine) with a Stress Reactivity variable (srna_mlm2) and I also have a list of covariates. 

 

proc glm data=saved.Final4;
model Norepinephrine= cage ClinicSex marriedmidus work race_orig  cedu cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI cCESD cNeuroticism  cChronCondNumb  cAnyStressWide_sum cna_mlm2 srna_mlm2; 
run;

 

After running my PROC GLM analysis, I want to save the fitted dataset which I will then use to plot a Figure. 

The main reason I want to export the data is so that I can use different software (e.g., GraphPad Prism, Excel) to plot it; I just need the raw dataset with the predicted values.

 

I had two follow-up questions:

1) I had originally plotted the figure using PROC PLM and tried to export the Logfit data (as the syntax below shows). But when I exported the dataset into an SPSS file, the number of observations was only 200, while my PROC GLM output indicated that the Number of Observations Used was 495. Would it be safe to assume that I have fewer observations in the Logfit data because, depending on the range of the y-axis and the x-axis, not ALL observations are used to plot the figure? And are ONLY the observations used in the figure saved by the ods output fitplot=Logfit; syntax?

proc glm data=saved.Final4;
model Norepinephrine= cage ClinicSex marriedmidus work race_orig cedu cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI cCESD cNeuroticism cChronCondNumb cAnyStressWide_sum cna_mlm2 srna_mlm2;
store graph2;
run;
quit;

proc plm restore=graph2 noinfo;
effectplot fit (x=srna_mlm2)/clm;
ods output fitplot=Logfit;
run;


2) Second, since I want to export the complete dataset from the PROC GLM analysis, I learned about the OUTPUT statement in PROC GLM and created the syntax below. With this syntax, the final dataset includes ALL observations (i.e., the "Number of Observations Read"), along with predicted values, residuals, and lower/upper confidence limits. Would this be the correct dataset to export and use to create the same Figure above? Or should I be creating a dataset that ONLY contains the OBSERVATIONS USED?

proc glm data=saved.Final4;
model Norepinephrine= cage ClinicSex marriedmidus work race_orig cedu cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI cCESD cNeuroticism cChronCondNumb cAnyStressWide_sum cpa_mlm2 srpa_mlm2;
output out=out p=NorepinPredicted
r=NorepinResidual lclm=NorepinLowercl uclm=NorepinUppercl;
run;
quit;

I apologize for my lack of knowledge using this rather simple statistical procedure and appreciate your patience and guidance as I figure this out!

5 REPLIES
ballardw
Super User

STORE data sets are for use by Proc PLM to score data and as such contain what is needed to apply the model results to a different data set, i.e., the parameters of the model variables. As such there is at best a casual relationship between the number of observations in a data set and the Score data set.

 

The OUTPUT statement is used to create a data set for each observation adding in such things as the predicted values and confidence limits (of mean or individual predictions); model fit influence parameters, residuals and more.
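
As a minimal sketch of how the OUTPUT statement could be used for the second question, assuming the model from the first post (the data set name out_used is hypothetical): the OUT= data set has one row per observation read. Rows with a missing predictor get a missing predicted value, and rows with a missing response get a missing residual, so keeping only rows with a non-missing residual should leave the observations actually used in the fit.

```sas
/* Fit the model and write per-observation statistics to WORK.OUT. */
proc glm data=saved.Final4;
  model Norepinephrine = cage ClinicSex marriedmidus work race_orig cedu
        cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI
        cCESD cNeuroticism cChronCondNumb cAnyStressWide_sum cna_mlm2 srna_mlm2;
  output out=out p=NorepinPredicted r=NorepinResidual
         lclm=NorepinLowercl uclm=NorepinUppercl;
run;
quit;

/* Keep only rows with a residual, i.e. rows that contributed to the fit. */
/* out_used is a hypothetical name chosen for this illustration.          */
data out_used;
  set out;
  if not missing(NorepinResidual);
run;
```

Note that these predicted values are computed at each observation's own covariate values, whereas the EFFECTPLOT fit plot by default holds the other continuous covariates fixed (at their means), so plotting NorepinPredicted against srna_mlm2 would probably not reproduce the single straight line from PROC PLM.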

 

You have some variable names that I would interpret as likely CLASS variables. From the documentation:

Typical classification variables are Treatment, Sex, Race, Group.
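
A hedged sketch of what that could look like, assuming (this is only a guess from the names) that ClinicSex and race_orig are categorical codes rather than numeric measurements. Any other categorical covariates would be added to the CLASS statement in the same way.

```sas
/* Assumption: ClinicSex and race_orig are categorical; adjust the CLASS */
/* list to match how the variables are actually coded.                   */
proc glm data=saved.Final4;
  class ClinicSex race_orig;
  model Norepinephrine = cage ClinicSex marriedmidus work race_orig cedu
        cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI
        cCESD cNeuroticism cChronCondNumb cAnyStressWide_sum cna_mlm2 srna_mlm2
        / solution;  /* SOLUTION prints the parameter estimates */
run;
quit;
```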

 

 

PaigeMiller
Diamond | Level 26

MODERATORS: could you please merge this with @Heejeong's earlier thread at https://communities.sas.com/t5/Statistical-Procedures/Outputting-Proc-PLM-data-into-excel-sheet/m-p/...

--
Paige Miller
PaigeMiller
Diamond | Level 26

1) I had originally plotted the figure using PROC PLM and tried to export the Logfit data (as the syntax below shows). But when I exported the dataset into an SPSS file, the number of observations was only 200, while my PROC GLM output indicated that the Number of Observations Used was 495. Would it be safe to assume that I have fewer observations in the Logfit data because, depending on the range of the y-axis and the x-axis, not ALL observations are used to plot the figure? And are ONLY the observations used in the figure saved by the ods output fitplot=Logfit; syntax?

 

PLM creates 200 positions along the x-axis, computes the y-axis value and then draws the plot. These are NOT the observations from the original data set. I doubt that the choice of 200 has anything to do with the range of y-axis and range of x-axis.

 

The fact that the original data set had 495 observations is irrelevant here as those are used to determine the model fit, not the plot. Do not confuse data used to fit the model with data used to create the plot.
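
One way to see this for yourself, assuming the Logfit data set created by the ODS OUTPUT statement in the original post is still in the WORK library, is to look at its structure and first few rows. This is only an inspection sketch; the variable names in Logfit are whatever PROC PLM generates, so none are assumed here.

```sas
/* List the variables and the number of rows in the plot data set. */
proc contents data=Logfit;
run;

/* Print the first few grid points to see what is actually stored. */
proc print data=Logfit(obs=5);
run;
```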

--
Paige Miller
Heejeong
Obsidian | Level 7

Thank you so much, @ballardw and @PaigeMiller for your responses.

I have been trying really hard to find additional information about exactly how Logfit data is created and what I should expect the data to look like, but couldn't find much information. So your responses are really valuable to me and I really appreciate them a lot. 

 

@PaigeMiller, thank you again for your response and I wanted to ask if the number 200 is determined by the size of the dataset I am using. Depending on the original size of the dataset, would some PROC PLM procedures create more than 200 positions along the x-axis? I just want to make sure that I understand this well! Your response also brings me a big relief and reassures me that I can export this dataset to re-create the same Figures in other programs. Would this be correct? Thank you for your continued guidance.

PaigeMiller
Diamond | Level 26

@Heejeong wrote:

 

@PaigeMiller, thank you again for your response and I wanted to ask if the number 200 is determined by the size of the dataset I am using. Depending on the original size of the dataset, would some PROC PLM procedures create more than 200 positions along the x-axis? I just want to make sure that I understand this well! Your response also brings me a big relief and reassures me that I can export this dataset to re-create the same Figures in other programs. Would this be correct? Thank you for your continued guidance.


I'm sure you know that if you are going to draw a perfectly straight line, you only need 2 data points. Apparently when PLM was programmed, they chose 200 data points as a default, which allows PLM to draw perfectly straight lines as well as many types of curved lines. Since this is the default, and since you can change the number 200 to anything else you want, it has nothing to do with the size of the data set you are modeling.

 

If you don't like 200, you can choose a different number of points to plot using the GRIDSIZE= option of the EFFECTPLOT statement.

 

So, yes you can export the data set for plotting in other software. You might also want to try re-running this with GRIDSIZE=2 and GRIDSIZE=1000 so you can see for yourself what is happening, and how the plots won't change in the case of a perfectly straight line.
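
A rough sketch of that experiment, reusing the model and item store name from the original post. The data set names Logfit2 and Logfit1000 and the CSV path are hypothetical and should be changed to suit your setup.

```sas
ods graphics on;  /* EFFECTPLOT needs ODS Graphics enabled */

/* Fit the model once and save it to the item store graph2. */
proc glm data=saved.Final4;
  model Norepinephrine = cage ClinicSex marriedmidus work race_orig cedu
        cHHtotalIncome EverSmokeReg Exercise20mins cSleepQual CNSmeds cBMI
        cCESD cNeuroticism cChronCondNumb cAnyStressWide_sum cna_mlm2 srna_mlm2;
  store graph2;
run;
quit;

/* Two grid points: enough to define a straight fitted line. */
proc plm restore=graph2 noinfo;
  effectplot fit(x=srna_mlm2) / clm gridsize=2;
  ods output fitplot=Logfit2;
run;

/* A much finer grid: the fitted line itself should look the same. */
proc plm restore=graph2 noinfo;
  effectplot fit(x=srna_mlm2) / clm gridsize=1000;
  ods output fitplot=Logfit1000;
run;

/* Export the plot data for use in other software (hypothetical path). */
proc export data=Logfit1000
    outfile="C:\temp\logfit.csv"
    dbms=csv replace;
run;
```

Comparing the row counts of Logfit2 and Logfit1000 (for example with PROC CONTENTS) should show that the size of the plot data set follows GRIDSIZE=, not the 495 observations used to fit the model.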

--
Paige Miller
