lizzy28
Quartz | Level 8

I used both SAS and STATA to run a log-linear regression on the same dataset, and the coefficient magnitudes differed between the two. One of the variables in my dataset has 18% missing values. I was wondering whether the difference arose because SAS applied imputation when running the regression.

Does anyone know how SAS and STATA differ in handling missing values when running a regression?

Thanks a lot.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I don't know about STATA, but most SAS regression procedures remove from the analysis any record that has a missing value for any of the model variables. The procedure's output should tell you how many records were actually used.
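
As a small illustration of the point above, here is a minimal sketch using a made-up toy dataset (the dataset name demo and the variables y, x1 and x2 are hypothetical, not from the original post). It only shows where PROC REG reports how many observations were read and how many were used.

```sas
/* Toy data (hypothetical): 6 rows, one with a missing predictor */
/* and one with a missing response.                              */
data demo;
  input y x1 x2;
  datalines;
1.2 1 0
2.3 2 1
3.1 . 0
4.8 4 1
. 5 0
6.1 6 1
;
run;

/* PROC REG drops any row with a missing model variable and    */
/* prints the "Number of Observations Read/Used" summary.      */
proc reg data=demo;
  model y = x1 x2;
run;
quit;
```

With this toy data, two of the six rows have a missing model variable, so the summary at the top of the output should show fewer observations used than read.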


6 REPLIES
lizzy28
Quartz | Level 8

Thanks, Ballard.

But when I ran the regression with the missing values explicitly excluded, the results differed from those I got without excluding them. The key difference is that the coefficient values were completely different.

I'm attaching my data to this thread. The code I used was:

proc reg data=temp;
  model lcost_all_adj=age_diag_gp1 age_diag_gp2 age_diag_gp4 age_diag_gp5 mhi_grp2 mhi_grp3 mhi_grp4 days_debr_grp2 days_debr_grp3 days_debr_grp4 flap
                  negpthp days_dswi_grp2 days_dswi_grp3 days_dswi_grp4 los_cs_grp2 los_cs_grp3 los_cs_grp4 los_cs_grp5 comorbid_grp2 comorbid_grp3
                  sepsis transf_bleedcomp;
run;

This used all 1198 observations (corrected total DF of 1197), as shown below:

Number of Observations Read    1198
Number of Observations Used    1198

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              23      441.70924         19.20475       24.58      <.0001
Error              1174    917.16903         0.78123
Corrected Total    1197    1358.87827

However, when I explicitly excluded the missing values, as below:

proc reg data=temp;
  model lcost_all_adj=age_diag_gp1 age_diag_gp2 age_diag_gp4 age_diag_gp5 mhi_grp2 mhi_grp3 mhi_grp4 days_debr_grp2 days_debr_grp3 days_debr_grp4 flap
                  negpthp days_dswi_grp2 days_dswi_grp3 days_dswi_grp4 los_cs_grp2 los_cs_grp3 los_cs_grp4 los_cs_grp5 comorbid_grp2 comorbid_grp3
                  sepsis transf_bleedcomp;
  where mhi_ctg^=.;
run;

I got:

Number of Observations Read    992
Number of Observations Used    992

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              23      381.081           16.56874       21.28      <.0001
Error              968     753.8316          0.77875
Corrected Total    991     1134.913
ballardw
Super User

Since your variable mhi_ctg does not appear as a model variable in the first run, records with a missing mhi_ctg were not filtered out there. When you add the WHERE statement in the second run, you exclude records that have non-missing values for all of the model variables; it looks like about 200 of them (1198 - 992 = 206). I would expect different results with roughly 17% of the records excluded.
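
A quick way to confirm how many records the WHERE statement removes is to count the missing values of mhi_ctg directly. This is only a sketch; it assumes the dataset temp and the numeric variable mhi_ctg exactly as named in the code above.

```sas
/* Count non-missing (N) and missing (NMISS) values of mhi_ctg */
proc means data=temp n nmiss;
  var mhi_ctg;
run;
```

If the observation counts reported in the thread hold, NMISS here would be expected to match the 206-record gap between the two runs.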

lizzy28
Quartz | Level 8

Sorry for the confusion. mhi_grp1-4 were derived from mhi_ctg as mutually exclusive dummy variables. Since mhi_grp2-4 were included in both regressions, I expected both runs to use the same set of observations. Thanks.
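
One way to test that expectation is to look at how the records with a missing mhi_ctg are coded on the dummy variables. The sketch below assumes the dataset temp and the variables mhi_ctg and mhi_grp1-mhi_grp4 exist as described in this thread; it is a diagnostic, not part of the original analysis.

```sas
/* For rows where mhi_ctg is missing, list every combination of  */
/* dummy values. The MISSING option keeps missing dummy values   */
/* in the listing instead of silently dropping those rows.       */
proc freq data=temp;
  where missing(mhi_ctg);
  tables mhi_grp1*mhi_grp2*mhi_grp3*mhi_grp4 / list missing;
run;
```

If mhi_grp2, mhi_grp3 and mhi_grp4 show up as 0 rather than missing for these rows, PROC REG has no reason to drop them when only those three dummies are in the model.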

lizzy28
Quartz | Level 8

I figured out what the problem was. After I recoded mhi_ctg into four dummy variables, mhi_grp1-4, mhi_grp1 was left out of the regression as the reference category. As a result, the observations with missing values in mhi_grp1 were not dropped; they were treated the same way as the observations with mhi_grp1 = 1.

Thank you, Ballard and Paige!
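
For readers who hit the same issue, one possible way to avoid it is to make every dummy missing whenever the source variable is missing, so that PROC REG drops those rows on its own. This is a minimal sketch, not the poster's actual recoding: it assumes mhi_ctg is numeric and coded 1 through 4 (the thread does not state the coding), and the output dataset name temp_fixed is hypothetical.

```sas
/* Assumption: mhi_ctg takes the values 1, 2, 3, 4 or missing. */
data temp_fixed;
  set temp;
  if missing(mhi_ctg) then
    /* Propagate missingness to all four dummies */
    call missing(mhi_grp1, mhi_grp2, mhi_grp3, mhi_grp4);
  else do;
    /* A comparison evaluates to 1 (true) or 0 (false) */
    mhi_grp1 = (mhi_ctg = 1);
    mhi_grp2 = (mhi_ctg = 2);
    mhi_grp3 = (mhi_ctg = 3);
    mhi_grp4 = (mhi_ctg = 4);
  end;
run;
```

With the dummies coded this way, a model containing mhi_grp2-mhi_grp4 should exclude the rows with missing mhi_ctg without needing a separate WHERE statement.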

PaigeMiller
Diamond | Level 26

lizzy28 wrote: "I was wondering whether it was because SAS applied imputation when running regression."

SAS does not impute missing values in regression. It excludes from the regression calculations any observation that has a missing value in any of the model terms.

--
Paige Miller
