- Re: How differently do SAS and STATA deal with missing values when run...


Posted 04-15-2015 09:44 AM
(1898 views)

I used both SAS and STATA to run a log-linear regression on the same dataset, and the coefficient magnitudes came out different. One of the variables in my dataset has 18% missing values, so I was wondering whether the difference arises because SAS applies imputation when running a regression.

Does anyone know how SAS and STATA differ when running a regression on data with missing values?

Thanks a lot.


Thanks, Ballard.

But when I ran the model with the missing values explicitly excluded, the results differed from those I got without excluding them. The key difference is that the coefficient values were completely different.

I'm attaching my data to this thread. Here is what I ran:

```sas
proc reg data=temp;
   model lcost_all_adj = age_diag_gp1 age_diag_gp2 age_diag_gp4 age_diag_gp5
                         mhi_grp2 mhi_grp3 mhi_grp4
                         days_debr_grp2 days_debr_grp3 days_debr_grp4 flap
                         negpthp days_dswi_grp2 days_dswi_grp3 days_dswi_grp4
                         los_cs_grp2 los_cs_grp3 los_cs_grp4 los_cs_grp5
                         comorbid_grp2 comorbid_grp3
                         sepsis transf_bleedcomp;
run;
```

This used all 1198 observations (so the Corrected Total has 1197 degrees of freedom), as shown below:

| Number of Observations Read | 1198 |
|---|---|
| Number of Observations Used | 1198 |

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 23 | 441.70924 | 19.20475 | 24.58 | <.0001 |
| Error | 1174 | 917.16903 | 0.78123 | | |
| Corrected Total | 1197 | 1358.87827 | | | |
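For reference, the entries in this table are linked by the standard ANOVA identities. These identities also show why the Corrected Total has 1197 degrees of freedom rather than 1198. Here N = 1198 observations and the model has 23 regressors plus an intercept:

$$
\begin{aligned}
\text{DF}_{\text{Corrected Total}} &= N - 1 = 1198 - 1 = 1197 = 23 + 1174, \qquad \text{DF}_{\text{Error}} = N - 23 - 1 = 1174,\\
\text{MS}_{\text{Model}} &= \frac{441.70924}{23} \approx 19.20475, \qquad
\text{MS}_{\text{Error}} = \frac{917.16903}{1174} \approx 0.78123,\\
F &= \frac{\text{MS}_{\text{Model}}}{\text{MS}_{\text{Error}}} \approx \frac{19.20475}{0.78123} \approx 24.58.
\end{aligned}
$$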

However, when I explicitly excluded the observations with a missing mhi_ctg, as below:

```sas
proc reg data=temp;
   model lcost_all_adj = age_diag_gp1 age_diag_gp2 age_diag_gp4 age_diag_gp5
                         mhi_grp2 mhi_grp3 mhi_grp4
                         days_debr_grp2 days_debr_grp3 days_debr_grp4 flap
                         negpthp days_dswi_grp2 days_dswi_grp3 days_dswi_grp4
                         los_cs_grp2 los_cs_grp3 los_cs_grp4 los_cs_grp5
                         comorbid_grp2 comorbid_grp3
                         sepsis transf_bleedcomp;
   where mhi_ctg ^= .;
run;
```

I got:

| Number of Observations Read | 992 |
|---|---|
| Number of Observations Used | 992 |

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 23 | 381.081 | 16.56874 | 21.28 | <.0001 |
| Error | 968 | 753.8316 | 0.77875 | | |
| Corrected Total | 991 | 1134.913 | | | |


I figured out what the problem was. After I recoded mhi_ctg into four dummy variables, mhi_grp1-mhi_grp4, mhi_grp1 was left out of the regression as the reference category. As a result, the observations with a missing mhi_ctg were treated the same way as those in the reference group (mhi_grp1 = 1).

Thank you, Ballard and Paige!
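The thread doesn't show how the dummies were created, so what follows is only a sketch under assumptions. It uses a hypothetical toy dataset (demo, with a hypothetical response y). It also assumes the dummies were built with Boolean expressions such as (mhi_ctg = 2). In a SAS DATA step, comparing a missing value with a number is false and returns 0 rather than missing. The missing observations therefore end up looking exactly like the reference group. Setting the dummies to missing explicitly lets PROC REG drop those observations instead:

```sas
/* Hypothetical toy data: mhi_ctg is 1-4 or missing (.), y is a response. */
data demo;
   input id mhi_ctg y;
   datalines;
1 1 2.1
2 2 2.9
3 3 3.4
4 4 4.2
5 . 3.0
6 2 2.7
7 3 3.6
8 1 1.9
;
run;

data demo_dummies;
   set demo;

   /* Pitfall: comparing a missing value with a number is false, so a   */
   /* missing mhi_ctg gets 0 on all three dummies, exactly like group 1 */
   /* when mhi_grp1 is the omitted reference category.                  */
   bad_grp2 = (mhi_ctg = 2);
   bad_grp3 = (mhi_ctg = 3);
   bad_grp4 = (mhi_ctg = 4);

   /* Fix: carry the missing value into the dummies so that PROC REG   */
   /* excludes these observations instead of treating them as group 1. */
   if missing(mhi_ctg) then call missing(mhi_grp1, mhi_grp2, mhi_grp3, mhi_grp4);
   else do;
      mhi_grp1 = (mhi_ctg = 1);
      mhi_grp2 = (mhi_ctg = 2);
      mhi_grp3 = (mhi_ctg = 3);
      mhi_grp4 = (mhi_ctg = 4);
   end;
run;

/* Uses all 8 observations: id 5 is silently coded like group 1. */
proc reg data=demo_dummies;
   model y = bad_grp2 bad_grp3 bad_grp4;
run;
quit;

/* Uses 7 observations: id 5 is dropped because its dummies are missing. */
proc reg data=demo_dummies;
   model y = mhi_grp2 mhi_grp3 mhi_grp4;
run;
quit;
```

The two models are fitted in separate PROC REG steps on purpose. Within a single PROC REG step, an observation that is missing any variable needed by any MODEL statement is excluded from all of the models in that step.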


> I was wondering whether it was because SAS applied imputation when running regression.

SAS does not impute missing values in regression. Observations with a missing value in any of the model variables (the response or any of the regressors) are excluded from the regression calculations.

--

Paige Miller
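The sketch below illustrates the complete-case (listwise) deletion Paige describes. It uses a hypothetical toy dataset (toy, with response y and regressors x1 and x2; none of these names come from the thread). Counting the rows where NMISS over the model variables is 0 reproduces the number of observations PROC REG will actually use. That count is a quick check to run before comparing coefficients across runs:

```sas
/* Hypothetical toy data: x2 is missing for two rows and y for one row. */
data toy;
   input y x1 x2;
   datalines;
1.2 0 3.1
2.3 1 .
3.1 0 2.8
4.0 1 4.4
. 0 3.9
5.2 1 .
6.1 0 5.0
;
run;

/* Flag complete cases: NMISS counts missing numeric arguments, */
/* so a value of 0 means y, x1 and x2 are all present.          */
data toy_flag;
   set toy;
   complete = (nmiss(y, x1, x2) = 0);
run;

/* The sum of the flag is the number of complete cases (4 here). */
proc means data=toy_flag sum;
   var complete;
run;

/* PROC REG reports Number of Observations Read = 7 and */
/* Number of Observations Used = 4 (the complete cases). */
proc reg data=toy;
   model y = x1 x2;
run;
quit;
```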

