In today's data-driven world, organizations need tools that combine powerful analytics with accessibility. SAS enables users to build sophisticated statistical models, while Microsoft Excel remains a staple for data organization and reporting. By using SAS for Microsoft 365, users can seamlessly integrate SAS’s modeling capabilities with Excel’s familiar interface, leveraging the strengths of both platforms. In this post, we'll leverage SAS for Microsoft 365 to analyze factors contributing to a Congenital Heart Disease (CHD) diagnosis using a logistic regression model in SAS, and then highlight our findings in Excel.
Framingham Heart Study
For this analysis, we will be using the SASHELP.HEART table which provides results from the Framingham Heart Study which originates from a long-term cardiovascular study conducted in Framingham, Massachusetts, beginning in 1948. This dataset is widely used in epidemiological research to analyze risk factors for heart disease. It contains demographic, lifestyle, and clinical variables such as age, sex, weight, cholesterol levels, blood pressure, and smoking status, among others. The dataset has been instrumental in identifying key predictors of cardiovascular disease and remains a valuable resource for statistical modeling and machine learning applications in healthcare.
Inserting SAS Data into Excel
We'll start by creating a new workbook in Microsoft Excel, opening the SAS Viya pane, and navigating to the Data tab. From there, users can select a cataloged table or choose one from the CAS or SAS Compute Servers. Since SASHELP.HEART is located on the SAS Compute Server, we'll select All Tables from the dropdown menu, then navigate to SAS Studio Compute Context > SASHELP > HEART to open the table.
Select any image to see a larger version. Mobile users: To view the images, select the "Full" version at the bottom of the page.
After double-clicking the SASHELP.HEART table, the Results tab opens, allowing us to modify the table before inserting it into the workbook. To add all rows to the worksheet, we select All rows from the Insert rows dropdown menu. In the Table Options section, we check Display column labels and leave the remaining options unchecked.
In the Columns tab, we can choose which columns to insert into the workbook. We select the following:
Status: Alive or Dead
AgeCHDdiag: Age at CHD Diagnosis
Sex: Male or Female
AgeAtStart: Age at the start of the study
Chol_Status: Cholesterol Status: Desirable, Borderline, High
BP_Status: Blood Pressure Status: Optimal, Normal, High
Weight_Status: Weight Status: Underweight, Normal, Overweight
Smoking_Status: Smoking Status: Non-smoker, Light, Moderate, Heavy, Very Heavy
We will reorder the columns by selecting the Reorder icon , selecting the AgeCHDdiag column and using the arrows on the side to move it below AgeAtStart. If the arrows do not appear, try dragging the left-hand side of the SAS Viya pane to make it larger.
The Filter tab contains the Graphical and Expression builders to filter data by rows. The Graphical Builder provides a user-friendly, drag-and-drop interface for creating queries, making it easier for users to build and manipulate data without writing code. In contrast, the Expression Builder allows users to create custom expressions, calculations, and conditional logic using functions and operators, offering more flexibility for advanced data transformations.
We are not going to filter our data, so let's go to the Sort tab and sort the table by AgeAtStart in ascending order by selecting Add sort, choosing AgeAtStart from the dropdown menu and ensuring the Sort icon is set to ascending .
Select the Insert in document icon to insert the table into the Excel worksheet and label the worksheet tab Data. Notice in the data that some study participants have a value for Age CHD Diagnosed and others do not. We will run a logistic regression analysis to see what factors lead to participants being diagnosed with CHD.
Logistic Regression Analysis
Let's create another Excel worksheet called Analysis of CHD Diagnosis.
Navigating to the Programs tab of the SAS Viya pane, we'll use a SAS program to run a logistic regression analysis of factors contributing to CHD diagnosis.
In this SAS program, we are creating a new variable called CHDYes that is equal to 1 if the participant has a value for Age CHD Diagnosed and 0 if missing. Then, we use PROC LOGISTIC to run the a regression model with Sex, Age at Start, Cholesterol Status, Blood Pressure Status, Weight Status, and Smoking Status as explanatory variables for CHD Diagnosis.
/* Create a new dataset with CHDYes variable */
data heart_data;
set sashelp.heart;
if missing(AgeCHDDiag) then CHDYes = 0;
else CHDYes = 1;
run;
/* Perform logistic regression with specific output */
ods graphics on;
ods select ParameterEstimates OddsRatios ORPlot;
proc logistic data=heart_data descending plots=oddsratio;
class Sex Chol_Status(ref = "Desirable") BP_Status (ref = "Optimal")
Weight_Status (ref = "Normal") Smoking_Status (ref="Non-smoker") / param=ref;
model CHDYes = Sex AgeAtStart Chol_Status BP_Status Weight_Status Smoking_Status;
run;
ods graphics off;
Making sure the top left cell of the Excel worksheet is selected, click Insert in document to add the results of the regression analysis to the worksheet.
The regression analysis gave us a table of Maximum Likelihood Estimates along with a table and graph of Odds Ratio Estimates. To make it easier for viewers to hone in on statistically significant factors, we can use Excel to highlight the statistically significant factors in both tables.
From the analysis we see there are several statistically significant factors affecting the likelihood of a CHD diagnosis. For example, those who are classified as Overweight or Very Heavy smokers are more at risk of a CHD Diagnosis. We can use conditional formatting in Excel to highlight the cells with these values.
Males and those with a blood pressure status other than Optimal are more likely to be diagnosed with CHD. We can use Excel to create a new sheet called Blood Pressure Status by Sex and include a pivot table and pivot chart to understand the distribution of these two variables.
Conclusion
From this analysis, we were able to use the statistical modeling capabilities of SAS with the familiar interface of Microsoft Excel, all while never leaving the Excel app. To learn more about SAS for Microsoft 365 visit SAS for Microsoft 365 | SAS Support or sign up for the new SAS for Microsoft 365 course to become an expert user: Course: SAS for Microsoft 365.
Find more articles from SAS Global Enablement and Learning here.
... View more