Hi all,
Thank you in advance for your help. I was wondering, is it possible to have SAS print the number of observations used for each level of categorical variables in a multilevel model (i.e, those listed in the CLASS statement)?
Below is the code that I used to build my model. In the output, I see that the degrees of freedom are provided for each variable. As I complete my write-up, though, it would be very helpful to know the n for each level of the categorical variables so I can provide that to the reader.
I apologize if the code provided below is not as neat as it could be—I am still learning multilevel modeling, so any suggestions to improve the code beyond my question are welcome.
Please note that I am using SAS University Edition.
title trust cred -- full model;
proc mixed covtest;
class Instructor_Num rGender rMajor RrInt rLang
Rinst_gender inst_Comm_Hours inst_Grad_NotGrad inst_Masters inst_International;
model cred_trust =
rGender rMajor RrInt rLang
Rinst_gender inst_Comm_Hours inst_Grad_NotGrad inst_Masters inst_International
behav_eng_2_grand1
agentic_eng_2_grand1
cog_eng_2_grand1
em_eng_2_grand1
Autonomy_Support_grand1
Autonomy_Support_grand2
Autonomy_Support_grand2|Autonomy_Support_grand1
/solution cl ddfm=Satterthwaite;
random intercept agentic_eng_2_grand1 Autonomy_Support_grand1 / sub=Instructor_Num cl type=un;
Thank you!
For any future SAS users who need descriptive statistics for their categorical hierarchical data, the code below was provided by a SAS support technician and exactly addressed this issue.
Note that outp=outdata
needs to be added to the model statement in your proc mixed model. The student-level (i.e., Level 1) variables in this example are rGender and rClass; while the instructor-level variables (i.e., Level 2) are Inst_Gender and Inst_Inter.
I hope this helps!
proc mixed covtest;
class Instructor_Num rGender rClass Inst_gender Inst_inter;
model cred_trust = rGender rClass Inst_gender Inst_inter /solution cl ddfm=Satterthwaite outp=outdata;
random intercept / sub=Instructor_Num cl type=vc;
run;
//*student level variables*//
proc freq data=outdata;
where resid is not missing;
tables rGender rClass;
run;
//*instructor level variables*//
proc sort data=outdata nodupkey out=out;
where resid is not missing;
by instructor_num;
run;
proc freq data=out;
tables Inst_Gender Inst_Inter;
run;
Hint:
Provide a simpler example using either a SAS supplied data set or your data. Something that you can show what you want. Then show the desired output for the given data.
I suspect that some form of Proc Freq would provide what you want but I can't tell if you want each level of individual variables OR the combination of all of the Class variables.
I think you might try:
Proc freq data=yourdata;
where not missing(Instructor_Num) and not missing(rGender) ;/* continue until you have all of the
variables included with not missing(variable) */ tables Instructor_Num* rGender* rMajor* RrInt* rLang* Rinst_gender *inst_Comm_Hours* inst_Grad_NotGrad* inst_Masters* inst_International / list; run;
It may be that you would use proc freq to generate an output data set so you could use options in proc print or other report procedure to make a prettier output.
Thank you so much for your reply. I ran the code that you provided and I think it is nearly what I was hoping for (i.e., it could just need a few changes that I am not informed enough to know how to change).
You're absolutely right that I should have included a sample of my dataset and simpler code. Even after following the directions on the SAS website, I wasn't sure how to export just a sample of 100 observations from my larger dataset into a .SAS file. However, I was able to export it to an Excel file.
I really appreciate your help. I'll try to be less ambiguous in my explanation as I break down the model below. The code and dataset below are for a 2-level model where Instructor_Num is the grouping variable. There are two student-level (Level-1) categorical variables, rGender (female = 1, male = 0) and rClass (Sr. =1, Jr. = 2, Soph = 3, Fresh = 4) and two instructor-level categorical variables Inst_Gender (female = 1, male = 0) and whether or not they are international instructors (not international = 1, international = 0)—all predicting a measure of trust.
In my final model (i.e., not the sample one here), I have 1146 used observations and
702 unused observations. For each level of the categorical IVs listed above, how can I get SAS to print the number of observations used in the final model for that level? I know how to run frequencies in SAS, but how do I find out how many actual observations were included in my model? SAS, of course, knows this number because it's used to calculate the degrees of freedom for each variable.
Using the sample dataset provided, ideally, the table would look something like the one below (or at least provide the same information in a different format):
Variable | Level | n |
rGender | Male (0) | 48 |
rGender | Female (1) | 37 |
rClass | Sr (1) | 0 |
rClass | Jr (2) | 18 |
rClass | Soph (3) | 27 |
rClass | Fresh (4) | 38 |
Inst_Gender | Male (0) | 1 |
Inst_Gender | Female (1) | 1 |
Inst_Inter | International (0) | 1 |
Inst_Inter | Not International | 1 |
Please note that there are only two instructors in the sample dataset, a non-international female and an international male. Since instructor_number is the grouping variable in my multilevel models, SAS already knows to treat inst_gender and inst_inter as level-2 variables; whereas when I just run frequencies, the output simply tells me the number of times each integer appears for those variables.
title SAS help 100 obs;
proc mixed covtest;
class Instructor_Num rGender rClass Inst_gender Inst_inter;
model cred_trust = rGender rClass Inst_gender Inst_inter /solution cl ddfm=Satterthwaite;
random intercept / sub=Instructor_Num cl type=vc;
If this the answer is not immediately clear, please let me know and I will try to repost the question to the community with a clearer explanation of my question.
Thank you!
the two steps are
1. Get the complete cases that are analyzed by the procedure. This is the set of observations for which none of the continuous or CLASS variables have missing values.
2. Perform a PROC FREQ analysis to count the levels of the CLASS variables.
Here is an example with a data set in SASHELP. If it's not quite what you want, you can follow up with questions that use this data set, which we can all access.
%let dsname = sashelp.heart;
%let contVars = MRW Cholesterol Systolic;
%let classVars = Sex Chol_Status BP_Status Weight_Status Smoking_Status;
data CompleteCases;
set &dsname(keep=&contVars &classVars);
if ^cmiss(of _ALL_);
run;
proc freq data=CompleteCases;
tables &classVars / nocum;
run;
I am still trying to figure out why you believe that this is important to PROC MIXED. The output will give the number of subjects included in the analysis. Because MIXED is likelihood based, it can use all cases that are not completely missing covariates or are not missing the response variable. Using ddfm=satterthwaite or ddfm=kr2 will calculate degrees of freedom based on matrix equations that involve copies of the mixed model equations If you are concerned about assumptions of missing (completely) at random, then you should look at various arrangements of your dataset. If it is apparent that there is a pattern to the missing values, then you may need to consider multiple imputation and all the work that method implies. However, if there is no apparent pattern to missing values, then you have some assurance that the missing at random assumption holds, and that a mixed model approach is appropriate for analyzing the data. One of the great advantages (in my opinion) of mixed model analysis is that it cares little about missing data, whereas ordinary least squares requires that there be no missing data.
So how to get the numbers? Consider PROC SUMMARY, which will give the numbers for each level of the CLASS variables as well as for the interactions. This is a great tool for finding patterns of missing values.
SteveDenham
Steve's response made me realize that I was thinking about a GLM analysis in which only complete cases are used. As Steve points out, that is not the case with PROC MIXED. Sorry that I muddied the issue.
If you want to follow Steve's recommendations, you can use PROC MI and the NIMPUTE=0 option to examine the pattern of missing values. If that is not sufficient, there are also ways to visualize patterns of missing values in SAS.
Here's the PROC SUMMARY code:
proc summary data=one;
class Instructor_Num rGender rClass Inst_gender inst_inter;
var cred_trust;
output out=counts2 n=n nmiss=nmiss;
run;
SteveDenham
Hi Steve,
Thank you very much for your detailed reply. With your help and input from @ballardw, I believe I have much of the information I need about the level-1 student variables (rGender and rClass). I am very appreciative. I do have one follow-up question regarding the level-2 variables, though.
After seeing the output from the code provided by the helpful users in these threads, I believe I can now explain further why I was focused on (somehow) getting the requested information directly from the PROC MIXED model. This is because, as you know, PROC MIXED takes into account the hierarchical nature of the data (students nested within instructors) when I indicate Instructor_Num as the grouping variable—and I do not know how to replicate this using the summary, frequency, or tabulate procedures.
To explain, looking at the attached sample dataset, when I run the code that you helpfully provided, it says that there are dozens of observations of female and male instructors in the dataset. In reality, there are only two instructors in the attached spreadsheet, instructor one (instructor_num = 1) is female (inst_gender = 1) and instructor two (instructor_num = 2) is male (inst_gender = 0).
Ideally, when analyzing the full dataset, I would like to run a procedure that tells me the number of female and male instructors, as well as my other level-2 categorical variables. Is this possible?
Thank you again.
1) you don't show which code you used that shows "that there are dozens of observations of female instructors".
2) Your example data shows dozens of records where the instructor is female anything that does not directly tie the insturctor id to gender will do that.
Which likely means that you did not describe the requirement carefully enough.
3) XLSX files are not data sets. Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
First, thank you for all of the help. I want to mention again that my issue is partially resolved, I have the student-level n's. I will address the points you raised here to clarify my remaining problem.
1) I have tried the code suggested by @SteveDenham, @Rick_SAS, and yourself and I will attempt to clarify the issue I was referring to in my previous reply. As an example, when I amend the code that you helpfully provided to only examine the instructor_number variable and gender (for the sake of this discussion), it provides the table below:
Instructor_Num Inst_gender Frequency Percent CumulativeFrequency CumulativePercent1 female2 male
22 | 22.22 | 22 | 22.22 |
77 | 77.78 | 99 | 100.00 |
In reality, the instructor_num is a grouping variable and inst_gender is a level-2 variable—so there are actually only one female instructor (instructor_num = 1) and one male (inst_number = 2) in the sample dataset. More details are provided below. As a note, the code provided by the other contributors has the same issue. It outputs the raw number of times observations appear—which does not provide the information I need for my level-2 variables.
2) Your example data shows dozens of records where the instructor is female anything that does not directly tie the instructor id to gender will do that.
Which likely means that you did not describe the requirement carefully enough.
2) This point gets to the fundamental issue that I am having. The data are nested. In the dataset, each row represents one student (which is typical, of course). However, the variables that relate to instructors are at level 2 and grouped by the Intructor_Num variable. In the PROC MIXED function, on the random line of my code (re-posted below), the sub= line tells SAS that Instructor_ID is the grouping variable that the rows of data are nested within (it need to be in the CLASS statement as well). The nested structure has not been taken into consideration using the methods suggested so far (at least as far as I have been able to work out).
In my sample dataset, the first 22 rows all have instructor_num = 1. Similarly, inst_gender and inst_inter also have all 1s for the first 22 rows. In my multilevel model, SAS identifies that all rows (students) where instructor_num=1 refer to students who had Instructor 1 as a teacher who is a female, non-international instructor. This allowed me to use the PROC MIXED function. However, the result is that the recommended procedures to "count" the observations do not work at the teacher level since an instructor with 22 students will have their gender indicated 22 times, and this would be the case for each instructor.
So, yes, I understand your point that when you open the dataset, it appears as though there are dozens of records where the instructor is female—but since they are all Instructor 1, it refers to one person.
Ultimately, I have over 50 instructors who are teaching +1,000 students in the complete dataset. The instructor number and the other categorical demographics for each instructor are provided for every row in the dataset (i.e., every student). Is there a way to examine the n's for the different levels of these teacher variables? Please note that I don't need to see interactions among them. For example, I would need the number of female/male instructors and the number of international/domestic instructors, but not an interaction between the two demographics (inst_gender*inst_inter).
3) I apologize for only sharing Excel files. SAS University Edition does not allow me to export .sas7bdat files and I was not sure of a better way to share a subsample of my larger dataset. Before my initial post, I worked for hours to find a way to export the more accessible SAS dataset to post in this forum; and I attempted to do so again after reading your reply. Unfortunately, I have still not been able to export data to a SAS data file. I've tried several methods, including attempting to export from SAS to .sas7bdat and SPSS to .sas7bdat, as well as other SAS data formats. I doubt this will be more helpful, but I've attached an SPSS datafile in this post in case it is somehow more usable.
Previously-posed multilevel SAS code for reference since I refer to it in this reply:
title SAS help 100 obs;
proc mixed covtest;
class Instructor_Num rGender rClass Inst_gender Inst_inter;
model cred_trust = rGender rClass Inst_gender Inst_inter /solution cl ddfm=Satterthwaite;
random intercept / sub=Instructor_Num cl type=vc;
@nbrophy wrote:
Using the sample dataset provided, ideally, the table would look something like the one below (or at least provide the same information in a different format):
Variable Level n rGender Male (0) 48 rGender Female (1) 37 rClass Sr (1) 0 rClass Jr (2) 18 rClass Soph (3) 27 rClass Fresh (4) 38 Inst_Gender Male (0) 1 Inst_Gender Female (1) 1 Inst_Inter International (0) 1 Inst_Inter Not International 1
You can make an output table similar to the above with proc tabulate .
proc tabulate data=have;
class rgender rclass inst_gender inst_inter/ missing;
table rgender rclass inst_gender inst_inter ,
n
;
run;
Which will display the levels, including missing. If you have formats to display custom text for numeric variables and use them then the text would display instead of the numeric value. If you want both then either 1) modify the format (easiest) or you will need to go through some code that isn't trivial.
A custom format like:
Proc format;
value mygender
1 = 'Female (1)'
0 = 'Male (0)'
;
would display the desired value in proc tabulate with a format statement like:
format rgender inst_gender mygender. ;
Note that multiple variables with the same coding can share formats.
For any future SAS users who need descriptive statistics for their categorical hierarchical data, the code below was provided by a SAS support technician and exactly addressed this issue.
Note that outp=outdata
needs to be added to the model statement in your proc mixed model. The student-level (i.e., Level 1) variables in this example are rGender and rClass; while the instructor-level variables (i.e., Level 2) are Inst_Gender and Inst_Inter.
I hope this helps!
proc mixed covtest;
class Instructor_Num rGender rClass Inst_gender Inst_inter;
model cred_trust = rGender rClass Inst_gender Inst_inter /solution cl ddfm=Satterthwaite outp=outdata;
random intercept / sub=Instructor_Num cl type=vc;
run;
//*student level variables*//
proc freq data=outdata;
where resid is not missing;
tables rGender rClass;
run;
//*instructor level variables*//
proc sort data=outdata nodupkey out=out;
where resid is not missing;
by instructor_num;
run;
proc freq data=out;
tables Inst_Gender Inst_Inter;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.