I am still a novice ... so easily confused! I was told to compute the means for all explanatory variables (SAS 9.4). But the variables are all categorical. For example, I tried to get the means for the two health reasons (physical and mental health), coded as 1 and 2. I used the following code:
Since you want to compute a mean of a categorical variable, would you please explain in your own words what that means? Don't even use SAS or other computer language in your explanation, just try to articulate what the mean of a categorical variable really means.
To be clear, I don't see the point in computing means ... However, since I must ...
The mean is an average of all counts within an observation. So let's take health reason. I want to know how many people in my sample have a physical illness vs. a mental illness. A proc freq provides a descriptive table and tells me that (making up numbers here) 122,000 have PI and 132,000 have MI; the percentage is provided in the SAS output, so I know what share of my sample each group makes up, say 46% (again, only for example's sake). However, I need to also do this for the regression model in which I have 10 explanatory variables. The means and STD do not get automatically generated in a proc gen. How do I get the same information with the model as I did with the descriptive stat?
Okay, good, we agree that it doesn't make sense to compute a mean of a categorical variable. The only descriptive statistic that we can compute is the percent in each category.
However, I need to also do this for the regression model in which I have 10 explanatory variables.
I don't know what this means, or why you need to do this for a regression model.
The means and STD do not get automatically generated in a proc gen. How do I get the same information with the model as I did with the descriptive stat?
Do you mean proc reg? Why do you need means and standard deviations here, if we have already established that these can't be computed for categorical variables? Please explain.
Oh good. So I am not a complete idiot! This is the answer I received when I asked that very question:
"Generally, for explanatory variable descriptives one would do means and standard deviations for continuous variables. For categorical variables it is best to present the proportion of the sample in each category. You will need to also make a distinction between a person, versus observation months on a person within a spell, versus multiple spells of a person in the program. The data is currently structured so that observation months are stacked. If you simply do a proc means (or other stat) for a variable, you will get a mean value for all the observation months across multiple spells within and across persons. To have a person file, you need to keep only one observation per person and then do a mean (or other stat). To have a spell file, you need to keep only one observation per spell and then do a mean (or other stat). Which observation to keep for a person or a spell is also critical. For example, only the last observation in a spell has info about exit."
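To make the person / spell / observation-month distinction in that answer concrete, here is a minimal sketch in Python rather than SAS, with entirely made-up records. The tuple layout, the `keep_last` helper, and the category labels are illustrative assumptions, not anything from the actual data. It keeps the last observation per spell or per person and shows that the category proportions differ at each level.

```python
from collections import Counter

# Hypothetical stacked records: (person_id, spell_id, month, health_reason)
rows = [
    (1, 1, 1, "physical"), (1, 1, 2, "physical"), (1, 1, 3, "physical"),
    (1, 2, 1, "physical"), (1, 2, 2, "physical"),
    (2, 1, 1, "mental"),
    (3, 1, 1, "mental"), (3, 1, 2, "mental"),
]

def keep_last(rows, key):
    """Keep one row per key: the last one in (person, spell, month) order."""
    last = {}
    for row in sorted(rows):
        last[key(row)] = row  # later rows overwrite earlier ones
    return list(last.values())

def proportions(rows):
    """Share of rows falling in each health_reason category."""
    counts = Counter(r[3] for r in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Every observation month counts once (what a plain summary of the stacked file gives)
month_level = proportions(rows)
# One row per (person, spell): the "spell file"
spell_level = proportions(keep_last(rows, key=lambda r: (r[0], r[1])))
# One row per person: the "person file"
person_level = proportions(keep_last(rows, key=lambda r: r[0]))

print(month_level, spell_level, person_level)
```

In this toy data person 1 contributes five of the eight observation months, so "physical" is over-represented in the stacked summary compared with the person-level one. In SAS the same one-row-per-group step is usually done with BY-group processing in a DATA step; the sketch only illustrates the idea.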
Again. I am using the proc gen procedure with a clog-log model.
@pammers wrote:
"Generally, for explanatory variable descriptives one would do means and standard deviations for continuous variables. For categorical variables it is best to present the proportion of the sample in each category. You will need to also make a distinction between a person, versus observation months on a person within a spell, versus multiple spells of a person in the program. The data is currently structured so that observation months are stacked. If you simply do a proc means (or other stat) for a variable, you will get a mean value for all the observation months across multiple spells within and across persons. To have a person file, you need to keep only one observation per person and then do a mean (or other stat). To have a spell file, you need to keep only one observation per spell and then do a mean (or other stat). Which observation to keep for a person or a spell is also critical. For example, only the last observation in a spell has info about exit."
Without knowing your data, a lot of the above is gibberish. I thought only witches and warlocks had multiple spells. But in general, I have not come across the noun "spell" in any particular field of endeavor other than writing and casting spells over people. Most of the above quote from your professor (?) is meaningless to me in the statistical context of fitting a model.
@pammers wrote:
To be clear, I don't see the point in computing means ... However, since I must ...
You would never compute the mean of a categorical variable so don't bother.
If you recode them as 0/1 and then calculate the mean, you'll get the EXACT same information as the percentage. Check it. If the two are off, you likely haven't handled missing values the same way in both procedures.
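A quick way to convince yourself of this, sketched in Python rather than SAS with made-up values (the codes 1 = physical, 2 = mental follow the original post; the data and variable names are assumptions for illustration):

```python
# Made-up codes: 1 = physical, 2 = mental, None = missing
health_reason = [1, 2, 2, 1, 2, None, 2, 1, 2, 2]

# Drop missing values first, as a frequency table does by default
observed = [v for v in health_reason if v is not None]

# Recode to 0/1: 1 if mental, 0 if physical
indicator = [1 if v == 2 else 0 for v in observed]

mean_of_indicator = sum(indicator) / len(indicator)
proportion_mental = observed.count(2) / len(observed)

# If missing values are silently treated as 0 instead of dropped,
# the denominator changes and the two numbers no longer agree
mean_with_missing_as_zero = (
    sum(1 if v == 2 else 0 for v in health_reason) / len(health_reason)
)

print(mean_of_indicator, proportion_mental, mean_with_missing_as_zero)
```

The first two numbers are identical by construction; the third differs only because the missing value was kept in the denominator, which is the kind of mismatch to look for when the mean and the percentage disagree.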
Provide a worked small example of what you want. A small input set and the desired result.
Otherwise you are talking around a bunch of next to nonsense.
IF you have an order to your categorical variable, which you have not stated or shown in any way, the concept of "median" as the middle value might apply.
If I have values of a, b, c, a, c, d, p, d, q where the "order" is normal alphabetical order then the data could be reordered to
a,a,b,c,c,d,d,p,q.
With the 9 elements shown, the "median" would be the fifth value, the second c, as 4 values come before it and 4 after.
If you have an even number such as 18 elements, then the 9th and 10th would have to be considered and tie breaking becomes an issue if both values are different. If the 9th were "m" and the 10th were "s" what value to pick as the middle might be more problematic.
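The median idea above can be sketched as follows, in Python rather than SAS. The 9-element list is the one from the post; the 18-element list is made up so that the 9th value is "m" and the 10th is "s". Python's `statistics.median_low` and `statistics.median_high` are one way to make the tie-breaking choice explicit for ordered categories.

```python
import statistics

# Odd number of values: the middle one is unambiguous
values = ["a", "b", "c", "a", "c", "d", "p", "d", "q"]
ordered = sorted(values)               # alphabetical order
middle = ordered[len(ordered) // 2]    # 5th of 9 values
print(ordered, middle)

# Even number of values (made-up data): 9th is "m", 10th is "s"
even_values = ["a"] * 8 + ["m", "s"] + ["z"] * 8
low = statistics.median_low(even_values)    # picks the 9th value
high = statistics.median_high(even_values)  # picks the 10th value
print(low, high)
```

With an even count there is no single middle category, so whichever of the two you report is a convention you have to choose and state.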
And your apparent quote helps not as it appears to be a response to something without context.