I am still a novice ... so easily confused! I was told to compute the means for all explanatory variables (SAS 9.4), but the variables are all categorical. For example, I tried to get the means for the two health reasons (physical and mental health), coded as 1 and 2. I used the following code:
Since you want to compute a mean of a categorical variable, would you please explain in your own words what that means? Don't even use SAS or other computer language in your explanation, just try to articulate what the mean of a categorical variable really means.
Paige Miller
To be clear, I don't see the point in computing means ... However, since I must ...
The mean is an average of all counts within an observation. So let's take health reason. I want to know how many people in my sample have a physical illness vs. those with a mental illness. A proc freq provides a descriptive table and tells me that (making up a number here) 122,000 have PI and 132,000 have MI; the percentage is provided in the SAS output, so I know my sample consists of 46% (again, only for example's sake). However, I need to also do this for the regression model in which I have 10 explanatory variables. The means and STD do not get automatically generated in a proc gen. How do I get the same information with the model as I did with the descriptive stat?
Okay, good, we agree that it doesn't make sense to compute a mean of a categorical variable. The only descriptive statistic that we can compute is the percent in each category.
"However, I need to also do this for the regression model in which I have 10 explanatory variables."
I don't know what this means, or why you need to do this for a regression model.
"The means and STD do not get automatically generated in a proc gen. How do I get the same information with the model as I did with the descriptive stat?"
Do you mean proc reg? Why do you need means and standard deviations here, if we have already established that these can't be computed for categorical variables? Explain.
Paige Miller
Oh good. So I am not a complete idiot! This is the answer I received when I asked that very question:
"Generally, for explanatory variable descriptives one would do means and standard deviations for continuous variables. For categorical variables it is best to present the proportion of the sample in each category. You will also need to make a distinction between a person, versus observation months on a person within a spell, versus multiple spells of a person in the program. The data is currently structured so that observation months are stacked. If you simply do a proc means (or other stat) for a variable, you will get a mean value for all the observation months across multiple spells within and across persons. To have a person file, you need to keep only one observation per person and then do a mean (or other stat). To have a spell file, you need to keep only one observation per spell and then do a mean (or other stat). Which observation to keep for a person or a spell is also critical. For example, only the last observation in a spell has info about exit."
Again, I am using the proc gen procedure with a clog-log model.
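The person file and spell file described in the quote could be built roughly as follows. This is only a sketch under assumptions: the data set name stacked and the variables person_id, spell_id, month, and health_reason are hypothetical stand-ins for whatever the real data uses, and keeping the first observation per person is just one possible choice.

```sas
/* Hypothetical names: stacked, person_id, spell_id, month, health_reason */
proc sort data=stacked out=sorted;
  by person_id spell_id month;
run;

/* Spell file: keep only the last observation month of each spell,
   since that is the one the quote says carries the exit information */
data onespell;
  set sorted;
  by person_id spell_id;
  if last.spell_id;
run;

/* Person file: keep one observation per person
   (assumption: the first observed month is the one wanted) */
data oneperson;
  set sorted;
  by person_id;
  if first.person_id;
run;

/* Proportion of persons in each category of a categorical variable */
proc freq data=oneperson;
  tables health_reason;
run;
```

Running the same PROC FREQ on onespell instead of oneperson would give proportions per spell rather than per person, which is the distinction the quote is drawing.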
@pammers wrote:
"Generally, for explanatory variable descriptives one would do means and standard deviations for continuous variables. For categorical variables it is best to present the proportion of the sample in each category. You will also need to make a distinction between a person, versus observation months on a person within a spell, versus multiple spells of a person in the program. The data is currently structured so that observation months are stacked. If you simply do a proc means (or other stat) for a variable, you will get a mean value for all the observation months across multiple spells within and across persons. To have a person file, you need to keep only one observation per person and then do a mean (or other stat). To have a spell file, you need to keep only one observation per spell and then do a mean (or other stat). Which observation to keep for a person or a spell is also critical. For example, only the last observation in a spell has info about exit."
Without knowing your data, a lot of the above is gibberish. I thought only witches and warlocks had multiple spells. But in general, I have not come across the noun "spell" in any particular field of endeavor other than writing and casting spells over people. Most of the above quote from your professor (?) is meaningless to me in the statistical context of fitting a model.
Paige Miller
Thank you. I realize that, but I am not sure how to do it. I used by first.X (X being my variable). Anyway, I was exhausted last night, so maybe I will figure it out today. I used proc genmod to generate the model. Sorry I confused you on that. So to get the means of each variable I have been using this code:
proc means data=onespell;   /* onespell is the constructed data set */
  class X;
  var /* all my list of dependent variables */;
run;
Thank you for your patience and kindness.
@pammers wrote:
To be clear, I don't see the point in computing means ... However, since I must ...
You would never compute the mean of a categorical variable, so don't bother.
If you recode them as 0/1 and then calculate the mean, you'll get the EXACT same information as the percentage. Check it. If they're off, you likely haven't handled missing values the same way in both procedures.
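A quick way to check this is a toy example like the one below. It is a sketch only: the data set demo, the variable names, and the eight made-up values are assumptions for illustration, with 1 = physical and 2 = mental as in the original question. The indicator is left missing when the original value is missing so that both procedures drop the same observation.

```sas
/* Made-up data: 1 = physical, 2 = mental, . = missing */
data demo;
  input health_reason @@;
  /* Recode to 0/1, leaving missing values missing */
  if not missing(health_reason) then mental = (health_reason = 2);
  datalines;
1 2 2 1 2 . 2 1
;
run;

/* Percentage in each category (missing excluded by default) */
proc freq data=demo;
  tables mental;
run;

/* Mean of the 0/1 indicator (missing excluded by default) */
proc means data=demo n mean;
  var mental;
run;
```

With these values, 4 of the 7 non-missing observations are coded 2, so PROC FREQ should report about 57.14% for mental = 1 and PROC MEANS should report a mean of about 0.571. If the recode were written without the missing check, the missing row would become 0 and the two numbers would no longer agree.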
Provide a worked small example of what you want. A small input set and the desired result.
Otherwise you are talking around a bunch of next to nonsense.
IF you have an order to your categorical variable, which you have not stated or shown in any way, the concept of "median" as the middle value might apply.
If I have values of a, b, c, a, c, d, p, d, q where the "order" is normal alphabetical order then the data could be reordered to
a,a,b,c,c,d,d,p,q.
Of the 9 elements shown then the "median" would be the second c as 4 values come before and after.
If you have an even number such as 18 elements, then the 9th and 10th would have to be considered and tie breaking becomes an issue if both values are different. If the 9th were "m" and the 10th were "s" what value to pick as the middle might be more problematic.
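For what it's worth, the nine-letter example above can be sketched in SAS like this. The data set names are made up, and the rule for an even number of elements (take the lower of the two middle values) is only an assumed tie-break, not a standard.

```sas
/* The nine values from the example, in their original order */
data letters;
  input value $ @@;
  datalines;
a b c a c d p d q
;
run;

/* Put the values in alphabetical order */
proc sort data=letters;
  by value;
run;

/* Keep the middle observation: position ceil(n/2).
   For odd n this is the true middle; for even n it is the lower
   of the two middle positions (an assumed tie-break). */
data median_value;
  set letters nobs=n;
  if _n_ = ceil(n / 2);
run;

proc print data=median_value;
run;
```

For the nine values shown, the fifth value in sorted order is kept, which is the second c.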
And your apparent quote does not help, as it appears to be a response to something without context.