Solved: Re: What procedure to use for conditional probability table

jengarcia1122 · Posted 11-03-2014 07:11 PM

Hello SAS community,

I wanted to ask what SAS procedure would be adequate to make a conditional probability table. For example, a table that can display a probability for gender given age or something of that nature.

I was reading up on proc logistic and proc freq. However, I don't have a model for proc logistic yet; the data is pretty raw, and I wanted some quick descriptive statistics involving age for some variables.

One more question for the community: if I do any modeling, as in proc logistic, does my data need to be normally distributed?

Thank you.

ballardw · Posted 11-03-2014 07:38 PM

Proc freq will probably give you what you need for many uses. The main question is how to treat missing data. If you want to include missing as a valid category, you need to specify that with the MISSING option on the TABLES statement; otherwise, a record with a missing value for any of the requested variables is excluded from the results.

proc freq data=sashelp.class;
   tables sex*weight / nofreq norow nocol;
run;

will give cell percentages, that is, the sample joint probability of each combination of sex and weight. The total row and column hold the marginal distributions. You can write the results to a data set with the OUT= option if you want to manipulate the data further. Note: FREQ provides only one output data set per TABLES statement, and it contains only the last table requested in that statement.
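For the conditional probabilities asked about in the original question, row percentages are one way to get them. The following is a minimal sketch, assuming the SASHELP.CLASS sample data; the output data set name work.cond_prob is a hypothetical name chosen for illustration. With age as the row variable, each row percentage is the sample estimate of P(sex | age).

```sas
/* Row percentages of an age-by-sex table estimate P(sex | age). */
/* NOFREQ, NOPERCENT and NOCOL leave only the row percentages in the printed table. */
/* OUTPCT adds PCT_ROW and PCT_COL to the OUT= data set. */
proc freq data=sashelp.class;
   tables age*sex / nofreq nopercent nocol
                    out=work.cond_prob outpct;
run;

/* work.cond_prob is a hypothetical name used only for this example. */
proc print data=work.cond_prob;
   var age sex count pct_row;
run;
```

Swapping the order to sex*age would instead give P(age | sex) in the row percentages. Adding the MISSING option to the TABLES statement would count missing values as a category of their own.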

And since Proc Logistic models a categorical (typically yes/no) outcome and makes no normality assumption about the independent variables, whether they are categorical or continuous, the data does not have to be normally distributed.

jengarcia1122 · Posted 11-03-2014 08:32 PM

Thank you for making me think about missing values. Age, which is the conditioning variable, and one other variable do not have many missing values (less than 30%). Other variables do, and I will assign weights when I come to them.

I think proc freq would work fine for descriptive statistics even if it's one table at a time.

Age is a continuous variable, but I'm thinking about converting this variable into a dichotomous variable or higher order categorical. Will proc freq also work if I leave the variable age as continuous or a higher order categorical variable?

Does a BY statement after the TABLES statement in proc freq calculate conditional probabilities (for example, if you code "by age" after the tables statement), or does it stratify the data somehow?

Thank you for your swift response!

Reeza · Posted 11-03-2014 10:10 PM

It stratifies the data somewhat.

Basically, it says do this for every distinct value of Age. It also requires the data to be sorted on the BY value.
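A minimal sketch of that BY-group behavior, assuming the SASHELP.CLASS sample data (the sorted copy work.class_sorted is a hypothetical name): the data is sorted first, and PROC FREQ then produces a separate one-way table of sex for each age. Within each BY group, the percentages are the sample distribution of sex at that age.

```sas
/* BY-group processing needs the data sorted on the BY variable. */
/* work.class_sorted is a hypothetical name used only for this example. */
proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;

/* One frequency table of sex per distinct value of age. */
proc freq data=work.class_sorted;
   by age;
   tables sex;
run;
```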

If you're going to start doing it for multiple variables, you may want to look into automation features such as macros or CALL EXECUTE.

If this is clinical reporting, there are quite a few examples of clinical reporting macros on lexjansen.com.

jengarcia1122 · Posted 11-05-2014 09:20 AM

Thank you Reeza for your insight

ballardw · Posted 11-04-2014 12:46 PM

Many groupings of data can be accomplished with custom formats with no need to add variables. Example:

proc format library=work;
   value agegroup
      0 - 12    = 'Pre-teen'
      13 - 19   = 'Teen'
      20 - 35   = '20-35'
      36 - 65   = '36-65'
      66 - high = '66+'
   ;
run;

proc freq data=yourdataset;
   tables age;
   format age agegroup.;
run;
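The same idea can be combined with a cross tabulation, so that the row percentages are conditional on the formatted age group rather than on each individual age. This is only a sketch, assuming the SASHELP.CLASS sample data; the format name agegrp_demo and its two ranges are hypothetical, chosen to suit the ages in that data set.

```sas
/* agegrp_demo is a hypothetical format defined only for this example. */
proc format;
   value agegrp_demo
      low - 12  = '12 and under'
      13 - high = '13 and over';
run;

/* PROC FREQ groups by the formatted value of age, so each row  */
/* percentage estimates P(sex | age group). No new variable is needed. */
proc freq data=sashelp.class;
   tables age*sex / nofreq nopercent nocol;
   format age agegrp_demo.;
run;
```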

And my previous comment related to a single output data set per TABLES statement. You can have more than one TABLES statement in a procedure call, especially if you want different options for some tables.

And you can look at way too many tables if you use () to create groups or variable lists. An example TABLES statement:
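As a small sketch of that point, assuming the SASHELP.CLASS sample data (the data set names work.sex_counts and work.age_sex_counts are hypothetical): each TABLES statement below carries its own options and its own OUT= data set.

```sas
/* Two TABLES statements in one PROC FREQ call, each with its own options. */
/* Both OUT= data set names are hypothetical, used only for this example. */
proc freq data=sashelp.class;
   tables sex     / out=work.sex_counts;
   tables age*sex / norow nocol out=work.age_sex_counts;
run;
```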

tables (var1 var2 var3) * (var7 var8 var9); will produce cross tabs of var1 with var7, var8 and var9; var2 with var7, var8 and var9; and var3 with var7, var8 and var9, for a total of 9 tables. For readability, you may want to consider using the LIST option if you are crossing 3 or more variables.

One thing to know: Proc freq with no TABLES statement will generate one-way frequency tables for EVERY variable in your data set. There are also keywords you can use, such as _numeric_ or _character_, to request all variables of that type.
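To make the grouping syntax and the keywords concrete, here is a short sketch. It assumes the SASHELP.CARS and SASHELP.CLASS sample data sets; the particular variables were picked only for illustration.

```sas
/* Grouped variable lists: 2 x 2 = 4 cross tabs from one TABLES statement. */
/* LIST prints each table in list form rather than as a grid. */
proc freq data=sashelp.cars;
   tables (origin type) * (drivetrain cylinders) / list;
run;

/* _numeric_ requests a one-way table for every numeric variable. */
proc freq data=sashelp.class;
   tables _numeric_;
run;
```

On a large data set, a request like _numeric_ can produce very long tables for continuous variables, so it is usually worth applying a format or limiting the variable list first.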

jengarcia1122 · Posted 11-05-2014 09:19 AM

Thank you ballardw for your insight, it's very appreciated!
