What procedure to use for conditional probability ...

11-03-2014 07:11 PM

Hello SAS community,

I wanted to ask which SAS procedure would be appropriate for making a conditional probability table. For example, a table that displays the probability of gender given age, or something of that nature.

I was reading up on PROC LOGISTIC and PROC FREQ. However, I don't have a model for PROC LOGISTIC as of yet; the data is pretty raw, and I wanted some quick descriptive statistics involving age for some variables.

One more question for the community: if I do any modeling, such as in PROC LOGISTIC, does my data need to be normally distributed?

Thank you.

Accepted Solutions

Solution

11-03-2014 10:10 PM

It stratifies the data somewhat.

Basically, it says do this for every distinct value of Age. It also requires the data to be sorted on the BY value.
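
A minimal sketch of the BY approach, using the SASHELP.CLASS sample data in place of the poster's data (an assumption). Because each BY group is processed as a separate table, the one-way percentages for SEX within each value of AGE work out to the conditional distribution of sex given age. The output data set names WORK.CLASS_SORTED and WORK.SEX_BY_AGE are hypothetical.

```sas
/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;

/* One one-way table of SEX per distinct AGE value. Percentages are
   computed within each BY group, so PERCENT is the share of each sex
   among students of that age, i.e. P(sex | age). */
proc freq data=work.class_sorted;
   by age;
   tables sex / out=work.sex_by_age;   /* hypothetical output name */
run;
```

The output data set carries AGE along with SEX, COUNT and PERCENT, so the conditional percentages can be used in later steps.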

If you're going to start doing this for multiple variables, you may want to look into some automation features such as macros or CALL EXECUTE.
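
As one illustration of the macro route, a minimal sketch that repeats the same crosstab for a list of variables. The macro name %freq_given, its parameters, the WORK.GIVEN_* output names, and the use of SASHELP.HEART as sample data are all hypothetical choices made for this example, not something from the thread.

```sas
/* Hypothetical macro: for each variable in VARS, tabulate GIVEN*var,
   keep only row percentages (the distribution of var conditional on
   GIVEN), and save the counts and percentages to WORK.GIVEN_<var>. */
%macro freq_given(data=, given=, vars=);
   %local i var;
   %do i = 1 %to %sysfunc(countw(&vars, %str( )));
      %let var = %scan(&vars, &i, %str( ));
      proc freq data=&data;
         tables &given*&var / nofreq nopercent nocol
                              out=work.given_&var outpct;
      run;
   %end;
%mend freq_given;

/* Example call: vital status, blood pressure status and weight status,
   each conditional on sex */
%freq_given(data=sashelp.heart, given=Sex, vars=Status BP_Status Weight_Status)
```

The same loop could instead be driven by CALL EXECUTE from a data set listing the variable names; the macro form is simply shorter to show here.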

If this is clinical reporting, there are quite a few examples of clinical reporting macros on lexjansen.com.

All Replies

11-03-2014 07:38 PM

Proc freq will probably give you what you need for many uses. The main question is how to treat missing data. If you want to include missing as a valid category, you need to specify that (the MISSING option on the TABLES statement); otherwise, a record with a missing value for any of the requested variables is excluded from the results.

proc freq data=sashelp.class;
   tables sex*weight / nofreq norow nocol;
run;

will give cell percentages, i.e., the sample joint probability of each sex and weight combination. The Total row and column hold the marginal distributions. You can send the output to a data set (OUT= on the TABLES statement) if you want to manipulate the data further. Note: FREQ only provides one output data set per TABLES statement, and it contains only the last table requested in that statement.
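
For the conditional table the original question asks about (sex given age), a sketch on SASHELP.CLASS: put the conditioning variable on the rows and keep only the row percentages. In the AGE*SEX table each row percent is the sample estimate of P(sex | age). The MISSING option is included only to show where it goes, since SASHELP.CLASS has no missing values; the output name WORK.COND_SEX_AGE is hypothetical.

```sas
proc freq data=sashelp.class;
   /* Rows = AGE (the condition), columns = SEX.
      NOFREQ, NOPERCENT and NOCOL leave only the row percentages.
      MISSING would count a missing AGE or SEX as its own level
      instead of dropping the record. */
   tables age*sex / nofreq nopercent nocol missing
                    out=work.cond_sex_age outpct;
run;

/* With OUTPCT, the OUT= data set also holds PCT_ROW (conditional %),
   PCT_COL and PCT_TABL (joint %), whatever the display options are. */
proc print data=work.cond_sex_age noobs;
   var age sex count pct_row pct_tabl;
run;
```

Swapping the roles (SEX*AGE with NOROW instead of NOCOL, or reading PCT_COL) would give the distribution of age given sex instead.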

And since Proc Logistic models yes/no (or other categorical) outcomes, the data does not have to be normally distributed; neither the response nor the independent variables are assumed to be normal.
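
A minimal PROC LOGISTIC sketch on SASHELP.CLASS (assumed sample data), modeling the binary response SEX as a function of AGE. Nothing in the call requires normally distributed data; the model is for the log odds of the chosen event level. Picking 'F' as the event and saving the estimates to WORK.PE are illustrative choices.

```sas
/* Capture the coefficient table produced by the next procedure. */
ods output ParameterEstimates=work.pe;

/* Model the log odds of SEX='F' as a linear function of AGE. */
proc logistic data=sashelp.class;
   model sex(event='F') = age;
run;
```

Without EVENT=, PROC LOGISTIC models the first ordered response level by default, so naming the event explicitly avoids guessing which probability is being modeled.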

11-03-2014 08:32 PM

Thank you for having me think about missing values. Age, the conditioning variable, and one other variable do not have many missing values (less than 30%). Other variables do, and I will assign weights when I come to them.

I think proc freq would work fine for descriptive statistics even if it's one table at a time.

Age is a continuous variable, but I'm thinking about converting it into a dichotomous or higher-order categorical variable. Will proc freq also work if I leave age as a continuous variable, or as a higher-order categorical variable?

Does a BY statement after the TABLES statement in proc freq (for example, coding "by age;") calculate conditional probability, or does it stratify the data somehow?

Thank you for your swift response!

11-05-2014 09:20 AM

Thank you, Reeza, for your insight.

11-04-2014 12:46 PM

Many groupings of data can be accomplished with custom formats with no need to add variables. Example:

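proc format library=work;
   /* '-<' excludes the upper endpoint, so a continuous age such as
      12.5 still falls into exactly one group */
   value agegroup
      0  -< 13  = 'Pre-teen'
      13 -< 20  = 'Teen'
      20 -< 36  = '20-35'
      36 -< 66  = '36-65'
      66 - high = '66+'
   ;
run;

proc freq data=yourdataset;
   tables age;
   format age agegroup.;
run;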
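
Because a format only changes how values are grouped, it can be combined with the row-percentage crosstab to get conditional probabilities for age groups. A sketch on SASHELP.CLASS with a hypothetical two-level format AGE2G (a dichotomy, as the poster suggested) and a hypothetical output name WORK.SEX_BY_AGE2:

```sas
/* Hypothetical two-level age grouping; '-<' excludes the upper endpoint. */
proc format library=work;
   value age2g
      low -< 14 = 'Under 14'
      14 - high = '14 and over';
run;

/* Row percentages = P(sex | age group). The FORMAT statement groups AGE
   for this step only; the data set itself is not changed. */
proc freq data=sashelp.class;
   tables age*sex / nofreq nopercent nocol out=work.sex_by_age2 outpct;
   format age age2g.;
run;
```

Changing the cut point only means editing the format, not recoding a new variable in the data.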

As for my previous comment about one output data set per TABLES statement: you can have more than one TABLES statement in a procedure call, which is especially useful if you want different options for some tables.
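
A short sketch of two TABLES statements in one PROC FREQ call, each with its own options and its own OUT= data set. SASHELP.CLASS and the output names WORK.SEX_COUNTS and WORK.AGE_SEX are assumptions for illustration.

```sas
proc freq data=sashelp.class;
   /* First TABLES statement: one-way counts of SEX */
   tables sex / out=work.sex_counts;
   /* Second TABLES statement: its own display options and its own OUT= */
   tables age*sex / nofreq nopercent nocol out=work.age_sex outpct;
run;
```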

And you can easily generate way too many tables: use parentheses () to create groups or variable lists. Some example TABLES statements:

tables (var1 var2 var3) * (var7 var8 var9); will produce crosstabs of var1 with var7, var8 and var9; var2 with var7, var8 and var9; and var3 with var7, var8 and var9, for a total of 9 tables. For readability, you may want to use the LIST option when crossing 3 or more variables.
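
A runnable version of the grouping idea, assuming the SASHELP.HEART sample data set (shipped with SAS 9.x) in place of var1 through var9:

```sas
proc freq data=sashelp.heart;
   /* (2 variables) * (2 variables) = 4 two-way tables:
      Sex*Status, Sex*BP_Status, Smoking_Status*Status,
      Smoking_Status*BP_Status */
   tables (sex smoking_status) * (status bp_status);
   /* A three-way request printed as a single flat list instead of
      one Status*BP_Status table per level of Sex */
   tables sex*status*bp_status / list;
run;
```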

One thing to know: Proc freq with no TABLES statement generates one-way frequency tables for EVERY variable in your data set. There are also keywords such as _numeric_ and _character_ that you can use in a TABLES statement to refer to all variables of that type.
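
A small sketch of the variable-type keywords on SASHELP.CLASS (assumed sample data). The second step adds the NLEVELS option, a quick way to see how many distinct values each variable has before deciding whether to tabulate it or group it with a format.

```sas
/* One-way tables for every character variable (in SASHELP.CLASS: Name, Sex) */
proc freq data=sashelp.class;
   tables _character_;
run;

/* NLEVELS reports the number of distinct values of every numeric variable;
   NOPRINT on the TABLES statement suppresses the frequency tables themselves */
proc freq data=sashelp.class nlevels;
   tables _numeric_ / noprint;
run;
```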

11-05-2014 09:19 AM

Thank you ballardw for your insight, it's very appreciated!