jengarcia1122
Calcite | Level 5

Hello SAS community,

I wanted to ask what SAS procedure would be adequate to make a conditional probability table. For example, a table that can display a probability for gender given age or something of that nature.

I was reading up on PROC LOGISTIC and PROC FREQ. However, I don't have a model for PROC LOGISTIC yet. The data is pretty raw, and I wanted some quick descriptive statistics involving age for some variables.

One more question for the community: if I do any modeling, such as with PROC LOGISTIC, does my data need to be normally distributed?

Thank you.

ballardw
Super User

Proc freq will probably give you what you need for many uses. The main question is how to treat missing data. If you want to include missing as a valid category, you need to specify that; otherwise, a record with a missing value for any of the requested variables is excluded from the results.

proc freq data=sashelp.class;
   tables sex*weight / nofreq norow nocol;
run;

This will give cell percentages, that is, the sample joint probability of each sex and weight combination. The total row and column show the marginal distributions. You can write the output to a data set with the OUT= option if you want to manipulate the data further. Note: PROC FREQ provides only one output data set per TABLES statement, and it contains only the last table requested in that statement.
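For the conditional probabilities asked about in the original question, row percentages are one option. The sketch below is only an illustration using sashelp.class; the output data set name work.age_sex is an assumption. It keeps the row percentages (the share of each sex within each age), treats missing values as a valid category with MISSING, and saves the results with OUT= and OUTPCT.

```sas
/* Sketch: percentage of each sex within each age, i.e. P(sex | age) */
proc freq data=sashelp.class;
   /* MISSING treats missing values as a valid category             */
   /* NOCOL and NOPERCENT leave the counts and the row percentages  */
   /* OUT= saves the table; OUTPCT adds PCT_ROW and PCT_COL to it   */
   tables age*sex / missing nocol nopercent
                    out=work.age_sex outpct;   /* work.age_sex is a hypothetical name */
run;

/* PCT_ROW holds the row percentage for each age and sex combination */
proc print data=work.age_sex;
   var age sex count pct_row;
run;
```

Because age is the row variable here, PCT_ROW is the percentage conditional on age. Swapping the order to sex*age would condition on sex instead.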

And since PROC LOGISTIC models yes/no outcomes and typically deals with a small number of categories for the independent variables, the data does not have to be normally distributed.

jengarcia1122
Calcite | Level 5

Thank you for making me think about missing values. Age, which is the conditioning variable, and one other variable do not have many missing values (less than 30%). Other variables do, and I will assign weights when I come to them.

I think proc freq would work fine for descriptive statistics even if it's one table at a time.

Age is a continuous variable, but I'm thinking about converting this variable into a dichotomous variable or higher order categorical. Will proc freq also work if I leave the variable age as continuous or a higher order categorical variable?

Does a BY statement after the TABLES statement in PROC FREQ (for example, "by age") calculate conditional probabilities, or does it stratify the data somehow?

Thank you for your swift response!

Reeza
Super User

It stratifies the data somewhat.

Basically, it says do this for every distinct value of Age. It also requires the data to be sorted on the BY value.
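A minimal sketch of that BY-group behavior, using sashelp.class as a stand-in data set (the sorted data set name work.class_sorted is an assumption):

```sas
/* BY-group processing expects the data to be sorted by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;

/* One frequency table of sex is produced for each distinct value of age */
proc freq data=work.class_sorted;
   by age;
   tables sex;
run;
```

Within each BY group, the percentages in the sex table are computed only from the records with that age, so they can be read as the sample distribution of sex given age.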

If you're going to start doing it for multiple variables, you may want to look into automation features such as macros or CALL EXECUTE.
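As a rough illustration of the macro approach, the sketch below loops over a list of variables and runs one age cross tab per variable. The macro name freq_by_age and its parameters are hypothetical, and sashelp.class is only a stand-in data set.

```sas
/* Hypothetical macro: one age-by-variable cross tab per listed variable */
%macro freq_by_age(data=, vars=);
   %local i var;
   /* COUNTW returns the number of space-separated names in VARS */
   %do i = 1 %to %sysfunc(countw(&vars));
      %let var = %scan(&vars, &i);
      proc freq data=&data;
         /* Row percentages give the distribution of &var within each age */
         tables age*&var / nocol nopercent;
      run;
   %end;
%mend freq_by_age;

/* Example call with two variables from sashelp.class */
%freq_by_age(data=sashelp.class, vars=sex height)
```

For a simple case like this, a grouped request such as tables age*(sex height) would do the same job without a macro; a macro becomes more useful when each variable needs different options or post-processing.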

If this is clinical reporting, there are quite a few examples of clinical reporting macros on lexjansen.com.

jengarcia1122
Calcite | Level 5

Thank you Reeza for your insight

ballardw
Super User

Many groupings of data can be accomplished with custom formats with no need to add variables. Example:

proc format library=work;
   /* Ranges assume age is recorded in whole years */
   value agegroup
      0 - 12    = 'Pre-teen'
      13 - 19   = 'Teen'
      20 - 35   = '20-35'
      36 - 65   = '36-65'
      66 - high = '66+'
   ;
run;

proc freq data=yourdataset;
   tables age;
   format age agegroup.;
run;
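The same idea can be combined with a cross tab to get percentages conditional on an age group rather than on each individual age. This is only a sketch: the format name agegrp_demo and its ranges are assumptions chosen to fit the ages in sashelp.class, which run from 11 to 16.

```sas
/* Hypothetical format that groups the whole-year ages in sashelp.class */
proc format library=work;
   value agegrp_demo
      11 - 12 = '11-12'
      13 - 14 = '13-14'
      15 - 16 = '15-16'
   ;
run;

/* PROC FREQ groups by the formatted value of age, so each row is an age group */
proc freq data=sashelp.class;
   /* NOCOL and NOPERCENT leave counts and row percentages: sex within age group */
   tables age*sex / nocol nopercent;
   format age agegrp_demo.;
run;
```

No new variable is created; removing the FORMAT statement returns the table to one row per distinct age.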

My previous comment referred to the limit of one output data set per TABLES statement. You can have more than one TABLES statement in a single procedure call, which is especially useful if you want different options for some tables.

You can also request many tables at once (sometimes far too many): use () to create groups of variables or variable lists. An example TABLES statement:

tables (var1 var2 var3) * (var7 var8 var9);

This will produce cross tabs of var1 with var7, var8, and var9; var2 with var7, var8, and var9; and var3 with var7, var8, and var9, for a total of 9 tables. For readability, you may want to consider the LIST option when crossing 3 or more variables.

One thing to know: PROC FREQ with no TABLES statement will generate a one-way frequency table for EVERY variable in your data set. There are also keywords such as _numeric_ or _character_ that you can use in a TABLES statement to request tables for all variables of that type.
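The points above can be sketched together as follows. The data sets sashelp.cars and sashelp.class are stand-ins, and the output data set name work.origin_type is an assumption.

```sas
proc freq data=sashelp.cars;
   /* Grouped request: origin*drivetrain, origin*cylinders,  */
   /* type*drivetrain, and type*cylinders (4 tables in all)  */
   /* LIST prints each cross tab in list form; MISSING keeps */
   /* missing values as a valid category                     */
   tables (origin type) * (drivetrain cylinders) / list missing;

   /* A second TABLES statement with its own options and output data set */
   tables origin*type / nopercent nocol out=work.origin_type;
run;

/* _NUMERIC_ requests a one-way table for every numeric variable */
proc freq data=sashelp.class;
   tables _numeric_;
run;
```

Keywords like _character_ are best used with care on wide data sets, since a variable with many distinct values produces a very long table.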

jengarcia1122
Calcite | Level 5

Thank you ballardw for your insight, it's very appreciated!
