Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- aggregating frequencies

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-07-2017 01:25 PM - edited 09-07-2017 01:26 PM

Hello,

I have a question about aggregating frequencies. In short, each observation has a family ID and a marker that says if the observation is for an adult or a child. (example data below). NOTE: the real dataset included 1000s of families.

FamID adlt chld

001 1 0

001 0 1

001 0 1

002 1 0

002 1 0

002 0 1

002 0 1

002 0 1

002 0 1

003 1 0

003 0 1

003 0 1

004 1 0

004 0 1

004 0 1

I need to determine how many families have 1 child, or 2 children, etc...

The output should look like the following:

#Child #families

1 0

2 3

3 0

4 1

Can anyone offer advise on how to approach this? I tried the following without success:

PROC SUMMARY data=in;

class FamID;

var chld adlt;

output out=agg sum=chld adlt;

run;

PROC TABULATE data=in;

class FamID;

var chld;

table chld;

run;

I'm sure that I'm overthinking this, but I'm stuck. Any help would be appreciated.

Thank you.

Accepted Solutions

Solution

09-07-2017
02:09 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MillerEL

09-07-2017 01:42 PM

Many ways to skin this cat ... I tend to favor PROC FREQ for counting. Start by adding NWAY to the end of your PROC SUMMARY statement. Then continue:

proc freq data=AGG;

tables chld;

run;

The CHLD column is the number of children, and the FREQUENCY column is the number of families having that number of children.

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MillerEL

09-07-2017 01:29 PM

Your approach is correct, but the data for proc tabulate should be the output from the PROC SUMMARY not the original data set.

The logic is essentially:

1. Find the number of kids per family - add child numbers using PROC SUMMARY

2. Summarize data from Step 1 using PROC FREQ or PROC TABULATE. Note that you don't want the CLASS variable in the PROC TABULATE because you're counting overall, not by FAMILY ID.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

09-07-2017 02:11 PM

Perfect. Thank you!

Solution

09-07-2017
02:09 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MillerEL

09-07-2017 01:42 PM

Many ways to skin this cat ... I tend to favor PROC FREQ for counting. Start by adding NWAY to the end of your PROC SUMMARY statement. Then continue:

proc freq data=AGG;

tables chld;

run;

The CHLD column is the number of children, and the FREQUENCY column is the number of families having that number of children.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

09-07-2017 02:11 PM

Perfect! Thank you!