Hi,
Currently this selection is done manually, so I have no code to share. I believe I can automate the process in SAS, and I thought it would be good to ask here in case anyone can give me some pointers. I have looked for similar questions but didn't find anything.
The end goal is to identify a sample that represents the group's characteristics using as few observations as possible. Based on group totals, formulas and rounding are used to generate a list of “need” totals for each characteristic. After an observation is selected for the sample, its characteristics need to be deducted from the needed counts (Need_Initial) until the selected sample covers the total desired count for each characteristic.
I have added a pretend dataset that represents a group. In the table below I put the initial total counts (Need_Initial), an example of an observation selected for the sample (Selection_1), and how that first selection changed the needed totals (Need_2). Subcategory counts may not always line up with the category total. For example, in this case the sample only needs 1 observation with patterns, but it should include 1 Pattern_Dots and 1 Pattern_Plaid regardless of whether that combination can be found within one observation.
Characteristic | Need_Initial | Selection_1 | Need_2 |
Dinner | 14 | 1 | 13 |
Likes_talking | 8 | 1 | 7 |
Visiting | 8 | 1 | 7 |
Hates_talking | 7 | 0 | 7 |
Green | 3 | 0 | 3 |
Dark_Yellow | 3 | 1 | 2 |
Dark_Green | 2 | 1 | 1 |
Dark_White | 2 | 0 | 2 |
Blue | 1 | 0 | 1 |
Yellow | 1 | 1 | 0 |
White | 1 | 0 | 1 |
Red | 1 | 0 | 1 |
Black | 1 | 1 | 0 |
Grey | 1 | 0 | 1 |
Light_Blue | 1 | 0 | 1 |
Light_orange | 1 | 0 | 1 |
Light_red | 1 | 0 | 1 |
Dark_Black | 1 | 1 | 0 |
Patterns | 1 | 1 | 0 |
Pattern_Dots | 1 | 0 | 1 |
Pattern_Plaid | 1 | 1 | 0 |
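The deduction step in the table can be sketched in a few lines. This is only an illustration in Python rather than SAS, using a handful of rows copied from the table; the function name `deduct` is hypothetical, and flooring the remaining need at zero is an assumption (the table never shows a need going negative).

```python
# Need_Initial and Selection_1 values copied from a few rows of the table.
need_initial = {
    "Dinner": 14, "Likes_talking": 8, "Dark_Yellow": 3,
    "Yellow": 1, "Pattern_Dots": 1, "Pattern_Plaid": 1,
}
selection_1 = {
    "Dinner": 1, "Likes_talking": 1, "Dark_Yellow": 1,
    "Yellow": 1, "Pattern_Dots": 0, "Pattern_Plaid": 1,
}

def deduct(need, selected):
    """Return the remaining need after subtracting one selected observation.

    Assumption: a need that is already met stays at 0 instead of going negative.
    """
    return {name: max(count - selected.get(name, 0), 0)
            for name, count in need.items()}

need_2 = deduct(need_initial, selection_1)
print(need_2)
```

Repeating this step after every selection gives Need_3, Need_4, and so on, until every remaining count is 0.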
This looks like a constrained optimization problem. The "need" characteristics are the constraints. The number of observations in the sample is the quantity that you are trying to minimize, subject to those constraints.
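To make that framing concrete, here is a rough sketch in Python (not SAS) that finds a smallest sample by exhaustive search. It assumes each need is an "at least" constraint, which seems consistent with the Patterns example in the question, and it uses a made-up four-observation group. The function name `smallest_sample` and the toy data are hypothetical. Exhaustive search is only practical for very small groups; a real solver would be needed for anything larger.

```python
from itertools import combinations

def smallest_sample(observations, need):
    """Return indices of a smallest subset whose totals meet every need, or None.

    Tries subsets in order of increasing size, so the first hit is minimal.
    """
    indices = range(len(observations))
    for size in range(len(observations) + 1):
        for subset in combinations(indices, size):
            totals = {name: sum(observations[i].get(name, 0) for i in subset)
                      for name in need}
            # Assumption: a need is satisfied when the total is at least the target.
            if all(totals[name] >= need[name] for name in need):
                return list(subset)
    return None

# Hypothetical toy group: each observation is a set of 0/1 characteristic flags.
observations = [
    {"Patterns": 1, "Pattern_Dots": 1},
    {"Patterns": 1, "Pattern_Plaid": 1},
    {"Dinner": 1},
    {"Dinner": 1, "Patterns": 1, "Pattern_Plaid": 1},
]
need = {"Dinner": 1, "Patterns": 1, "Pattern_Dots": 1, "Pattern_Plaid": 1}

sample = smallest_sample(observations, need)
print(sample)
```

In this toy case no single observation covers both dots and plaid, so the sample needs two observations and ends up with more patterned observations than the Patterns need alone would require.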
Do you have a license for SAS/OR software? Or SAS/IML?
It looks like I do have SAS/IML!