Hi,
Currently this selection is done manually, so I have no code to share. I believe I can automate the process in SAS, and I thought it would be good to ask here in case anyone can give me some pointers. I have looked for similar questions but didn't find anything.
The end goal is to identify a sample that represents group characteristics with as few observations as possible. Based on group totals, formulas and rounding are used to generate a list of “need” totals for each characteristic. After an observation is selected for the sample, its characteristics need to be deducted from the needed counts (Need_Initial), and this repeats until the selected sample meets the desired total count for each characteristic.
I have added a pretend dataset that represents a group. In the table below I put the initial total counts (Need_Initial), an example of an observation selected for the sample (Selection_1), and then how that first selection changed the needed totals (Need_2). Sometimes subcategory counts may not be in line with the category total. For example, in this case the sample only needs 1 observation with patterns, but it should include 1 Pattern_Dots and 1 Pattern_Plaid regardless of whether that combination can be found within one observation.
Characteristic | Need_Initial | Selection_1 | Need_2 |
Dinner | 14 | 1 | 13 |
Likes_talking | 8 | 1 | 7 |
Visiting | 8 | 1 | 7 |
Hates_talking | 7 | 0 | 7 |
Green | 3 | 0 | 3 |
Dark_Yellow | 3 | 1 | 2 |
Dark_Green | 2 | 1 | 1 |
Dark_White | 2 | 0 | 2 |
Blue | 1 | 0 | 1 |
Yellow | 1 | 1 | 0 |
White | 1 | 0 | 1 |
Red | 1 | 0 | 1 |
Black | 1 | 1 | 0 |
Grey | 1 | 0 | 1 |
Light_Blue | 1 | 0 | 1 |
Light_orange | 1 | 0 | 1 |
Light_red | 1 | 0 | 1 |
Dark_Black | 1 | 1 | 0 |
Patterns | 1 | 1 | 0 |
Pattern_Dots | 1 | 0 | 1 |
Pattern_Plaid | 1 | 1 | 0 |
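The deduction step shown in the table (Need_2 = Need_Initial minus Selection_1) can be sketched in a few lines. This is only an illustration in Python, not SAS, and the names `need_initial`, `selection_1`, and `deduct` are hypothetical. The values are copied from a handful of rows in the table; flooring the remaining need at zero is an assumption, not something stated in the post.

```python
# Hypothetical names; values copied from a few rows of the table above.
need_initial = {"Dinner": 14, "Likes_talking": 8, "Dark_Yellow": 3,
                "Yellow": 1, "Pattern_Dots": 1}
selection_1 = {"Dinner": 1, "Likes_talking": 1, "Dark_Yellow": 1,
               "Yellow": 1, "Pattern_Dots": 0}


def deduct(need, selection):
    """Return the remaining need after one selected observation.

    Assumption: a need never drops below zero, even if a selected
    observation has a characteristic that is already satisfied.
    """
    return {name: max(count - selection.get(name, 0), 0)
            for name, count in need.items()}


need_2 = deduct(need_initial, selection_1)
print(need_2)
```

Repeating `deduct` after each selection until every value is zero mirrors the manual process described above.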
This looks like a constrained optimization problem. The "need" characteristics are constraints. The number of observations in the data set is a quantity that you are trying to minimize, subject to the constraints.
Do you have a license for SAS/OR software? Or SAS/IML?
It looks like I do have SAS/IML!
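To make the constrained-optimization framing above concrete, here is a rough sketch of a greedy heuristic: at each step it picks the unused observation that covers the most still-needed characteristics. This is a swapped-in simplification, not an exact optimizer, so it may select more observations than the true minimum. It is written in Python purely for illustration; the function `greedy_sample`, the observation IDs, and the toy data are all hypothetical and would need to be translated to SAS.

```python
def greedy_sample(observations, need):
    """Greedy heuristic (not guaranteed optimal).

    observations: dict mapping an observation ID to its set of characteristics.
    need: dict mapping a characteristic to the count the sample must contain.
    Returns the chosen IDs and whatever need is left unmet.
    """
    remaining = dict(need)
    available = dict(observations)
    chosen = []
    while any(count > 0 for count in remaining.values()):
        best_id, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for obs_id in sorted(available):
            gain = sum(1 for c in available[obs_id] if remaining.get(c, 0) > 0)
            if gain > best_gain:
                best_id, best_gain = obs_id, gain
        if best_id is None:
            break  # no unused observation helps; the needs cannot all be met
        for c in available.pop(best_id):
            if remaining.get(c, 0) > 0:
                remaining[c] -= 1
        chosen.append(best_id)
    return chosen, remaining


# Toy data, invented for this example only.
observations = {
    "A": {"Dinner", "Likes_talking", "Yellow"},
    "B": {"Dinner", "Hates_talking", "Blue"},
    "C": {"Dinner", "Yellow"},
    "D": {"Visiting", "Blue"},
}
need = {"Dinner": 2, "Yellow": 1, "Blue": 1}

chosen, remaining = greedy_sample(observations, need)
print(chosen, remaining)
```

If a provably minimal sample is required, the same needs can instead be written as constraints of an integer linear program, with one 0/1 variable per observation and the objective of minimizing their sum.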