We need to apply de-identification rules in order to protect PII. The de-identification rules include combining specific data and creating a new race called "Combined".
I am having a hard time finding SAS code that will handle this. My data (not real) is below.
If count<10 or iepcount<10
then (not sure how to do this, but basically I want to combine the next smallest value of race by district until the count and/or iepcount is 10 or more and call this race "All the rest" 🙂 or something more defining.)
data WORK.EC;
infile datalines;
input district:4. race:$1. count:12. iepcount:12. pctec:10.2;
datalines;
datalines:
0001 | H | 10 | 100 | 0.10 |
0001 | W | 66 | 726 | 0.09 |
0002 | W | 21 | 147 | 0.14 |
0003 | M | 7 | 49 | 0.14 |
0003 | W | 12 | 160 | 0.08 |
0004 | W | 46 | 705 | 0.07 |
0005 | W | 13 | 56 | 0.23 |
0006 | W | 14 | 116 | 0.12 |
0007 | W | 55 | 504 | 0.11 |
0101 | W | 10 | 100 | 0.10 |
0102 | H | 10 | 100 | 0.10 |
0102 | W | 10 | 143 | 0.07 |
0103 | W | 12 | 124 | 0.10 |
0104 | M | 1 | 1 | 1.00 |
0104 | W | 59 | 1160 | 0.05 |
0105 | H | 12 | 48 | 0.25 |
0105 | M | 2 | 4 | 0.50 |
0105 | W | 48 | 576 | 0.08 |
0205 | A | 9 | 81 | 0.11 |
0205 | B | 98 | 1067 | 0.09 |
0205 | H | 62 | 641 | 0.10 |
0205 | M | 51 | 325 | 0.16 |
0205 | W | 27 | 287 | 0.09 |
0204 | M | 3 | 5 | 0.67 |
0204 | N | 1 | 1 | 1.00 |
0204 | W | 122 | 1861 | 0.07 |
0240 | W | 13 | 85 | 0.15 |
@Marlene54 wrote:
We need to apply redaction rules in order to protect PII. The redaction rules include combining specific data and creating a new race called "Combined".
I am having a hard time finding SAS code that will handle this. My data (not real) is below.
If count<10 or iepcount<10
then (not sure how to do this, but basically I want to combine the next smallest value by district until the count and/or iepcount is 10 or more and call this race combined.)
data WORK.EC; infile datalines; input district:4. race:$1. count:12. iepcount:12. pctec:10.2; datalines;
datalines:
0001 H 10 100 0.10 0001 W 66 726 0.09 0002 W 21 147 0.14 0003 M 7 49 0.14 0003 W 12 160 0.08 0004 W 46 705 0.07 0005 W 13 56 0.23 0006 W 14 116 0.12 0007 W 55 504 0.11 0101 W 10 100 0.10 0102 H 10 100 0.10 0102 W 10 143 0.07 0103 W 12 124 0.10 0104 M 1 1 1.00 0104 W 59 1160 0.05 0105 H 12 48 0.25 0105 M 2 4 0.50 0105 W 48 576 0.08 0205 A 9 81 0.11 0205 B 98 1067 0.09 0205 H 62 641 0.10 0205 M 51 325 0.16 0205 W 27 287 0.09 0204 M 3 5 0.67 0204 N 1 1 1.00 0204 W 122 1861 0.07 0240 W 13 85 0.15
When you say things like "combine the next smallest value by district until the count and/or iepcount is 10" you are not redacting but aggregating. Redaction is to "remove" and you don't seem to be saying you want things removed.
Combine exactly what? Next smallest value by district of what?
All the data I have worked with that had any "racial" component in reporting has the categories defined when races are combined such as "white and black" or "White non-Hispanic, Black non-Hispanic, Native American non-hispanic" as a single "group". Then the content is known. Randomly aggregating race by count of some other variable sounds extremely odd and very hard to use in any other manner. At which point, why would race be considered at all???
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.