We need to apply de-identification rules in order to protect PII. The de-identification rules include combining specific data and creating a new race called "Combined".
I am having a hard time finding SAS code that will handle this. My data (not real) is below.
If count<10 or iepcount<10
then (not sure how to do this, but basically I want to combine the next smallest value of race by district until the count and/or iepcount is 10 or more and call this race "All the rest" 🙂 or something more defining.)
data WORK.EC;
infile datalines;
input district:4. race:$1. count:12. iepcount:12. pctec:10.2;
datalines;
datalines:
0001 | H | 10 | 100 | 0.10 |
0001 | W | 66 | 726 | 0.09 |
0002 | W | 21 | 147 | 0.14 |
0003 | M | 7 | 49 | 0.14 |
0003 | W | 12 | 160 | 0.08 |
0004 | W | 46 | 705 | 0.07 |
0005 | W | 13 | 56 | 0.23 |
0006 | W | 14 | 116 | 0.12 |
0007 | W | 55 | 504 | 0.11 |
0101 | W | 10 | 100 | 0.10 |
0102 | H | 10 | 100 | 0.10 |
0102 | W | 10 | 143 | 0.07 |
0103 | W | 12 | 124 | 0.10 |
0104 | M | 1 | 1 | 1.00 |
0104 | W | 59 | 1160 | 0.05 |
0105 | H | 12 | 48 | 0.25 |
0105 | M | 2 | 4 | 0.50 |
0105 | W | 48 | 576 | 0.08 |
0205 | A | 9 | 81 | 0.11 |
0205 | B | 98 | 1067 | 0.09 |
0205 | H | 62 | 641 | 0.10 |
0205 | M | 51 | 325 | 0.16 |
0205 | W | 27 | 287 | 0.09 |
0204 | M | 3 | 5 | 0.67 |
0204 | N | 1 | 1 | 1.00 |
0204 | W | 122 | 1861 | 0.07 |
0240 | W | 13 | 85 | 0.15 |
@Marlene54 wrote:
We need to apply redaction rules in order to protect PII. The redaction rules include combining specific data and creating a new race called "Combined".
I am having a hard time finding SAS code that will handle this. My data (not real) is below.
If count<10 or iepcount<10
then (not sure how to do this, but basically I want to combine the next smallest value by district until the count and/or iepcount is 10 or more and call this race combined.)
data WORK.EC; infile datalines; input district:4. race:$1. count:12. iepcount:12. pctec:10.2; datalines;
datalines:
0001 H 10 100 0.10 0001 W 66 726 0.09 0002 W 21 147 0.14 0003 M 7 49 0.14 0003 W 12 160 0.08 0004 W 46 705 0.07 0005 W 13 56 0.23 0006 W 14 116 0.12 0007 W 55 504 0.11 0101 W 10 100 0.10 0102 H 10 100 0.10 0102 W 10 143 0.07 0103 W 12 124 0.10 0104 M 1 1 1.00 0104 W 59 1160 0.05 0105 H 12 48 0.25 0105 M 2 4 0.50 0105 W 48 576 0.08 0205 A 9 81 0.11 0205 B 98 1067 0.09 0205 H 62 641 0.10 0205 M 51 325 0.16 0205 W 27 287 0.09 0204 M 3 5 0.67 0204 N 1 1 1.00 0204 W 122 1861 0.07 0240 W 13 85 0.15
When you say things like "combine the next smallest value by district until the count and/or iepcount is 10" you are not redacting but aggregating. Redaction is to "remove" and you don't seem to be saying you want things removed.
Combine exactly what? Next smallest value by district of what?
All the data I have worked with that had any "racial" component in reporting has the categories defined when races are combined such as "white and black" or "White non-Hispanic, Black non-Hispanic, Native American non-hispanic" as a single "group". Then the content is known. Randomly aggregating race by count of some other variable sounds extremely odd and very hard to use in any other manner. At which point, why would race be considered at all???
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.