BookmarkSubscribeRSS Feed
Marlene54
Fluorite | Level 6

We need to apply de-identification rules in order to protect PII.  The de-identification rules include combining specific data and creating a new race called "Combined".

I am having a hard time finding SAS code that will handle this.  My data (not real) is below.

If count<10 or iepcount<10

    then (not sure how to do this, but basically I want to combine the next smallest value of race by district until the count and/or iepcount is 10 or more and call this race "All the rest" 🙂 or something more defining.)

 

 

data WORK.EC;
   infile datalines;
   input district:4. race:$1. count:12. iepcount:12. pctec:10.2;
 datalines;

 

 

datalines:

0001H101000.10
0001W667260.09
0002W211470.14
0003M7490.14
0003W121600.08
0004W467050.07
0005W13560.23
0006W141160.12
0007W555040.11
0101W101000.10
0102H101000.10
0102W101430.07
0103W121240.10
0104M111.00
0104W5911600.05
0105H12480.25
0105M240.50
0105W485760.08
0205A9810.11
0205B9810670.09
0205H626410.10
0205M513250.16
0205W272870.09
0204M350.67
0204N111.00
0204W12218610.07
0240W13850.15

 

3 REPLIES 3
ballardw
Super User

@Marlene54 wrote:

We need to apply redaction rules in order to protect PII.  The redaction rules include combining specific data and creating a new race called "Combined".

I am having a hard time finding SAS code that will handle this.  My data (not real) is below.

If count<10 or iepcount<10

    then (not sure how to do this, but basically I want to combine the next smallest value by district until the count and/or iepcount is 10 or more and call this race combined.)

 

data WORK.EC;
   infile datalines;
   input district:4. race:$1. count:12. iepcount:12. pctec:10.2;
 datalines;

 

datalines:

0001 H 10 100 0.10
0001 W 66 726 0.09
0002 W 21 147 0.14
0003 M 7 49 0.14
0003 W 12 160 0.08
0004 W 46 705 0.07
0005 W 13 56 0.23
0006 W 14 116 0.12
0007 W 55 504 0.11
0101 W 10 100 0.10
0102 H 10 100 0.10
0102 W 10 143 0.07
0103 W 12 124 0.10
0104 M 1 1 1.00
0104 W 59 1160 0.05
0105 H 12 48 0.25
0105 M 2 4 0.50
0105 W 48 576 0.08
0205 A 9 81 0.11
0205 B 98 1067 0.09
0205 H 62 641 0.10
0205 M 51 325 0.16
0205 W 27 287 0.09
0204 M 3 5 0.67
0204 N 1 1 1.00
0204 W 122 1861 0.07
0240 W 13 85 0.15

 


When you say things like "combine the next smallest value by district until the count and/or iepcount is 10" you are not redacting but aggregating. Redaction is to "remove" and you don't seem to be saying you want things removed.

 

Combine exactly what? Next smallest value by district of what?

 

All the data I have worked with that had any "racial" component in reporting has the categories defined when races are combined such as "white and black" or "White non-Hispanic, Black non-Hispanic, Native American non-hispanic" as a single "group". Then the content is known. Randomly aggregating race by count of some other variable sounds extremely odd and very hard to use in any other manner. At which point, why would race be considered at all???

Marlene54
Fluorite | Level 6
Perhaps better words might be de-identification or anonymization.
Marlene54
Fluorite | Level 6
I rewrote my example and clarified that the count is by district and race. I just need to combine the counts until they are greater than or equal to10. For example district 1003 shows 7 (eccount), 49 (iepcount) M (multi-racial) students and 12 (eccount), 160 (iepcount) W (white) students. Because the eccount for multi-racial is less than 10, I need to combine it with the 12 white students and add together the iepcount as well as rename the combination race as "combined", or something more appropriate. I used a generic only because I thought it would be more difficult to also combine the race field.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 607 views
  • 0 likes
  • 2 in conversation