BookmarkSubscribeRSS Feed
Sarah_Caro
Fluorite | Level 6

Hi everyone. I need to perform suppression for a dataset with thousands of rows so trying to avoid doing this manually in excel. My dataset is already aggregated, a picture is below. Within each group of campus, academicyear, award, and ethnicity, I need to suppress the number in field "countstring." ("Countstring" is a copy of the field 'count' but is in character format to allow replacing with asterisks in SAS). Ideally I'd be replacing the value in "countstring" according to the following rules:

 

1. Any cell size smaller than 6 is automatically suppressed.

 

2. When the first cell in a group is suppressed, the next-largest value is also suppressed. For example in the group "9. Unknown" starting in row 287 of my image, row 288 with a count of 7 will need to be suppressed, to avoid backing into the count in row 287. (Although totals are not in this dataset, they are published elsewhere).

 

3. If a cell is larger than 5, but all of the cells above it in the group sum to less than 6, then it needs to be suppressed (for example, row 270 would need to be suppressed because the two rows above it sum to only 4). 

 

4. However, if multiple cells smaller than 6 are suppressed and they sum to more than 5, then the next cell does not need to be suppressed (for example, even though the cell above row 267 is less than 6, complementary suppression does not need to be applied to the value of '7' in row 267, because the sum of the suppressed values in that group is more than 5.

 

5. zeroes should not be suppressed.

 

I am unsure how to write this in SAS. I've done some web searching, but none of the examples seem to be the same as my case. Anyone have experience with this type of suppression, or just how to refer to the rows above, to write some rules, within a group? Many thanks.

photo for sas forum.JPG

2 REPLIES 2
mkeintz
PROC Star

@Sarah_Caro   Please help us help you.  Provide the sample data in the form of a working DATA step, so that suggested solutions can be tested.   Most of us are not going to manually type in the data you present as an image.  So all suggestions are likely to be untested.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Seems like you just need to keep a running sum and a running count of small values and/or hidden values.

 

Since you haven't shared any data lets assume you have two variables to identify the groups, BYVAR1 and BYVAR2.  That COUNT has the numeric count and CHAR_COUNT is the variable you want to suppress.

 

The running sum is simple.

The count of how many you have hidden might be harder.  Might be easiest to just check whether or not you ended up hiding the value.

data want;
  do until (last.byvar2);
    set have;
    by byvar1 byvar2 count;
    sum = sum(sum,0,count);
    /* rules here */
    n_hidden = sum(n_hidden,char_count='*');
    output;
  end;
run;

So your rule 1 is just:  (0 < COUNT < 6)

Rule2 is (count ne 0) and n_hidden=1 

Rule3 is (count > 5 and (sum-count) < 6)

Rule4 is I cannot see how this happens

Rule5 is part of rule 1 and 2

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 687 views
  • 0 likes
  • 3 in conversation