I need to suppress data according to the following rules:
1. If any cell is less than five (includes 0), the cell is suppressed.
2. When there is suppression of a single cell, the next smallest cell(s) is suppressed until the total of the data in the suppressed cells adds up to at least five.
3. If any single category contains all of the data, the entire group of data is suppressed.
Here is sample data:
School Proficiency level 2011 2012 2013 2014
Washington HS 1 1 0 1 2
Washington HS 2 2 3 2 2
Washington HS 3 6 3 2 2
Washington HS 4 18 12 9 4
Monroe HS 1 10 12 4 6
Monroe HS 2 4 5 4 0
Monroe HS 3 8 7 9 5
Monroe HS 4 20 15 18 10
Truman HS 1 0 20 15 14
Truman HS 2 0 18 16 19
Truman HS 3 0 34 20 28
Truman HS 4 60 28 25 24
In this example, the data of concern are the 4 numbers within each year for each school.
So, for example, for Washington HS for 2011, there are 4 numbers: 1, 2, 6, 18. As 1 and 2 are both less than 5, they will need to be suppressed. Since
the total of 3 ( 1 + 2) is less than 5, the 6 will also need to be suppressed.
Another example: Washington HS for 2012, there are 4 numbers: 0, 3, 3, 12. The first 3 will need to be suppressed as they are all less than 5. Since the total of these 3 numbers is 6, we keep the 12.
Another example: Washington HS for 2014, there are 4 numbers: 2, 2, 2, 4. Since all are under 5, all will need to be suppressed.
Another example: Truman HS for 2011, there are 4 numbers: 0, 0, 0, 60. Since all of the data are in one cell, all cells must be suppressed.
Can this be achieved in SAS?
Thank you.
Not sure if this is what you need, but take a look,
Data Have;
Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.);
cards;
Washington HS 1 1 0 1 2
Washington HS 2 2 3 2 2
Washington HS 3 6 3 2 2
Washington HS 4 18 12 9 4
Monroe HS 1 10 12 4 6
Monroe HS 2 4 5 4 0
Monroe HS 3 8 7 9 5
Monroe HS 4 20 15 18 10
Truman HS 1 0 20 15 14
Truman HS 2 0 18 16 19
Truman HS 3 0 34 20 28
Truman HS 4 60 28 25 24
;
Run;
data want;
do until (last.school);
set have;
by school notsorted;
array y(4) y2011-y2014;
array t(4) _temporary_;
array m(4) _temporary_;
do i=1 to 4;
if y(i) <5 then
t(i)+y(i);
else m(i)=min(m(i),y(i));
end;
end;
do until (last.school);
set have;
by school notsorted;
do i=1 to 4;
if y(i) <5 then
y(i)=-1*y(i);
else if y(i)=m(i) and .< t(i) < 5 then
y(i)=-1*y(i);
end;
output;
end;
call missing (of m(*), of t(*));
drop i;
run;
Haikuo
What is the output supposed to look like? Are the values within a year set to missing?
I think your Rule 3 should read:
3. If any single category contains all of the data greater than 0, the entire group of data is suppressed.
Since SAS uses a special value to indicate missing 0 by default is data...
A dash is used to indicate suppressed data.
That's a good point about rule 3. If all data are zero, there is no need to suppress. The dots to indicate missing are changed to zeroes.
Is it just for presenting as a kind of alternate printing (format fcmp) or do you want to do calculations on the vaules. The answer to your question is "yes" it can be done with sas.
The next thing is what you want exactly in detail.
The data set with the suppressed values will be exported either to Excel or a text file. The values need to be suppressed for the purpose of confidentiality of the students. The finished product would look just like the sample table but with a dash replacing the suppressed values.
Not sure if this is what you need, but take a look,
Data Have;
Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.);
cards;
Washington HS 1 1 0 1 2
Washington HS 2 2 3 2 2
Washington HS 3 6 3 2 2
Washington HS 4 18 12 9 4
Monroe HS 1 10 12 4 6
Monroe HS 2 4 5 4 0
Monroe HS 3 8 7 9 5
Monroe HS 4 20 15 18 10
Truman HS 1 0 20 15 14
Truman HS 2 0 18 16 19
Truman HS 3 0 34 20 28
Truman HS 4 60 28 25 24
;
Run;
data want;
do until (last.school);
set have;
by school notsorted;
array y(4) y2011-y2014;
array t(4) _temporary_;
array m(4) _temporary_;
do i=1 to 4;
if y(i) <5 then
t(i)+y(i);
else m(i)=min(m(i),y(i));
end;
end;
do until (last.school);
set have;
by school notsorted;
do i=1 to 4;
if y(i) <5 then
y(i)=-1*y(i);
else if y(i)=m(i) and .< t(i) < 5 then
y(i)=-1*y(i);
end;
output;
end;
call missing (of m(*), of t(*));
drop i;
run;
Haikuo
This works. Thank you so much. The only problem is that I need the suppressed value to be represented by a dash, not a negative number. Since a dash is a character value, I added the following code:
data want1;
set want;
sy2011 = put(y2011,$4.0);
sy2012 = put(y2012,$4.0);
sy2013 = put(y2013,$4.0);
sy2014 = put(y2014,$4.0);
if y2011 < 1 then sy2011 = '-';
if y2012 < 1 then sy2012 = '-';
if y2013 < 1 then sy2013 = '-';
if y2014 < 1 then sy2014 = '-';
drop y2011-y2014;
run;
(By the way, how do I paste into this box without having to reformat? Part of my response was cut off and extra lines were added between the code.)
Appearance of values is what custom formats are for. I would actually have been tempted to assign a special missing value (.A to .Z) and then have a format that displays that value as a dash. Then you don't need to proliferate variables and can still do arithmetic or summaries on the recoded variable.
And this editor REALLY sucks, apparently worse when using Internet Explorer, but paste in to Notepad or similar text editor (not Wordpad as it attempts to use HTML when pasted from the enhanced editor) and the copy/paste into this forum.
Very interesting question . But you didn't post the output yet ?
Data Have; Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.); cards; Washington HS 1 1 0 1 2 Washington HS 2 2 3 2 2 Washington HS 3 6 3 2 2 Washington HS 4 18 12 9 4 Monroe HS 1 10 12 4 6 Monroe HS 2 4 5 4 0 Monroe HS 3 8 7 9 5 Monroe HS 4 20 15 18 10 Truman HS 1 0 20 15 14 Truman HS 2 0 18 16 19 Truman HS 3 0 34 20 28 Truman HS 4 60 28 25 24 ; Run; options missing='-'; data want; set have; by School notsorted; retain found_2011 found_2012 found_2013 found_2014 sum_2011 sum_2012 sum_2013 sum_2014 ; if first.school then do; call missing(of found_:); sum_2011=5; sum_2012=5; sum_2013=5; sum_2014=5; end; sum_2011+(-1*y2011); sum_2012+(-1*y2012); sum_2013+(-1*y2013); sum_2014+(-1*y2014); if not found_2011 then call missing(y2011); if not found_2012 then call missing(y2012); if not found_2013 then call missing(y2013); if not found_2014 then call missing(y2014); if sum_2011 le 0 then found_2011=1; if sum_2012 le 0 then found_2012=1; if sum_2013 le 0 then found_2013=1; if sum_2014 le 0 then found_2014=1; drop sum_: found_: ; run;
Xia Keshan
Thanks for trying, Ksharp, but the output using your code is not entirely correct.
This is what the output is supposed to look like:
School Proficiency_level y2011 y2012 y2013 y2014
Washington HS 1 - - - -
Washington HS 2 - - - -
Washington HS 3 - - - -
Washington HS 4 18 12 9 -
Monroe HS 1 10 12 - 6
Monroe HS 2 - 5 - -
Monroe HS 3 - 7 9 -
Monroe HS 4 20 15 18 10
Truman HS 1 - 20 15 14
Truman HS 1 - 18 16 19
Truman HS 1 - 34 20 28
Truman HS 1 - 28 25 24
Ha. It is clarified now .
Data Have; Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.); cards; Washington HS 1 1 0 1 2 Washington HS 2 2 3 2 2 Washington HS 3 6 3 2 2 Washington HS 4 18 12 9 4 Monroe HS 1 10 12 4 6 Monroe HS 2 4 5 4 0 Monroe HS 3 8 7 9 5 Monroe HS 4 20 15 18 10 Truman HS 1 0 20 15 14 Truman HS 2 0 18 16 19 Truman HS 3 0 34 20 28 Truman HS 4 60 28 25 24 ; Run; options missing='-'; data want; set have; by School notsorted; retain found_2011 found_2012 found_2013 found_2014 sum_2011 sum_2012 sum_2013 sum_2014 lt_five_2011 lt_five_2012 lt_five_2013 lt_five_2014 ; if first.school then do; call missing(of found_: lt_five_:); sum_2011=5; sum_2012=5; sum_2013=5; sum_2014=5; end; if y2011 lt 5 then lt_five_2011=1; if y2012 lt 5 then lt_five_2012=1; if y2013 lt 5 then lt_five_2013=1; if y2014 lt 5 then lt_five_2014=1; if lt_five_2011 then sum_2011+(-1*y2011); if lt_five_2012 then sum_2012+(-1*y2012); if lt_five_2013 then sum_2013+(-1*y2013); if lt_five_2014 then sum_2014+(-1*y2014); if not found_2011 and lt_five_2011 then call missing(y2011); if not found_2012 and lt_five_2012 then call missing(y2012); if not found_2013 and lt_five_2013 then call missing(y2013); if not found_2014 and lt_five_2014 then call missing(y2014); if sum_2011 le 0 and lt_five_2011 then found_2011=1; if sum_2012 le 0 and lt_five_2012 then found_2012=1; if sum_2013 le 0 and lt_five_2013 then found_2013=1; if sum_2014 le 0 and lt_five_2014 then found_2014=1; drop sum_: found_: lt_five:; run;
I don't understand
Washington HS 1 - - - -
Washington HS 2 - - - -
Washington HS 3 - - - -
Washington HS 4 18 12 9 4 <--why be missing
Xia Keshan
Washington HS - 2014, level 4 - The 4 is suppressed because it is less than 5 - see rule # 1.
Ou, That is more complicated than I image. Try this one, and let me if it is worked.
Data Have; Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.); cards; Washington HS 1 1 0 1 2 Washington HS 2 2 3 2 2 Washington HS 3 6 3 2 2 Washington HS 4 18 12 9 4 Monroe HS 1 10 12 4 6 Monroe HS 2 4 5 4 0 Monroe HS 3 8 7 9 5 Monroe HS 4 20 15 18 10 Truman HS 1 0 20 15 14 Truman HS 2 0 18 16 19 Truman HS 3 0 34 20 28 Truman HS 4 60 28 25 24 ; Run; options missing='-'; data want; set have; by School notsorted; retain sum_2011 sum_2012 sum_2013 sum_2014 lt_five_2011 lt_five_2012 lt_five_2013 lt_five_2014 ; if first.school then do; call missing(of lt_five_:); sum_2011=5; sum_2012=5; sum_2013=5; sum_2014=5; end; if y2011 lt 5 then lt_five_2011=1; if y2012 lt 5 then lt_five_2012=1; if y2013 lt 5 then lt_five_2013=1; if y2014 lt 5 then lt_five_2014=1; if lt_five_2011 then do;sum_2011+(-1*y2011); call missing(y2011); end; if lt_five_2012 then do;sum_2012+(-1*y2012); call missing(y2012); end; if lt_five_2013 then do;sum_2013+(-1*y2013); call missing(y2013); end; if lt_five_2014 then do;sum_2014+(-1*y2014); call missing(y2014); end; if sum_2011 le 0 and lt_five_2011 then do;lt_five_2011=.; sum_2011=5;end; if sum_2012 le 0 and lt_five_2012 then do;lt_five_2012=.; sum_2012=5;end; if sum_2013 le 0 and lt_five_2013 then do;lt_five_2013=.; sum_2013=5;end; if sum_2014 le 0 and lt_five_2014 then do;lt_five_2014=.; sum_2014=5;end; drop sum_: lt_five:; run;
Xia Keshan
Or a simpler one :
Data Have;
Input School:$&20. Proficiency_level:$1. (y2011-y2014) (:8.);
cards;
Washington HS 1 1 0 1 2
Washington HS 2 2 3 2 2
Washington HS 3 6 3 2 2
Washington HS 4 18 12 9 4
Monroe HS 1 10 12 4 6
Monroe HS 2 4 5 4 0
Monroe HS 3 8 7 9 5
Monroe HS 4 20 15 18 10
Truman HS 1 0 20 15 14
Truman HS 2 0 18 16 19
Truman HS 3 0 34 20 28
Truman HS 4 60 28 25 24
;
Run;
options missing='-';
data want;
set have;
by School notsorted;
retain sum_2011 sum_2012 sum_2013 sum_2014
lt_five_2011 lt_five_2012 lt_five_2013 lt_five_2014 ;
if first.school then do;
call missing(of lt_five_:);
sum_2011=5; sum_2012=5; sum_2013=5; sum_2014=5;
end;
if y2011 lt 5 then lt_five_2011=1;
if y2012 lt 5 then lt_five_2012=1;
if y2013 lt 5 then lt_five_2013=1;
if y2014 lt 5 then lt_five_2014=1;
if lt_five_2011 then do;sum_2011+(-1*y2011); call missing(y2011); end;
if lt_five_2012 then do;sum_2012+(-1*y2012); call missing(y2012); end;
if lt_five_2013 then do;sum_2013+(-1*y2013); call missing(y2013); end;
if lt_five_2014 then do;sum_2014+(-1*y2014); call missing(y2014); end;
if sum_2011 le 0 then do;lt_five_2011=.; sum_2011=5;end;
if sum_2012 le 0 then do;lt_five_2012=.; sum_2012=5;end;
if sum_2013 le 0 then do;lt_five_2013=.; sum_2013=5;end;
if sum_2014 le 0 then do;lt_five_2014=.; sum_2014=5;end;
drop sum_: lt_five:;
run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.