BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Butterfly
Calcite | Level 5

I need to suppress data according to the following rules:

1. If any cell is less than five (includes 0), the cell is suppressed.

2. When there is suppression of a single cell, the next smallest cell(s) is suppressed until the total of the data in the suppressed cells adds up to at least five.

3. If any single category contains all of the data, the entire group of data is suppressed.

Here is sample data:

School                          Proficiency level            2011     2012     2013     2014

Washington HS                         1                      1               0          1               2

Washington HS                         2                      2               3          2               2

Washington HS                         3                      6               3          2               2

Washington HS                         4                      18              12         9               4

Monroe HS                               1                      10              12         4               6

Monroe HS                               2                      4               5          4               0

Monroe HS                               3                      8               7          9               5

Monroe HS                               4                      20              15         18         10

Truman HS                                1                      0               20         15         14

Truman HS                                2                      0               18         16         19

Truman HS                                3                      0               34         20         28

Truman HS                                4                      60              28         25         24

In this example, the data of concern are the 4 numbers within each year for each school.

So, for example, for Washington HS for 2011, there are 4 numbers: 1, 2, 6, 18.  As 1 and 2 are both less than 5, they will need to be suppressed.   Since
the total of 3 ( 1 + 2) is less than 5, the 6 will also need to be suppressed.

Another example: Washington HS for 2012, there are 4 numbers: 0, 3, 3, 12.  The first 3 will need to be suppressed as they are all less than 5.  Since the total of these 3 numbers is 6, we keep the 12.

Another example: Washington HS for 2014, there are 4 numbers: 2, 2, 2, 4.  Since all are under 5, all will need to be suppressed.

   

Another example: Truman HS for 2011, there are 4 numbers: 0, 0, 0, 60.  Since all of the data are in one cell, all cells must be suppressed.

   

Can this be achieved in SAS?
Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Not sure if this is what you need, but take a look,

Data Have;

     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);

     cards;

Washington HS                         1                      1               0          1               2

Washington HS                         2                      2               3          2               2

Washington HS                         3                      6               3          2               2

Washington HS                         4                      18              12         9               4

Monroe HS                               1                      10              12         4               6

Monroe HS                               2                      4               5          4               0

Monroe HS                               3                      8               7          9               5

Monroe HS                               4                      20              15         18         10

Truman HS                                1                      0               20         15         14

Truman HS                                2                      0               18         16         19

Truman HS                                3                      0               34         20         28

Truman HS                                4                      60              28         25         24

;

Run;

data want;

     do until (last.school);

           set have;

           by school notsorted;

           array y(4) y2011-y2014;

           array t(4) _temporary_;

           array m(4) _temporary_;

           do i=1 to 4;

                if y(i) <5 then

                     t(i)+y(i);

                else m(i)=min(m(i),y(i));

           end;

     end;

     do until (last.school);

           set have;

           by school notsorted;

           do i=1 to 4;

                if y(i) <5 then

                     y(i)=-1*y(i);

                else if y(i)=m(i) and .< t(i) < 5 then

                     y(i)=-1*y(i);

           end;

           output;

     end;

     call missing (of m(*), of t(*));

     drop i;

run;

Haikuo

View solution in original post

13 REPLIES 13
ballardw
Super User

What is the output supposed to look like? Are the values within a year set to missing?

I think your Rule 3 should read:

3. If any single category contains all of the data greater than 0, the entire group of data is suppressed.

Since SAS uses a special value to indicate missing 0 by default is data...

Butterfly
Calcite | Level 5

A dash is used to indicate suppressed data.

That's a good point about rule 3.  If all data are zero, there is no need to suppress.  The dots to indicate missing are changed to zeroes.

jakarman
Barite | Level 11

Is it just for presenting as a kind of alternate printing (format fcmp) or do you want to do calculations on the vaules. The answer to your question is "yes" it can be done with sas.

The next thing is what you want exactly in detail.

---->-- ja karman --<-----
Butterfly
Calcite | Level 5

The data set with the suppressed values will be exported either to Excel or a text file.  The values need to be suppressed for the purpose of confidentiality of the students.  The finished product would look just like the sample table but with a dash replacing the suppressed values.

Haikuo
Onyx | Level 15

Not sure if this is what you need, but take a look,

Data Have;

     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);

     cards;

Washington HS                         1                      1               0          1               2

Washington HS                         2                      2               3          2               2

Washington HS                         3                      6               3          2               2

Washington HS                         4                      18              12         9               4

Monroe HS                               1                      10              12         4               6

Monroe HS                               2                      4               5          4               0

Monroe HS                               3                      8               7          9               5

Monroe HS                               4                      20              15         18         10

Truman HS                                1                      0               20         15         14

Truman HS                                2                      0               18         16         19

Truman HS                                3                      0               34         20         28

Truman HS                                4                      60              28         25         24

;

Run;

data want;

     do until (last.school);

           set have;

           by school notsorted;

           array y(4) y2011-y2014;

           array t(4) _temporary_;

           array m(4) _temporary_;

           do i=1 to 4;

                if y(i) <5 then

                     t(i)+y(i);

                else m(i)=min(m(i),y(i));

           end;

     end;

     do until (last.school);

           set have;

           by school notsorted;

           do i=1 to 4;

                if y(i) <5 then

                     y(i)=-1*y(i);

                else if y(i)=m(i) and .< t(i) < 5 then

                     y(i)=-1*y(i);

           end;

           output;

     end;

     call missing (of m(*), of t(*));

     drop i;

run;

Haikuo

Butterfly
Calcite | Level 5

This works.  Thank you so much.  The only problem is that I need the suppressed value to be represented by a dash, not a negative number.  Since a dash is a character value, I added the following code:

data want1;

set want;

sy2011 = put(y2011,$4.0);

sy2012 = put(y2012,$4.0);

sy2013 = put(y2013,$4.0);

sy2014 = put(y2014,$4.0);

if y2011 < 1 then sy2011 = '-';

if y2012 < 1 then sy2012 = '-';

if y2013 < 1 then sy2013 = '-';

if y2014 < 1 then sy2014 = '-';

drop y2011-y2014;

run;

(By the way, how do I paste into this box without having to reformat?  Part of my response was cut off and extra lines were added between the code.)

ballardw
Super User

Appearance of values is what custom formats are for. I would actually have been tempted to assign a special missing value (.A to .Z) and then have a format that displays that value as a dash. Then you don't need to proliferate variables and can still do arithmetic or summaries on the recoded variable.

And this editor REALLY sucks, apparently worse when using Internet Explorer, but paste in to Notepad or similar text editor (not Wordpad as it attempts to use HTML when pasted from the enhanced editor) and the copy/paste into this forum.

Ksharp
Super User

Very interesting question . But you didn't post the output yet ?

Data Have;
     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);
     cards;
Washington HS                         1                      1               0          1               2
Washington HS                         2                      2               3          2               2
Washington HS                         3                      6               3          2               2
Washington HS                         4                      18              12         9               4
Monroe HS                               1                      10              12         4               6
Monroe HS                               2                      4               5          4               0
Monroe HS                               3                      8               7          9               5
Monroe HS                               4                      20              15         18         10
Truman HS                                1                      0               20         15         14
Truman HS                                2                      0               18         16         19
Truman HS                                3                      0               34         20         28
Truman HS                                4                      60              28         25         24
;
Run;
options missing='-';
data want;
 set have;
 by School notsorted;
 retain found_2011      found_2012 found_2013 found_2014 sum_2011 sum_2012 sum_2013 sum_2014 ;
 if first.school then do;
   call missing(of found_:); 
   sum_2011=5; sum_2012=5;      sum_2013=5;       sum_2014=5;
 end;

 sum_2011+(-1*y2011);
 sum_2012+(-1*y2012);
 sum_2013+(-1*y2013);
 sum_2014+(-1*y2014);

 if not found_2011 then call missing(y2011);
 if not found_2012 then call missing(y2012);
 if not found_2013 then call missing(y2013);
 if not found_2014 then call missing(y2014);

 if sum_2011 le 0 then found_2011=1;
 if sum_2012 le 0 then found_2012=1;
 if sum_2013 le 0 then found_2013=1;
 if sum_2014 le 0 then found_2014=1;
drop sum_: found_: ;
run;

Xia Keshan

Butterfly
Calcite | Level 5

Thanks for trying, Ksharp, but the output using your code is not entirely correct.

This is what the output is supposed to look like:

School  Proficiency_level y2011 y2012 y2013 y2014

Washington HS  1  - - - -

Washington HS  2  - - - -

Washington HS  3  - - - -

Washington HS  4  18 12 9 -

Monroe HS       1  10 12 - 6

Monroe HS       2  - 5 - -

Monroe HS       3  - 7 9 -

Monroe HS       4  20 15 18 10

Truman HS       1  - 20 15 14

Truman HS       1  - 18 16 19

Truman HS       1  - 34 20 28

Truman HS       1  - 28 25 24

Ksharp
Super User

Ha. It is clarified now .

Data Have;
     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);
     cards;
Washington HS                         1                      1               0          1               2
Washington HS                         2                      2               3          2               2
Washington HS                         3                      6               3          2               2
Washington HS                         4                      18              12         9               4
Monroe HS                               1                      10              12         4               6
Monroe HS                               2                      4               5          4               0
Monroe HS                               3                      8               7          9               5
Monroe HS                               4                      20              15         18         10
Truman HS                                1                      0               20         15         14
Truman HS                                2                      0               18         16         19
Truman HS                                3                      0               34         20         28
Truman HS                                4                      60              28         25         24
;
Run;
options missing='-';
data want;
 set have;
 by School notsorted;
 retain found_2011      found_2012 found_2013 found_2014 sum_2011 sum_2012 sum_2013 sum_2014
        lt_five_2011 lt_five_2012 lt_five_2013  lt_five_2014 ;
 if first.school then do;
   call missing(of found_: lt_five_:); 
   sum_2011=5; sum_2012=5;      sum_2013=5;       sum_2014=5;
 end;

if y2011 lt 5 then      lt_five_2011=1;
if y2012 lt 5 then      lt_five_2012=1;
if y2013 lt 5 then      lt_five_2013=1;
if y2014 lt 5 then      lt_five_2014=1;

if lt_five_2011 then sum_2011+(-1*y2011);
if lt_five_2012 then sum_2012+(-1*y2012);
if lt_five_2013 then sum_2013+(-1*y2013);
if lt_five_2014 then sum_2014+(-1*y2014);

 if not found_2011 and lt_five_2011 then call missing(y2011);
 if not found_2012 and lt_five_2012 then call missing(y2012);
 if not found_2013 and lt_five_2013 then call missing(y2013);
 if not found_2014 and lt_five_2014 then call missing(y2014);

 if sum_2011 le 0 and lt_five_2011 then found_2011=1;
 if sum_2012 le 0 and lt_five_2012 then found_2012=1;
 if sum_2013 le 0 and lt_five_2013 then found_2013=1;
 if sum_2014 le 0 and lt_five_2014 then found_2014=1;
drop sum_: found_: lt_five:;
run;

I don't understand

Washington HS  1  - - - -

Washington HS  2  - - - -

Washington HS  3  - - - -

Washington HS  4  18 12 9 4  <--why be missing

Xia Keshan

Butterfly
Calcite | Level 5

Washington HS - 2014, level 4 - The 4 is suppressed because it is less than 5 - see rule # 1.

Ksharp
Super User

Ou, That is more complicated than I image. Try this one, and let me if it is worked.

Data Have;
     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);
     cards;
Washington HS                         1                      1               0          1               2
Washington HS                         2                      2               3          2               2
Washington HS                         3                      6               3          2               2
Washington HS                         4                      18              12         9               4
Monroe HS                               1                      10              12         4               6
Monroe HS                               2                      4               5          4               0
Monroe HS                               3                      8               7          9               5
Monroe HS                               4                      20              15         18         10
Truman HS                                1                      0               20         15         14
Truman HS                                2                      0               18         16         19
Truman HS                                3                      0               34         20         28
Truman HS                                4                      60              28         25         24
;
Run;
options missing='-';
data want;
 set have;
 by School notsorted;
 retain sum_2011 sum_2012 sum_2013 sum_2014
        lt_five_2011 lt_five_2012 lt_five_2013  lt_five_2014 ;
 if first.school then do;
   call missing(of lt_five_:); 
   sum_2011=5; sum_2012=5;      sum_2013=5;       sum_2014=5;
 end;

if y2011 lt 5 then      lt_five_2011=1;
if y2012 lt 5 then      lt_five_2012=1;
if y2013 lt 5 then      lt_five_2013=1;
if y2014 lt 5 then      lt_five_2014=1;

if lt_five_2011 then do;sum_2011+(-1*y2011); call missing(y2011); end;
if lt_five_2012 then do;sum_2012+(-1*y2012); call missing(y2012); end;
if lt_five_2013 then do;sum_2013+(-1*y2013); call missing(y2013); end;
if lt_five_2014 then do;sum_2014+(-1*y2014); call missing(y2014); end;

 if sum_2011 le 0 and lt_five_2011 then do;lt_five_2011=.;     sum_2011=5;end;
 if sum_2012 le 0 and lt_five_2012 then do;lt_five_2012=.;     sum_2012=5;end;
 if sum_2013 le 0 and lt_five_2013 then do;lt_five_2013=.;     sum_2013=5;end;
 if sum_2014 le 0 and lt_five_2014 then do;lt_five_2014=.;     sum_2014=5;end;
drop sum_: lt_five:;
run;

Xia Keshan

Ksharp
Super User

Or a simpler one :

Data Have;

     Input School:$&20.         Proficiency_level:$1.            (y2011-y2014) (:8.);

     cards;

Washington HS                         1                      1               0          1               2

Washington HS                         2                      2               3          2               2

Washington HS                         3                      6               3          2               2

Washington HS                         4                      18              12         9               4

Monroe HS                               1                      10              12         4               6

Monroe HS                               2                      4               5          4               0

Monroe HS                               3                      8               7          9               5

Monroe HS                               4                      20              15         18         10

Truman HS                                1                      0               20         15         14

Truman HS                                2                      0               18         16         19

Truman HS                                3                      0               34         20         28

Truman HS                                4                      60              28         25         24

;

Run;

options missing='-';

data want;

set have;

by School notsorted;

retain sum_2011 sum_2012 sum_2013 sum_2014

        lt_five_2011 lt_five_2012 lt_five_2013  lt_five_2014 ;

if first.school then do;

   call missing(of lt_five_:);

   sum_2011=5; sum_2012=5;      sum_2013=5;       sum_2014=5;

end;

if y2011 lt 5 then      lt_five_2011=1;

if y2012 lt 5 then      lt_five_2012=1;

if y2013 lt 5 then      lt_five_2013=1;

if y2014 lt 5 then      lt_five_2014=1;

if lt_five_2011 then do;sum_2011+(-1*y2011); call missing(y2011); end;

if lt_five_2012 then do;sum_2012+(-1*y2012); call missing(y2012); end;

if lt_five_2013 then do;sum_2013+(-1*y2013); call missing(y2013); end;

if lt_five_2014 then do;sum_2014+(-1*y2014); call missing(y2014); end;

if sum_2011 le 0  then do;lt_five_2011=.; sum_2011=5;end;

if sum_2012 le 0  then do;lt_five_2012=.; sum_2012=5;end;

if sum_2013 le 0  then do;lt_five_2013=.; sum_2013=5;end;

if sum_2014 le 0  then do;lt_five_2014=.; sum_2014=5;end;

drop sum_: lt_five:;

run;

Xia Keshan

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2051 views
  • 0 likes
  • 5 in conversation