BookmarkSubscribeRSS Feed
dstuder
Obsidian | Level 7

Hello everyone!

 

I have the following sample data:

DATA offenders;
INPUT group :$1. age :3. residenceStatus :$1. sex :$1.;
INFILE DATALINES DSD;
DATALINES;
A,10,A,m
A,14,A,m
A,11,B,m
B,21,A,f
B,20,B,m
B,,B,m
B,34,C,m
A,60,C,f
A,32,C,f
A,23,B,f
A,11,B,m
A,9,B,m
A,9,C,m
C,55,A,m
C,56,A,m
C,70,B,w
C,31,B,w
C,45,B,w
C,66,A,w
C,80,A,w
C,,C,w
C,53,A,w
C,38,C,m
D,10,C,m
;
RUN;


PROC FORMAT;
VALUE agegrp 0-10 = "0-10"
11-20 = "11-20"
21-30 = "21-30"
31-40 = "31-40"
41-high = "41+"
. = "unknown";
RUN;

And the following cross table:

PROC TABULATE;
	class group sex; 
	class age / preloadfmt missing;
	class residenceStatus;
	TABLE group, (ALL="Total residence status" residenceStatus="") * (ALL="Total age" age="") * (ALL="Total sex" sex="") /misstext="0";
	FORMAT age agegrp.;
RUN;

As to privacy concerns I have to anonymize all the data if the total number is below a certain threshold. Let's suppose that if this threshold is 5 all the values on that line should be masked by "x":

 

 9  3  6  0  0  0  0  3  3 ...
 4  x  x  x  x  x  x  x  x ...  (4 is smaller than 5)
10  0  3  7  1  0  1  0  0 ... 
 1  x  x  x  x  x  x  x  x  ... (1 is smaller than 5)

Is it possible to do this in SAS at all? My first idea was to export it as an excel-sheet and then use VBA-Script instead. Thanks for your ideas!

5 REPLIES 5
ballardw
Super User

You are going to have two basic choices:

1) summarize the data and do some severe data modification such as comparing all of the values needed for a row and setting key value to all of them when the rule is met prior to display.

   Which may be awkward getting all the summaries to compare as need since you showing several nesting levels in you columns. Proc Summary with all of your class variables as class in Proc summary should work but you'll need to do some post processing to get the "totals" as separate values, based on _type_ out of proc summary.

This approach may yield a set more appropriate for Proc Print than tabulate.

 

2) Proc Report will let you examine the statistics generated and modify other "stuff" but may not be trivial and past my experience level at this time. Maybe calling @Cynthia_sas can provide a suggestion

 

PGStats
Opal | Level 21

Could be done like this:

 

proc summary data=offenders(keep=group sex age residenceStatus) nway completetypes;
format age ageGrp.;
class group sex residenceStatus;
class age / preloadfmt;
output out=offTableRaw;
run;

proc sql;
create table offTable as
select 
	group,
	age,
	sex,
	residenceStatus,
	case when sum(_freq_) < 5 then .x else _freq_ end as x
from 
	offTableRaw
group by group;
quit;

PROC TABULATE data=offTable format=3.0;
	class group sex age; 
	class residenceStatus;
	var x;
	TABLE group, 
		(ALL="Total residence status" residenceStatus="") * 
		(ALL="Total age" age="") * 
		(ALL="Total sex" sex="") * 
		x = "" * sum="" /misstext="x";
RUN;
PG
ChrisNZ
Tourmaline | Level 20

This does what you want, but it's a bit convoluted.

Proc report might be better.

I set all anonymised data to negative numbers and format that to x.

The total is formatted to its value.

If there is only one record, a second one is created to ensure the total has a unique value in the row.

Ensure there is no sparse data to get rid of the zeros.

proc sql ;
  create table SMALL as 
  select unique GROUP, count(*) as N
  from OFFENDERS group by GROUP having count(*) < 5;
quit;

proc sort data=OFFENDERS; 
  by GROUP;
run;

data REPORT; 
  length STR $800;
  retain STR;
  merge OFFENDERS SMALL(in=SMALL); 
  by GROUP;
  if SMALL then do;
    GRPNO+1;
    LINENO+1;
    NB=-1;
    if last.GROUP then do;
      NB=-1000*GRPNO;
      STR=catx(' ',STR,NB-LINENO+1-first.GROUP,'="',N,'"');
      LINENO=0;
    end; 
  end; 
  else NB=1; 
  output;
  if first.GROUP and last.GROUP then do;
    if SEX='m' then SEX='f'; else SEX='m';
    NB=-1; output;
  end; 
  call symput('fmt', STR);
run; 
%put &=fmt;

proc format;
  value grp  0-10  = "0-10"
            11-20  = "11-20"
            21-30  = "21-30"
            31-40  = "31-40"
            41-high= "41+"
                 . = "unknown";
  value anon .='0'  &fmt. 0-high=[5.] other='x';
run;


proc tabulate data=REPORT missing ;
	class GROUP SEX AGE RESIDENCESTATUS;    
  var NB;
	table GROUP, (ALL="Total residence status" RESIDENCESTATUS="") 
             * (ALL="Total age" AGE="") 
             * (ALL="Total sex" SEX="") 
             * NB=' '*sum=' '*f=anon. ;
  format AGE agegrp. ;
run;

ChrisNZ_1-1589414957050.png

 

 

 

RichardDeVen
Barite | Level 11

A PROC REPORT offers much finer control over conditional data value rendering.

 

A stacked column specification (level1, level2, level3)of /ACROSS columns is equivalent to crossed classes level1 * level2 * level3 in a TABULATE TABLE statement.  REPORT does not have an in-built ALL concept like TABULATE, so additional data processing is needed to 'synthetically' create rows for all combinations of the ALL states to be grouped.  The AGEGRP format also needs to be modified to handle a coded value representing an ALL grouping (I chose negative one (-1)) 

 

Example:

 

PROC FORMAT;
  VALUE agegrp (notsorted)
    -1 = " Total age "
    .  = "unknown"
     0-10 = "0-10"
    11-20 = "11-20"
    21-30 = "21-30"
    31-40 = "31-40"
    41-high = "41+"
  ;
RUN;

 

 Data processing to add observations for ALL 'grouping' in a PROC REPORT across variable.

 

data have_all;
  length status sex $50;
  set have;

  h2 = age;  drop h2;
  h3 = sex;  drop h3;

  output;
         
  * outputs to force the tiers of ALL groupings;                        
  *----;                         * ALL tierage;

  sex    = ' Total sex';
  output;                        * . . A ;

  *----;

  age = -1;
  sex = h3; 
  output;                        * . A . ;

  sex = ' Total sex';
  output;                        * . A A ;

  *----;

  status = ' Total residence status';  
  age = h2;
  sex = h3;
  output;                        * A . . ;

  sex = ' Total sex';
  output;                        * A . A ;

  age = -1;
  sex = h3;
  output;                        * A A . ;

  sex = ' Total sex';
  output;                        * A A A ;
run;

 

PROC Report with COMPUTE block that applies masking format XFORMAT on a conditional basis

 

proc format;
  value XFORMAT low-high, other = 'X';

proc report data=have_all missing;
  columns  
    group 
    (status,age,sex)
    sentinel
  ;

  define group / group;
  define status / ' ' across NOZERO;
  define age    / ' ' across NOZERO format=agegrp. preloadfmt order=data;
  define sex    / ' ' across NOZERO;
  define sentinel / computed noprint;

  * _c1_ is group;
  * _c2_ is all*all*all total;
  * NOTE: _c#_ references will include NOZERO columns that do not render;

  compute before;
    * count the number of columns created by stacking the across columns;
    length colref var $32;

    do index = 3 by 1 until (missing(var));
      colref = cats('_c',index,'_');
      var = vnamex(colref);
    end;
    topcol = index - 1;
  endcomp;

  compute sentinel;
    if missing (topcol) then return;
    do index = 3 to topcol;
      colref = cats('_c',index,'_');

      * conditionally apply the masking format;
      if _c2_ < 5 then call define(trim(colref), 'FORMAT', 'XFORMAT.');
    end;
    
  endcomp;
run;

 

Compare Tabulate to Report output

RichardADeVenezia_0-1589450691774.png

 

ChrisNZ
Tourmaline | Level 20

Something a bit simpler: Find the rows to hide and apply the hiding format in a compute block.

proc sql noprint;
  select quote(put(NAME,$1.)) into :hide separated by ',' from SASHELP.CLASS group by put(NAME,$1.) having count(*)=1; 
quit;
%put &=hide;

proc format ;
  value anon other='x' ; 
run;

%macro hide;
  %local i;
  %do i=2 %to 13; 
    if NAME in:(&hide) then call define(&i.+1,'format', 'anon.') ;
  %end;
%mend;

options missing='0';
proc report data=sashelp.class;
  columns NAME  (N (SEX, (AGE ,N))) DUMMY;
  define NAME / group format=$1.;
  define SEX / across;
  define AGE / across;
  define DUMMY / noprint;
  compute DUMMY;
    %hide
  endcomp;
run;
options missing='.';

ChrisNZ_0-1589496112567.png

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 867 views
  • 1 like
  • 5 in conversation