Hello folks, I need to mask values of three data columns using the following condition:
the table (named: tbl25) i am trying to mask values looks like below, data points are made-up:
I figured out a very clumsy way of doing it and below is my code, as you can see there're three issues with it:
Given this is just one of the many tables i need to mask values, can anyone provide a concise method to achieve the same goal? many many thanks!!!
data want1;
flg=0;
array s(8);
do i = 1 to 8;
set tbl25;
if nopos LE 11 AND NOPOS NE 0 then flg=1;
else s(i) = NOPOS; /*<11 became missing*/
end;
if nmiss(flg) ne 1 then do;
second = min(of s(*));
end;
do i = 1 to 8;
if s(i) = second and second ne 0 then do;
s(i)=-100; /*complementary became -100*/
end;
end;
do i = 1 to 8;
nopos = s(i);
output;
end;
keep nopos;
run;
**;
data want2;
array s(8);
do i = 1 to 8;
set tbl25;
if yespos LE 11 AND yesPOS NE 0 then flg=1;
else s(i) = yesPOS; /*<11 became missing*/
end;
if nmiss(flg) ne 1 then do;
second = min(of s(*));
end;
do i = 1 to 8;
if s(i) = second and second ne 0 then do;
s(i)=-100; /*complementary became -100*/
end;
end;
do i = 1 to 8;
yespos = s(i);
output;
end;
keep yespos;
run;
**;
data want3;
array s(8);
do i = 1 to 8;
set tbl25;
if totcnt LE 11 AND totcnt NE 0 then flg=1;
else s(i) = totcnt; /*<11 became missing*/
end;
if nmiss(flg) ne 1 then do;
second = min(of s(*));
end;
do i = 1 to 8;
if s(i) = second and second ne 0 then do;
s(i)=-100; /*complementary became -100*/
end;
end;
do i = 1 to 8;
totcnt = s(i);
output;
end;
keep totcnt;
run;
data want1; set want1;
id=_N_; run;
data want2; set want2;
id=_N_; run;
data want3; set want3;
id=_N_; run;
proc sort data=want1; by id; run;
proc sort data=want2; by id; run;
proc sort data=want3; by id; run;
data all;
merge want1(in=a) want2(in=b) want3(in=c);
by id;
if a or b or c; run;
You effectively have to read your dataset twice, because if you have only 1 value in a column that is not zero (I take that to mean greater than zero) and less than eleven, you need to know what the second smallest value is, in order to indicate it as complimentary suppression.
You can do that via a single data step, as below (I'm using .P for primary suppression and .C for complimentary).
data have ;
input eth $3. nopos yespos totcnt ;
cards ;
gp1 111 100 200
gp2 119 100 60
gp3 15 7 800
gp4 100 584 693
gp5 3 5 8
gp6 999 300 400
gp7 593 2400 3000
gp8 5081 9000 4000
run;
data want (drop=i);
set have (in=firstpass) have (in=secondpass);
array vars {*} nopos yespos totcnt ;
array count_primaries{3} _temporary_;
array smallest_complimentary{3} _temporary_;
if firstpass then do i=1 to dim(vars);
if 0<vars{i}<11 then count_primaries{i}+1;
else smallest_complimentary{i}=min(smallest_complimentary{i},vars{i});
end;
if secondpass;
do i=1 to dim(vars);
if 0<vars{i}<11 then vars{i}=.P;
if count_primaries{i}=1 and vars{i}=smallest_complimentary{i} then vars{i}=.C;
end;
run;
I don't understand what this means: "if only one value was masked, mask the second smallest value (complementary suppression)".
Walk us through some example data indicating which is second smallest (in each variable across all three) describe what role that ETH variable may have.
Show what the output for this is supposed to be.
Arrays only use values from one observation so that is pretty much a complete bust if you were expecting that loop to use different observations.
Hello @ballardw
sorry for not explaining it well, so take my following table as an example: when i mask the column "nopos", value 3 will be masked as a primary suppression because it's smaller than 11, while all the rest of the values are > =11, I need to mask 15 as a complementary suppression because gp8 is my column total, and one can easily derive 3 (although masked) using gp8 (total) minus the rest of the values if 15 is not suppressed.
for the column "yespos", things will be easier because i have two values < 11 (7,5) so all I need to do is the primary suppression.
eth is my ethnicity class variable and each gp represents a race/ethnicity whereas gp8 represents the total row.
Hope this makes sense.
It helps if you post sample data as a DATA step with CARDS data, instead of a picture. Like:
data have ;
input eth $3. nopos yespos totcnt ;
cards ;
gp1 111 100 200
gp2 119 100 60
gp3 15 7 800
gp4 100 584 693
gp5 3 5 8
gp6 999 300 400
gp7 593 2400 3000
gp8 5081 9000 4000
;
For the first step, you can use an array to recode all the numeric values that are between 0 and 10 (your text says less than 11, but your code says less than or equal to 11):
data m1 ;
set have ;
array nums {*} _numeric_ ;
do i=1 to dim(nums) ;
if 0<=nums{i}<11 then nums{i}=.M ;
end ;
drop i ;
run ;
For the second step, I use PROC SQL to test whether there is one missing value in a column and whether a value is equal to the minimum of the column:
proc sql ;
create table m2 as
select eth
,ifn( nopos=min( nopos) and nmiss( nopos)=1,.S, nopos) as nopos
,ifn(yespos=min(yespos) and nmiss(yespos)=1,.S,yespos) as yespos
,ifn(totcnt=min(totcnt) and nmiss(totcnt)=1,.S,totcnt) as totcnt
from m1
;
quit ;
Since you have a lot of datasets to mask, you could use a macro to generate the list of IFN() expressions for the select statement.
Result is (I think) what you want for this data. Assuming you want to mask TOTCNT as well:
Obs eth nopos yespos totcnt 1 gp1 111 100 200 2 gp2 119 100 S 3 gp3 S M 800 4 gp4 100 584 693 5 gp5 M M M 6 gp6 999 300 400 7 gp7 593 2400 3000 8 gp8 5081 9000 4000
One potential issue with this approach is that in the second step, if there are two records that are tied for having the minimum value, they will both be masked. Which may not be what you want. If in that circumstance you would only want to mask one value, I think I would use a DATA step approach for the second step.
I used special missing values .M and .S for masking just to differentiate results from each step. When you print the results, you could format these values as asterisks or whatever you want.
And what do you intend to do if you have a tie for second smallest? I.e. assume nopos has a single 3, and two 15's.
Oh yes, and what if a variable has NO values less than 11 but not equal to zero?
You effectively have to read your dataset twice, because if you have only 1 value in a column that is not zero (I take that to mean greater than zero) and less than eleven, you need to know what the second smallest value is, in order to indicate it as complimentary suppression.
You can do that via a single data step, as below (I'm using .P for primary suppression and .C for complimentary).
data have ;
input eth $3. nopos yespos totcnt ;
cards ;
gp1 111 100 200
gp2 119 100 60
gp3 15 7 800
gp4 100 584 693
gp5 3 5 8
gp6 999 300 400
gp7 593 2400 3000
gp8 5081 9000 4000
run;
data want (drop=i);
set have (in=firstpass) have (in=secondpass);
array vars {*} nopos yespos totcnt ;
array count_primaries{3} _temporary_;
array smallest_complimentary{3} _temporary_;
if firstpass then do i=1 to dim(vars);
if 0<vars{i}<11 then count_primaries{i}+1;
else smallest_complimentary{i}=min(smallest_complimentary{i},vars{i});
end;
if secondpass;
do i=1 to dim(vars);
if 0<vars{i}<11 then vars{i}=.P;
if count_primaries{i}=1 and vars{i}=smallest_complimentary{i} then vars{i}=.C;
end;
run;
Thanks for sharing a great DATA step solution @mkeintz .
I was thinking along the same lines but with a double DOW loop. I don't think I've seen this approach of using SET statement concatenation with a subsetting IF to achieve a similar effect. I'm sure you've probably showed it before, but either I didn't notice what you were doing or I managed to forget it.
One thing I like about the DATA step approach is it's easy to extend to handle ties differently. So if in the case where there is a tie you only want to mask one value, you can use an additional array to keep track of whether or not you have masked a value yet, e.g.:
*I added duplicate records for gp3 with nopos=15, but only one value will be masked;
data have ;
input eth $3. nopos yespos totcnt ;
cards ;
gp1 111 100 200
gp2 119 100 60
gp3 15 7 800
gp3 15 7 800
gp4 100 584 693
gp5 3 5 8
gp6 999 300 400
gp7 593 2400 3000
gp8 5081 9000 4000
run;
data want (drop=i);
set have (in=firstpass) have (in=secondpass);
array vars {*} nopos yespos totcnt ;
array count_primaries{3} _temporary_;
array smallest_complimentary{3} _temporary_;
array masked_complimentary{3} _temporary_;
if firstpass then do i=1 to dim(vars);
if 0<vars{i}<11 then count_primaries{i}+1;
else smallest_complimentary{i}=min(smallest_complimentary{i},vars{i});
end;
if secondpass;
do i=1 to dim(vars);
if 0<vars{i}<11 then vars{i}=.P;
if count_primaries{i}=1 and vars{i}=smallest_complimentary{i} and not (masked_complimentary{i}) then do ;
vars{i}=.C;
masked_complimentary{i}=1 ;
end ;
end;
run;
proc print data=want ;
run ;
@mkeintz thanks so very much!! this works exactly as what I'd want, thank you sooo much!!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.