I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.
SNO FIRSTNAME LASTNAME
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.
The desired output that I want
SNO FIRSTNAME LASTNAME MASKED_NAME
1 JAYVADAN DARJI JXXXXXXN DXXXI
2 KAPIL CHAUDHARY KXXXL CXXXXXXXY
Thanks a lot in advance.
@Sanchit1 wrote:
I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.
SNO FIRSTNAME LASTNAME 1 JAYVADAN DARJI 2 KAPIL CHAUDHARY
In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.
The desired output that I want
SNO FIRSTNAME LASTNAME MASKED_NAME 1 JAYVADAN DARJI JXXXXXXN DXXXI 2 KAPIL CHAUDHARY KXXXL CXXXXXXXY
Thanks a lot in advance.
One way:
data example; input SNO FIRSTNAME :$10. LASTNAME :$10.; length masked_name $ 25; masked_name=catx(' ',cats(substr(firstname,1,1),repeat('X',length(firstname)-3),substr(firstname,length(firstname))), cats(substr(lastname,1,1),repeat('X',length(lastname)-3),substr(lastname,length(lastname))) ); datalines; 1 JAYVADAN DARJI 2 KAPIL CHAUDHARY ;
Set the length of masked name to the longest expected value, remember to include the space between the names.
The Repeat function creates a string of the characters repeated the number of times in the second parameter. SUBSTR extracts parts of a string, LENGTH returns the number of characters in a string ignoring trailing blanks, CATS contcatenates strings stripping blanks, CATX inserts a specified character between strings when concatenated
Hi
Here is an approach that uses Regular Expression
data have;
length sno 4 FIRSTNAME $20 LASTNAME $20;
input SNO FIRSTNAME $ LASTNAME $;
datalines;
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
;
run;
data want(drop=m_:);
set have;
length m_FIRSTNAME m_LASTNAME masked_name $60;
m_FIRSTNAME=cat(SUBSTR(FIRSTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, FIRSTNAME),2));
m_LASTNAME=cat(SUBSTR(LASTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, LASTNAME),2));
masked_name=catx(' ',m_FIRSTNAME,m_LASTNAME);
run;
Hope this helps,
Ahmed
You can replace the center substring of each 'word' with X's by using SUBSTR on the LEFT SIDE of the equals sign.
Example:
data have; length firstname lastname $50; input firstname & lastname &; datalines; JAYVADAN DARJI KAPIL CHAUDHARY BOBBY JOE MCALLISTER KAPIL A CHAUDHARY JR ; data want; set have; length masked_name $100; _name = catx(' ', firstname, lastname); do _n_ = 1 to countw(_name); _word = scan(_name, _n_); select (length(_word)); when (1) _word = '*'; when (2) _word = '**'; otherwise substr(_word,2,length(_word)-2) = repeat('*', length(_word)); end; masked_name = catx(' ', masked_name, _word); end; drop _:; run;
Output data
data have;
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN DARJI
KAPIL CHAUDHARY
BOBBY JOE MCALLISTER
KAPIL A CHAUDHARY JR
;
data want;
set have;
new_firstname=firstname;
new_lastname=lastname;
n=length(new_firstname);
do i=1 to n;
if i ne 1 and i ne n then substr(new_firstname,i,1)='X';
end;
n=length(new_lastname);
do i=1 to n;
if i ne 1 and i ne n then substr(new_lastname,i,1)='X';
end;
drop i n;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.