I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.
SNO FIRSTNAME LASTNAME
1 JAYVADAN DARJI
2 KAPIL CHAUDHARYIn the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.
The desired output that I want
SNO FIRSTNAME LASTNAME MASKED_NAME
1 JAYVADAN DARJI JXXXXXXN DXXXI
2 KAPIL CHAUDHARY KXXXL CXXXXXXXYThanks a lot in advance.
@Sanchit1 wrote:
I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.
SNO FIRSTNAME LASTNAME 1 JAYVADAN DARJI 2 KAPIL CHAUDHARYIn the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.
The desired output that I want
SNO FIRSTNAME LASTNAME MASKED_NAME 1 JAYVADAN DARJI JXXXXXXN DXXXI 2 KAPIL CHAUDHARY KXXXL CXXXXXXXYThanks a lot in advance.
One way:
data example;
input SNO FIRSTNAME :$10. LASTNAME :$10.;
length masked_name $ 25;
masked_name=catx(' ',cats(substr(firstname,1,1),repeat('X',length(firstname)-3),substr(firstname,length(firstname))),
cats(substr(lastname,1,1),repeat('X',length(lastname)-3),substr(lastname,length(lastname)))
);
datalines;
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
;
Set the length of masked name to the longest expected value, remember to include the space between the names.
The Repeat function creates a string of the characters repeated the number of times in the second parameter. SUBSTR extracts parts of a string, LENGTH returns the number of characters in a string ignoring trailing blanks, CATS contcatenates strings stripping blanks, CATX inserts a specified character between strings when concatenated
Hi
Here is an approach that uses Regular Expression
data have;
length sno 4 FIRSTNAME $20 LASTNAME $20;
input SNO FIRSTNAME $ LASTNAME $;
datalines;
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
;
run;
data want(drop=m_:);
set have;
length m_FIRSTNAME m_LASTNAME masked_name $60;
m_FIRSTNAME=cat(SUBSTR(FIRSTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, FIRSTNAME),2));
m_LASTNAME=cat(SUBSTR(LASTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, LASTNAME),2));
masked_name=catx(' ',m_FIRSTNAME,m_LASTNAME);
run;
Hope this helps,
Ahmed
You can replace the center substring of each 'word' with X's by using SUBSTR on the LEFT SIDE of the equals sign.
Example:
data have;
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN DARJI
KAPIL CHAUDHARY
BOBBY JOE MCALLISTER
KAPIL A CHAUDHARY JR
;
data want;
set have;
length masked_name $100;
_name = catx(' ', firstname, lastname);
do _n_ = 1 to countw(_name);
_word = scan(_name, _n_);
select (length(_word));
when (1) _word = '*';
when (2) _word = '**';
otherwise substr(_word,2,length(_word)-2) = repeat('*', length(_word));
end;
masked_name = catx(' ', masked_name, _word);
end;
drop _:;
run;
Output data
data have;
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN DARJI
KAPIL CHAUDHARY
BOBBY JOE MCALLISTER
KAPIL A CHAUDHARY JR
;
data want;
set have;
new_firstname=firstname;
new_lastname=lastname;
n=length(new_firstname);
do i=1 to n;
if i ne 1 and i ne n then substr(new_firstname,i,1)='X';
end;
n=length(new_lastname);
do i=1 to n;
if i ne 1 and i ne n then substr(new_lastname,i,1)='X';
end;
drop i n;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.