BookmarkSubscribeRSS Feed
Sanchit1
Calcite | Level 5

I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.

SNO FIRSTNAME LASTNAME 
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY

In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.

The desired output that I want

SNO FIRSTNAME LASTNAME   MASKED_NAME 
1   JAYVADAN  DARJI      JXXXXXXN DXXXI
2   KAPIL     CHAUDHARY  KXXXL CXXXXXXXY

Thanks a lot in advance.

5 REPLIES 5
ballardw
Super User

@Sanchit1 wrote:

I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.

SNO FIRSTNAME LASTNAME 
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY

In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.

The desired output that I want

SNO FIRSTNAME LASTNAME   MASKED_NAME 
1   JAYVADAN  DARJI      JXXXXXXN DXXXI
2   KAPIL     CHAUDHARY  KXXXL CXXXXXXXY

Thanks a lot in advance.


One way:

data example;
  input SNO FIRSTNAME :$10. LASTNAME :$10.;
  length masked_name $ 25;
  masked_name=catx(' ',cats(substr(firstname,1,1),repeat('X',length(firstname)-3),substr(firstname,length(firstname))),
                       cats(substr(lastname,1,1),repeat('X',length(lastname)-3),substr(lastname,length(lastname)))
                  );
datalines;
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY
;

Set the length of masked name to the longest expected value, remember to include the space between the names.

The Repeat function creates a string of the characters repeated the number of times in the second parameter. SUBSTR extracts parts of a string, LENGTH returns the number of characters in a string ignoring trailing blanks, CATS contcatenates strings stripping blanks, CATX inserts a specified character between strings when concatenated

Reeza
Super User
Each column will have only one word? No Van der Bergens?

AhmedAl_Attar
Ammonite | Level 13

Hi

Here is an approach that uses  Regular Expression

data have;
length sno 4 FIRSTNAME $20 LASTNAME $20;
input SNO FIRSTNAME $ LASTNAME $;
datalines; 
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
;
run;

data want(drop=m_:);
	set have;
	length m_FIRSTNAME m_LASTNAME masked_name $60;
	m_FIRSTNAME=cat(SUBSTR(FIRSTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, FIRSTNAME),2));
	m_LASTNAME=cat(SUBSTR(LASTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, LASTNAME),2));
	masked_name=catx(' ',m_FIRSTNAME,m_LASTNAME);
run;

Hope this helps,

Ahmed

RichardDeVen
Barite | Level 11

You can replace the center substring of each 'word' with X's by using SUBSTR on the LEFT SIDE of the equals sign.

 

Example:

data have; 
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN  DARJI
KAPIL     CHAUDHARY
BOBBY JOE  MCALLISTER 
KAPIL     A CHAUDHARY JR
;

data want;
  set have;

  length masked_name $100;

  _name = catx(' ', firstname, lastname);

  do _n_ = 1 to countw(_name);
    _word = scan(_name, _n_);

    select (length(_word));
      when (1) _word = '*';
      when (2) _word = '**';
      otherwise substr(_word,2,length(_word)-2) = repeat('*', length(_word));
    end;

    masked_name = catx(' ', masked_name, _word);
  end;  

  drop _:;
run;

Output data

RichardADeVenezia_0-1601060699140.png

 

Ksharp
Super User
data have; 
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN  DARJI
KAPIL     CHAUDHARY
BOBBY JOE  MCALLISTER 
KAPIL     A CHAUDHARY JR
;

data want;
  set have;
  new_firstname=firstname; 
  new_lastname=lastname;

  n=length(new_firstname);
  do i=1 to n;
    if i ne 1 and i ne n then substr(new_firstname,i,1)='X';
  end;

  n=length(new_lastname);
  do i=1 to n;
    if i ne 1 and i ne n then substr(new_lastname,i,1)='X';
  end;
  drop i n;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 640 views
  • 3 likes
  • 6 in conversation