Re: Issue in masking in sas

Sanchit1 · Posted 09-25-2020 12:21 PM

I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.

SNO FIRSTNAME LASTNAME 
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY

In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.

The desired output that I want

SNO FIRSTNAME LASTNAME   MASKED_NAME 
1   JAYVADAN  DARJI      JXXXXXXN DXXXI
2   KAPIL     CHAUDHARY  KXXXL CXXXXXXXY

Thanks a lot in advance.

ballardw · Posted 09-25-2020 12:55 PM

@Sanchit1 wrote:

I need to do masking of my data in order to hide it from the target audience. Please find below existing table that I have right now. There are 3 columns in that table.
SNO FIRSTNAME LASTNAME 
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY
In the desired output, I want 4 columns with masking done as below. MASKED_NAME column would be an addition column the desired output.

The desired output that I want
SNO FIRSTNAME LASTNAME   MASKED_NAME 
1   JAYVADAN  DARJI      JXXXXXXN DXXXI
2   KAPIL     CHAUDHARY  KXXXL CXXXXXXXY
Thanks a lot in advance.

One way:

data example;
  input SNO FIRSTNAME :$10. LASTNAME :$10.;
  length masked_name $ 25;
  masked_name=catx(' ',cats(substr(firstname,1,1),repeat('X',length(firstname)-3),substr(firstname,length(firstname))),
                       cats(substr(lastname,1,1),repeat('X',length(lastname)-3),substr(lastname,length(lastname)))
                  );
datalines;
1   JAYVADAN  DARJI
2   KAPIL     CHAUDHARY
;

Set the length of masked name to the longest expected value, remember to include the space between the names.

The Repeat function creates a string of the characters repeated the number of times in the second parameter. SUBSTR extracts parts of a string, LENGTH returns the number of characters in a string ignoring trailing blanks, CATS contcatenates strings stripping blanks, CATX inserts a specified character between strings when concatenated

Reeza · Posted 09-25-2020 01:05 PM

Each column will have only one word? No Van der Bergens?

AhmedAl_Attar · Posted 09-25-2020 01:48 PM

Hi

Here is an approach that uses Regular Expression

data have;
length sno 4 FIRSTNAME $20 LASTNAME $20;
input SNO FIRSTNAME $ LASTNAME $;
datalines; 
1 JAYVADAN DARJI
2 KAPIL CHAUDHARY
;
run;

data want(drop=m_:);
	set have;
	length m_FIRSTNAME m_LASTNAME masked_name $60;
	m_FIRSTNAME=cat(SUBSTR(FIRSTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, FIRSTNAME),2));
	m_LASTNAME=cat(SUBSTR(LASTNAME,1,1),SUBSTR(prxchange('s/(\w)(.*?)(?=\w)/x/', -1, LASTNAME),2));
	masked_name=catx(' ',m_FIRSTNAME,m_LASTNAME);
run;

Hope this helps,

Ahmed

RichardDeVen · Posted 09-25-2020 03:07 PM

You can replace the center substring of each 'word' with X's by using SUBSTR on the LEFT SIDE of the equals sign.

Example:

data have; 
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN  DARJI
KAPIL     CHAUDHARY
BOBBY JOE  MCALLISTER 
KAPIL     A CHAUDHARY JR
;

data want;
  set have;

  length masked_name $100;

  _name = catx(' ', firstname, lastname);

  do _n_ = 1 to countw(_name);
    _word = scan(_name, _n_);

    select (length(_word));
      when (1) _word = '*';
      when (2) _word = '**';
      otherwise substr(_word,2,length(_word)-2) = repeat('*', length(_word));
    end;

    masked_name = catx(' ', masked_name, _word);
  end;  

  drop _:;
run;

Output data

Ksharp · Posted 09-26-2020 07:46 AM

data have; 
length firstname lastname $50;
input firstname & lastname &;
datalines;
JAYVADAN  DARJI
KAPIL     CHAUDHARY
BOBBY JOE  MCALLISTER 
KAPIL     A CHAUDHARY JR
;

data want;
  set have;
  new_firstname=firstname; 
  new_lastname=lastname;

  n=length(new_firstname);
  do i=1 to n;
    if i ne 1 and i ne n then substr(new_firstname,i,1)='X';
  end;

  n=length(new_lastname);
  do i=1 to n;
    if i ne 1 and i ne n then substr(new_lastname,i,1)='X';
  end;
  drop i n;
run;

Issue in masking in sas