BookmarkSubscribeRSS Feed
sanjay1
Obsidian | Level 7

Hi Team,

 

I am working on a sas dataset which have account numbers. I want to mask the last 5 digits of the Number with "X";

 

Here is the dataset;

 

Data test;

input acct;

cards;

676869845614567

7689645983

;

run;

 

Desired output

 

acct

6768698456XXXX

76896XXXXX

 

Please suggest any approach

 

Thanks,

Sanjay

 

 

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

Do you want to change the actual values or simply apply a format that masks the last 5 digits?

PeterClemmensen
Tourmaline | Level 20

If you're looking for a format solution, then do 

 

proc format;
    picture mask (default=15)
        other = '0000000000XXXXX' (mult=0.00001);
run;

data test;
    value=676869845614567;output;
    value=7689645983; output;
    format value mask.;
run;

 

Result:

 

value
6768698456XXXXX
76896XXXXX
mkeintz
PROC Star

This is possibly one of the few times it might make sense to use numeric variables instead of character variables for account id's.  @PeterClemmensen 's suggestion allows you to keep actual ACCT values for subsequent matching purposes, while masking a trailing digits when the variable is displayed.  Of course, this approach is really only effective when those data consumers unauthorized to see the actual values don't have direct access to the data variable being formatted.

 

The trade-off in @Kurt_Bremser 's suggestion is that once the character variable ACCT has been changed (replacing trailing characters with XXXXX), it may no longer be  specific (in fact making it non-specific is the point).  Not only will the modified value no longer be useable as a match key, it could easily create false duplicates within the data set - say if you want the average number of records per account.  So make sure you do all your data aggregation and matching prior to any report generation.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

If the goal was just to mask account IDs for output, my method would be sufficient. If the goal is to anonymize the data and keep it usable, then a surrogate key needs to be created.

mkeintz
PROC Star
Agreed
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PeterClemmensen
Tourmaline | Level 20

@mkeintz I agree.

 

@sanjay1 If value is really character and you do not want to alter the actual value, you can still do this, though it requires a bit more work and the code is not as slick

 

proc fcmp outlib=work.functions.fun; 
   function mask(string $) $;
      substr(string,max(1, length(string)-4),5)='XXXXX';
      return (string);
   endsub;
run;

options cmplib=work.functions;
proc format;
    value $ mask (default=200) other=[mask()];
run;

data test;
   value='676869845614567'; output;
   value='7689645983'; output;
 
   format value $mask.;
run;
 
proc print data=test;
run;
ed_sas_member
Meteorite | Level 14

Hi @sanjay1 

 

You can try this 

data want;
	set test;
	acct_new = compress(put(acct,20.));
	substr(acct_new,length(acct_new)-4) = "XXXXX";
run;

(NB: the put() function is not useful if the act variable type is already character.)

Kurt_Bremser
Super User

Account numbers are not numbers per se, as they are not used for calculations. Therefore store them as character.

Then masking is quite easy:

data have;
input acct :$20.;
cards;
676869845614567
7689645983
;
run;

data want;
set have;
substr(acct,length(acct)-4) = 'XXXXX';
run;
PGStats
Opal | Level 21

Just for completeness sake:

 

data have;
input acct :$20.;
cards;
676869845614567
7689645983
1234
;

data want;
set have;
masked = prxchange("s/\d{5}\s*$/XXXXX/o", 1, acct);
run;

proc print; run;
Obs. 	acct 	masked
1 	676869845614567 	6768698456XXXXX
2 	7689645983 	76896XXXXX
3 	1234 	1234
PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5926 views
  • 9 likes
  • 6 in conversation