Hi Team,
I am working on a sas dataset which have account numbers. I want to mask the last 5 digits of the Number with "X";
Here is the dataset;
Data test;
input acct;
cards;
676869845614567
7689645983
;
run;
Desired output
acct
6768698456XXXX
76896XXXXX
Please suggest any approach
Thanks,
Sanjay
Do you want to change the actual values or simply apply a format that masks the last 5 digits?
If you're looking for a format solution, then do
proc format;
picture mask (default=15)
other = '0000000000XXXXX' (mult=0.00001);
run;
data test;
value=676869845614567;output;
value=7689645983; output;
format value mask.;
run;
Result:
value 6768698456XXXXX 76896XXXXX
This is possibly one of the few times it might make sense to use numeric variables instead of character variables for account id's. @PeterClemmensen 's suggestion allows you to keep actual ACCT values for subsequent matching purposes, while masking a trailing digits when the variable is displayed. Of course, this approach is really only effective when those data consumers unauthorized to see the actual values don't have direct access to the data variable being formatted.
The trade-off in @Kurt_Bremser 's suggestion is that once the character variable ACCT has been changed (replacing trailing characters with XXXXX), it may no longer be specific (in fact making it non-specific is the point). Not only will the modified value no longer be useable as a match key, it could easily create false duplicates within the data set - say if you want the average number of records per account. So make sure you do all your data aggregation and matching prior to any report generation.
If the goal was just to mask account IDs for output, my method would be sufficient. If the goal is to anonymize the data and keep it usable, then a surrogate key needs to be created.
@mkeintz I agree.
@sanjay1 If value is really character and you do not want to alter the actual value, you can still do this, though it requires a bit more work and the code is not as slick
proc fcmp outlib=work.functions.fun;
function mask(string $) $;
substr(string,max(1, length(string)-4),5)='XXXXX';
return (string);
endsub;
run;
options cmplib=work.functions;
proc format;
value $ mask (default=200) other=[mask()];
run;
data test;
value='676869845614567'; output;
value='7689645983'; output;
format value $mask.;
run;
proc print data=test;
run;
Hi @sanjay1
You can try this
data want;
set test;
acct_new = compress(put(acct,20.));
substr(acct_new,length(acct_new)-4) = "XXXXX";
run;
(NB: the put() function is not useful if the act variable type is already character.)
Account numbers are not numbers per se, as they are not used for calculations. Therefore store them as character.
Then masking is quite easy:
data have;
input acct :$20.;
cards;
676869845614567
7689645983
;
run;
data want;
set have;
substr(acct,length(acct)-4) = 'XXXXX';
run;
Just for completeness sake:
data have;
input acct :$20.;
cards;
676869845614567
7689645983
1234
;
data want;
set have;
masked = prxchange("s/\d{5}\s*$/XXXXX/o", 1, acct);
run;
proc print; run;
Obs. acct masked 1 676869845614567 6768698456XXXXX 2 7689645983 76896XXXXX 3 1234 1234
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.