I am going to copy a dataset from one location to another in which it has sensitive information in some columns so in my script while copying the sensitive column data should be masked or replaced with any random series r names like test1, test2,...testn
libname loc1 "destination path";
libname loc2 "source path";
data loc1.datasetname_masked;
set loc2.datasetname_sensitive;
run;
in above datasetname_sensitive have a column(field) called phone_number the entries in field phone_number should be replaced with test1, test2. test3... testn
the dataset is like this
I need like this after copying with Phone_number field replaced with masking text(any)
the sensitive field can have any values eg:number, text or any special characters (here number) and it can be replaced with any random text say XXX, yyyy(my motive is to just mask) the entries
Thanks in advance.
Yes, 'to' is the destination path. In my example it is the libname that I specify above.
'Work' is the libname of the library in which your data is stored. In my example, my data sets are in the temporary work library, so change this to the library in which your data is stored.
Yes exactly. The Select Statement lists the data sets you want to copy. If you omit this statement, SAS will copy all data sets from the library.
The Modify Statement lists the individual data set you want to edit. For example this
modify data1;
format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
means that we want to modify the data1 data set (in the to library). The Format Statement then formats the listed variables in data1.
Is this both numeric and character values that you want to mask?
An obvious solution would be simply to change the value of the variable you want to mask?
You could also create formats to mask your values like below. Be aware that this simply formats the values, but does not change the underlying value.
proc format;
value $ cmask other = 'XXXX';
value nmask other = 'XXXX';
run;
data test;
input cvar1 $ cvar2 $ nvar1 nvar2;
datalines;
a b 1 2
c d 3 4
e f 5 6
;
data want;
set test;
format cvar2 $cmask. nvar2 nmask.;
run;
Result:
cvar1 cvar2 nvar1 nvar2 a XXXX 1 XXXX c XXXX 3 XXXX e XXXX 5 XXXX
Thanks Draycut,
The phone number field is just an example and I have many other fields like name, address, relationship status and all personal information of my clients which I need to be masked while copying the dataset to other users in my team. I don't want to show these details to them and they wont need these fields as well. so while copying the dataset to them I need to give a copy of the SAS dataset in which the sensitive fields are masked or replaced. this is my actual need.
Thanks again for your help in this.
Its working fine for single dataset. That's Great. I have a doubt here I am going to create a single common code for copying any dataset from one loc to another. i.e dataset1, dataset2,dataset3 and so on. dataset1 have 2 fields to be masked eg: phone_num and name. dataset2 have 1field to masked say date of birth and dataset3 has 3fields to be masked eg: mobile_no, gender and salary to be masked.
its is possible to use format condition in if statement,
proc format;
value $ mask other = 'ZZZZ';
run;
data want;
set test;
format field1_dataset1 $mask. field2_dataset1 $mask. field1_dataset2 $mask. field1_dataset3 $mask. field2_dataset3 $mask. field3_dataset3 $mask.;
run;
how to use one format function for all fields conditions i.e. if the field is present it should be masked else it wont.
but in the above code it create a new field in the copied dataset in which it is not present, can you help me in this
The variables you want to format here, are they all character, or a mix of numeric and character?
yes it is a mix of numeric and character type
Did you try my code below?
One way to get around this is to use PROC DATASETS to copy the data sets and apply the formats. PROC DATASETS can not create new variables, it only reads and edits the descriptor portion of a data set. Not the actual data. Therefore, you will simply get a warning for the varables not present in the data set being processed.
Is something like this what you are after? Obviously, this can be automated further, but I want to make sure that this is actually what you want first 🙂
data data1;
input var1 phone_num $ name $;
datalines;
1 1234 Sampson
;
data data2;
input var1 var2 dob :date9.;
format dob date9.;
datalines;
1 2 01jan2010
;
data data3;
input var1 $ var2 mobile_no gender $ salary 8.;
datalines;
a 1 123 M 10000
;
libname to "PathHere";
proc format;
value $ cmask (default=12) other = 'ZZZZ';
value nmask (default=12) other = 'ZZZZ';
run;
proc datasets nolist lib=to;
copy in=work out=to memtype=data;
select data1 data2 data3;
modify data1;
format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
modify data2;
format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
modify data3;
format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
run;quit;
Also, do you want to copy all data sets in a library or only some of them?
let me explain more in detail.
I have a location where many sas dataset are present, let say production location X. These datasets are being used by my whole team. So they will copy the dataset from X to their location say Y. so I am writing a code to do this. so while doing this my code needs to mask the sensitive information in all dataset which ever copied. they will use my code to copy datasets. they will just define the source and destination in my code and run. so during copying the sensitive data should me masked. I already identified all the sensitive fields in all datasets which are need to be masked while copying. it contains Character and numeric types only. Thanks.
I understand that. So will they copy one data set at the time or multiple?
Ok. Did you try my PROC DATASETS code above?
Ok. Pleae try that first and feel free to ask if you have questions.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.