BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kadz_sas1990
Obsidian | Level 7

I am going to copy a dataset from one location to another in which it has sensitive information in some columns so in my script while copying the sensitive column data should be masked or replaced with any random series r names like test1, test2,...testn 

libname loc1 "destination path";
libname loc2 "source path";
data loc1.datasetname_masked;
set loc2.datasetname_sensitive;
run;

 

in above datasetname_sensitive have a column(field) called phone_number the entries in field phone_number should be replaced with test1, test2. test3... testn

 

the dataset is like this

current.JPG

 

I need like this after copying with Phone_number field replaced with masking text(any)

result.JPG

the sensitive field can have any values eg:number, text or any special characters (here number) and it can be replaced with any random text say XXX, yyyy(my motive is to just mask) the entries

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Yes, 'to' is the destination path. In my example it is the libname that I specify above. 

 

'Work' is the libname of the library in which your data is stored. In my example, my data sets are in the temporary work library, so change this to the library in which your data is stored.

 

Yes exactly. The Select Statement lists the data sets you want to copy. If you omit this statement, SAS will copy all data sets from the library.

 

The Modify Statement lists the individual data set you want to edit. For example this

 

   modify data1;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;

means that we want to modify the data1 data set (in the to library). The Format Statement then formats the listed variables in data1.

View solution in original post

32 REPLIES 32
PeterClemmensen
Tourmaline | Level 20

Is this both numeric and character values that you want to mask?

 

An obvious solution would be simply to change the value of the variable you want to mask?

 

You could also create formats to mask your values like below. Be aware that this simply formats the values, but does not change the underlying value.

 

proc format;
    value $ cmask other = 'XXXX';
    value   nmask other = 'XXXX';
run;

data test;
input cvar1 $ cvar2 $ nvar1 nvar2;
datalines;
a b 1 2
c d 3 4
e f 5 6
;

data want;
    set test;
    format cvar2 $cmask. nvar2 nmask.;
run;

 

Result:

 

cvar1 cvar2 nvar1 nvar2 
a     XXXX  1     XXXX 
c     XXXX  3     XXXX 
e     XXXX  5     XXXX 

 

Kadz_sas1990
Obsidian | Level 7

Thanks Draycut,

The phone number field is just an example and I have many other fields like name, address, relationship status and all personal information of my clients which I need to be masked while copying the dataset to other users in my team. I don't want to show these details to them and they wont need these fields as well. so while copying the dataset to them I need to give a copy of the SAS dataset in which the sensitive fields are masked or replaced. this is my actual need.

 

Thanks again for your help in this.

Kadz_sas1990
Obsidian | Level 7

Its working fine for single dataset. That's Great. I have a doubt here I am going to create a single common code for copying any dataset from one loc to another. i.e dataset1, dataset2,dataset3 and so on.  dataset1 have 2 fields to be masked eg: phone_num and name. dataset2 have 1field to masked say date of birth and dataset3 has 3fields to be masked eg: mobile_no, gender and salary to be masked.

 

its is possible to use format condition in if statement,

proc format;
        value $ mask other = 'ZZZZ';
run;

data want;
set test;
format field1_dataset1 $mask. field2_dataset1 $mask.  field1_dataset2 $mask.  field1_dataset3 $mask. field2_dataset3 $mask. field3_dataset3 $mask.;
run;

how to use one format function for all fields conditions i.e. if the field is present it should be masked else it wont.

but in the above code it create a new field in the copied dataset in which it is not present,  can you help me in this

PeterClemmensen
Tourmaline | Level 20

The variables you want to format here, are they all character, or a mix of numeric and character?

PeterClemmensen
Tourmaline | Level 20

One way to get around this is to use PROC DATASETS to copy the data sets and apply the formats. PROC DATASETS can not create new variables, it only reads and edits the descriptor portion of a data set. Not the actual data. Therefore, you will simply get a warning for the varables not present in the data set being processed.

 

Is something like this what you are after? Obviously, this can be automated further, but I want to make sure that this is actually what you want first 🙂

 

data data1;
input var1 phone_num $ name $;
datalines;
1 1234 Sampson
;

data data2;
input var1 var2 dob :date9.;
format dob date9.;
datalines;
1 2 01jan2010
;

data data3;
input var1 $ var2 mobile_no gender $ salary 8.;
datalines;
a 1 123 M 10000
;

libname to "PathHere";

proc format;
    value $ cmask (default=12) other = 'ZZZZ';
    value   nmask (default=12) other = 'ZZZZ';
run;

proc datasets nolist lib=to;
   copy in=work out=to memtype=data;
      select data1 data2 data3;
   modify data1;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
   modify data2;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
   modify data3;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
run;quit;

Also, do you want to copy all data sets in a library or only some of them?

Kadz_sas1990
Obsidian | Level 7

let me explain more in detail.

 

I have a location where many sas dataset are present, let say production location X. These datasets are being used by my whole team. So they will copy the dataset from X to their location say Y. so I am writing a code to do this. so while doing this my code needs to mask the sensitive information in all dataset which ever copied. they will use my code to copy datasets. they will just define the source and destination in my code and run. so during copying the sensitive data should me masked. I already identified all the sensitive fields in all datasets which are need to be masked while copying. it contains Character and numeric types only. Thanks.

PeterClemmensen
Tourmaline | Level 20

I understand that. So will they copy one data set at the time or multiple?

Kadz_sas1990
Obsidian | Level 7
mostly one. its better to have solution for many.
PeterClemmensen
Tourmaline | Level 20

Ok. Pleae try that first and feel free to ask if you have questions.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 32 replies
  • 2210 views
  • 8 likes
  • 5 in conversation