BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kadz_sas1990
Obsidian | Level 7
there it will copy all the 3 datasets right data1, data2, data3?
Kadz_sas1990
Obsidian | Level 7
the script should be packed with masking mechanism for all dataset sensitive field, but it should copy only the requested datasets with masked fields in it.
Kadz_sas1990
Obsidian | Level 7

can you explain this

proc datasets nolist lib=to; ( HERE lib=to "to" is the destination path right?) 

 

copy in=work out=to memtype=data; (HERE copy in=work out=to what is "work" and "to" again the source and destination?)

 

select data1 data2 data3; (what is the use of select line here this the place where I need to choose which dataset I need to copy for eg:if I give "select data1" it will copy only the data1? or else it will throw error)

 

modify data1;(these modify statement will work individually or it will work will select statement i.e. if I give "select data1;" alone and give "modify data2" in my code will it work??)

...

modify data2;

...

PeterClemmensen
Tourmaline | Level 20

Yes, 'to' is the destination path. In my example it is the libname that I specify above. 

 

'Work' is the libname of the library in which your data is stored. In my example, my data sets are in the temporary work library, so change this to the library in which your data is stored.

 

Yes exactly. The Select Statement lists the data sets you want to copy. If you omit this statement, SAS will copy all data sets from the library.

 

The Modify Statement lists the individual data set you want to edit. For example this

 

   modify data1;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;

means that we want to modify the data1 data set (in the to library). The Format Statement then formats the listed variables in data1.

Kadz_sas1990
Obsidian | Level 7
Fantastic got the thing what I need...Thanks a lot draycut.. 🙂 will need you if any doubt persist while implementing. Thanks Again.
Kadz_sas1990
Obsidian | Level 7

Hi Draycut,

I tried that code for use but now I found that my script is too long and problem here is

1) I have more than 500+ datasets in one place where I am going to use this code so the modify statement I need to right 500+ and many fields in 50+ datasets are need to be masked

2) if write a code for all those dataset while running mostly I will copy one or two dataset, so in select I will give those dataset names and for remaining all modify statement I will get error messages

proc datasets nolist lib=TO;
   copy in=FROM out=TO memtype=data;
      select dataset1 dataset2;
   modify dataset1;
       format column1 $cmask. column3 $cmask.;
   modify dataset2;
       format column3 $cmask. column5 $cmask.;
   modify dataset3;
       format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
modify dataset4;
       format ........
modify dataset5;
       format ........
......
.....
modify dataset500;
       format ........
run;quit;

my code looks same like above,

so in above code I will get errors from dataset3 ... dataset500, since I used only dataset1 and dataset2 in select statement.(around 498 error messages)

3) when ever I need to add some fields as sensitive fields I need to modify the code. instead of doing this I already have an excel sheet

 

I have excel sheet like this

current.JPG

Kadz_sas1990
Obsidian | Level 7
so can I import this excel sheet in my code and how it can be referred, like if dataset1 means column 1, column2 and column3 should be masked, similarly dataset2 means column 3,5,7 should be masked....so it is easy for me to update the excel instead of updating the code...can you help on this. 🙂
Kadz_sas1990
Obsidian | Level 7
my condition is to copy any one or two at time..
Kurt_Bremser
Super User

Your "phone number" seems to be a number, as it is displayed right-justified, so you can only replace it with a number, not a string.

Do you need to mask the values, or do you need to anonymize them in a way so that relations are kept intact? If the later, you need to create a lookup table from all existing values so that you replace consistently.

 

As a side note, do not store real telephone numbers as numbers, store them as character. Phone numbers can have leading zeroes or special characters ("+" for international), and with extensions they can easily exceed the available precision of numbers in SAS (and other software that uses 8-byte real storage for numbers).

Kadz_sas1990
Obsidian | Level 7

The phone number field is just an example and I have many other fields like name, address, relationship status and all personal information of my clients which I need to be masked while copying the dataset to other users in my team. I don't want to show these details to them and they wont need these fields as well. so while copying the dataset to them I need to give a copy of the SAS dataset in which the sensitive fields are masked or replaced. this is my actual need.

 

Thanks for your help in this.

Reeza
Super User

Maybe this will be useful? Especially if you want to maintain integrity of the data, ie Phone Number XYZ is the same everywhere its encountered. 

 

https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 32 replies
  • 2213 views
  • 8 likes
  • 5 in conversation