BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kadz_sas1990
Obsidian | Level 7

Hi All,

I am having a code to mask the data in dataset while copying it to different location from source

libname FROM "source location";
libname TO "destination location";
proc format;
value $ cmask (default=12) other = 'ZZZZ';
value nmask (default=12) other = 'ZZZZ';
run;

proc datasets nolist lib=TO;
copy in=FROM out=TO memtype=data;
select dataset1 dataset2;
modify dataset1;
format column1 $cmask. column3 $cmask.;
modify dataset2;
format column3 $cmask. column5 $cmask.;
modify dataset3;
format phone_num $cmask. name $cmask. gender $cmask. dob nmask. mobile_no nmask. salary nmask.;
run;quit;

here i need to give every dataset name, column details that need to be masked in the SAS code itself(program becomes bulky). instead of doing this can I use excel sheet to my code to refer the dataset name and sensitive columns to be masked. so one common code to copy any dataset from one location to other. it should automatically refer the excel sheet i have and mask the required column while copying.

 

I am having Excel sheet like below, (Which can be edited according to needs)

 

current.JPG

here 1st column is the dataset name, 2nd column is the fields which I need to mask in the specified dataset and 3rd column is the type of the field(only two types of data I have in all datasets- Numeric and character).

 

so when iam copying the dataset from source to destination my code should refer this excel and it should check which dataset I am copying and correspondingly it should mask the fields(columns that listed as sensitive field in excel should be masked) and dataset should be copied after that to destination.

Thanks in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You define the tables and columns to change here:

 

data excel;
  infile datalines dsd;
  input dsname :$41. colname :$32.;
  datalines;
class,sex
class,age
class,name
sashelp.air,air
;

Column DSNAME contains either <libref>.<table name> or <table name> 

 

If you only pass in <table name> then the code will use WORK as the libref

Whatever you define here must actually exist (libref, table and column) without typo AND the library must be assigned. If it doesn't exist or the library is not assigned then the code won't fail but the SQL will not return any rows (as there was no match to what's in dictionary table dictionary.columns).

 

The SQL code and sample data for table EXCEL I've posted will pick-up SASHELP.AIR 

SASHELP.AIR is a SAS provided table that should exist in any SAS environment.

Capture.JPG

If you also execute the data step at the beginning in comment then the SQL will also pick-up WORK.CLASS

 

"what is the use of code gen file can you please explain your code."

The data _null_ step generates code using PUT statements writing this code to a file in WORK.

 

filename codegen temp;

 

Keyword TEMP in the filename statement makes this a temporary file in WORK.

 

Below statement then reads the generated code back into the SAS program so that it gets executed. The SOURCE2 option instructs SAS to write the %included code also to the SAS log.

%include codegen / source2;

With the code I've posted that's what you'll see in the SAS log.

105        %include codegen / source2;
NOTE: %INCLUDE (level 1) file CODEGEN is file <path and filename>
106       +data out.AIR ;
107       +  set SASHELP.AIR ;
108       +  AIR =9999;
109       +run;

And last but not least: You need to define a valid path for your environment to execute the code without error - or you use the libname definition in comment instead - but that's just for function testing and you shouldn't use it for your actual execution.

/* define library to store the tables with masked columns
   - do not overwrite the source tables
*/
%let outlib=out;
/*libname &outlib "%sysfunc(pathname(work))";*/
libname &outlib "<path name for output library"; 

 

View solution in original post

35 REPLIES 35
SASKiwi
PROC Star

Masking data with formats is not secure. Remove the formats and the original data will show. You need to replace the actual values. For character values the PUT function will do this. With numbers you could replace with 99999 or similar.

 

Another option would be just to drop sensitive columns entirely. They aren't useful with rubbish data in them anyway.

Kadz_sas1990
Obsidian | Level 7

I had imported the excel, I got struck at creating macros. I imported my excel using below code

PROC IMPORT OUT=WORK.sensitive
           DATAFILE="file location"
           DBMS=EXCEL REPLACE;
Proc print data=WORK.sensitive;

current.JPG 

this is how the dataset WORK.sensitive looks like.

 

how to use this in macro to mask my required columns, can anyone help please..

Patrick
Opal | Level 21

@Kadz_sas1990 

Just applying a format is NOT data masking. The actual values are still in the table and not hard to extract.

 

May be fully explain us what you need to do so we can give you some guidance on how you can get actual data masking.

 

Also: Do you have the SAS Federation Server licensed?

 

To help you with coding (us providing actual code): Please post sample data in easy to use form (like via SAS data steps).

Kadz_sas1990
Obsidian | Level 7
just format is enough in my case, kind of hiding.
yes my sas EG is licensed version. we are officially using it in our office legally.
Kadz_sas1990
Obsidian | Level 7
my complete thread is same sample of what I need. let me explain again. I have many datasets in one location in one server. my team mates will copy these datasets for there use(just for some testing purpose actually they wont use the some fields are values in it) what I am trying to do is creating a script which will copy the dataset from one loc to another while doing that my script should hide some columns or fields in the copying dataset.(this some customer sensitive info, they actually wont need it)
andreas_lds
Jade | Level 19

@Kadz_sas1990 wrote:
my complete thread is same sample of what I need. let me explain again. I have many datasets in one location in one server. my team mates will copy these datasets for there use(just for some testing purpose actually they wont use the some fields are values in it) what I am trying to do is creating a script which will copy the dataset from one loc to another while doing that my script should hide some columns or fields in the copying dataset.(this some customer sensitive info, they actually wont need it)

 

You don't plan to use a format to hide sensitive information, don't you?

And if they don't need the variables, why wasting time to "hide" the contents? Dropping the variables requires less code and is faster.

Kadz_sas1990
Obsidian | Level 7
does dropping the variable cause any changes in column details, I mean whether it impact dataset structure. I need some common values to filled in those columns that is my actual need. instead of format we can replace the values in those sensitive column. like IF &Var1 = variablename Then
if &type1 = "NUMERIC" then
variable = 9999
else if &type1 = "CHARACTER" then variable = "Test"||_n_;
andreas_lds
Jade | Level 19

@Kadz_sas1990 wrote:
does dropping the variable cause any changes in column details, I mean whether it impact dataset structure. I need some common values to filled in those columns that is my actual need. instead of format we can replace the values in those sensitive column. like IF &Var1 = variablename Then
if &type1 = "NUMERIC" then
variable = 9999
else if &type1 = "CHARACTER" then variable = "Test"||_n_;

Of course, if you drop a variable it is not in the dataset and no information is stored in the metadata-part of the dataset about the dropped variables.

 

You really should post one of your datasets and the control information (your excel file) as data steps using datalines-statement, so that we have something to work with.

Kadz_sas1990
Obsidian | Level 7
I have collected all the sensitive fields or columns in all dataset and having it handy in a form of excel. my excel have 3 columns in it. 1st column is dataset name, 2nd column is sensitive fields that needs to be masked in that corresponding dataset, 3rd column is the type of the each sensitive field. so I need to use this excel sheet while copying the dataset so that corresponding fields should be hided or formatted while copying
Patrick
Opal | Level 21

Besides of anything else: 

The custom format you create in your code gets stored in a catalog in your session specific WORK. You then want to assign this format permanently to some variables.

The moment you're using such data in another SAS session where the formats don't exist you either get an error or if option NOFMTERR is set you'll see the unformatted values (these are your colleagues). 

 

Using EG: You can click on a cell in a table grid and select (switch to edit mode). Edit mode then also shows the unformatted values.

 

So.... What you want to do is just a bad idea. 

 

BTW: I've been asking if you've got the SAS Federation Server licensed (not EG). This is a SAS server module which allows you to mask data "on-the-fly".

Kadz_sas1990
Obsidian | Level 7
can you give a solution to do this
Patrick
Opal | Level 21

@Kadz_sas1990 wrote:
can you give a solution to do this

Ok, here you go. Please provide next time sample data in an easy to use form (=not as a screen shot, ideally as a SAS data step like done in the code below).

/*data class;*/
/*  set sashelp.class;*/
/*run;*/

data excel;
  infile datalines dsd;
  input dsname :$41. colname :$32.;
  datalines;
class,sex
class,age
class,name
sashelp.air,air
;

/* list of all tables/cols that actually exist in a currently assigned library */
proc sql;
  create table colsToMask as
  select c.libname, c.memname, c.name, c.type, c.length as len
  from
    (  
      select
        upcase(scan(cats('WORK.',dsname),-2,'.')) as libname length=8,
        upcase(scan(dsname,-1,'.')) as memname length=32,
        upcase(colname) as name length=32
      from excel
    ) e
    ,
    dictionary.columns c
   where 
    e.libname=c.libname 
    and e.memname=c.memname 
    and e.name=upcase(c.name)
    and c.memtype='DATA'
  order by c.libname, c.memname
  ;
quit;

/* define library to store the tables with masked columns
   - do not overwrite the source tables
*/
%let outlib=out;
/*libname &outlib "%sysfunc(pathname(work))";*/
libname &outlib "<path name for output library";

/* generate code for masking */
filename codegen temp;
data _null_;
/*  file print;*/
  file codegen;
  set colsToMask;
  by libname memname;
  if first.memname then
    do;
      put 
        "data &outlib.." memname ';' /
        '  set ' libname +(-1) '.' memname ';'
        ;
    end;
  if type='num' then 
    put '  ' name '=9999;'; 
  else 
    do;
      _replace=repeat('Z',min(len-1,10));
      put 
        @3 name '="' _replace +(-1)'" ;' 
        ; 
    end;
  if last.memname then
    put 'run;' ;
run;

/* ensure that there is always a codegen file */
data _null_;
  file codegen mod;
run;

/* execute code for masking */
%include codegen / source2;

 

Kadz_sas1990
Obsidian | Level 7
what is the use of code gen file can you please explain your code.
Kadz_sas1990
Obsidian | Level 7
I tried the above code, there is no rows in coltomask table which created

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 35 replies
  • 1620 views
  • 3 likes
  • 4 in conversation