DATA Step, Macro, Functions and more

Posts: 1,301

Hi,

I have received some positive feedback and gained some great insights from posting these 'puzzles' on here.  This one is definitely much more complex than my previous postings and will take real forethought and skill to accomplish.  The subject of cryptography has always been interesting to me.

This post yesterday: http://communities.sas.com/message/106240#106240 got me thinking about data masking and security.  Say I had a set of data that I wanted to mask in storage while still being able to use it as plain text, for instance a file containing a username and password.  I first thought of some classic ciphers: Caesar and Bifid came to mind, but this has already been done:

*** Try not to look at the methods outlined in this paper, so that you come up with your own method

Data Masking with Classical Ciphers by Murphy Choy, University College Dublin http://support.sas.com/resources/papers/proceedings10/108-2010.pdf

So what about creating a method for implementing another classic cipher, the solitaire cipher: http://en.wikipedia.org/wiki/Solitaire_(cipher)

Or try the Playfair cipher (probably the easier choice): http://en.wikipedia.org/wiki/Playfair_cipher

There are plenty of options to choose from; I just listed ones I already have a certain level of familiarity with.

I look forward to seeing the unique solutions the members here come up with.

Super User
Posts: 10,041

FriedEgg, I think your question covers a big scope. It will take more time to read and understand the sources you offered and, of course, more time to think it through and code it.

Posts: 1,301

KSharp, this is most definitely a challenge, especially if you are not already versed in cryptography and ciphers.  This may help put you onto a starting path:

The following is an implementation of the solitaire cipher in Python (written for Python 2, as it relies on itertools.imap and izip):

from itertools import chain, ifilter, imap, izip

A = 53
B = 54
N_CARDS = 54
ORD_A = ord('A')

class Deck(list):
    def __init__(self, key=''):
        list.__init__(self, range(1,55))

        self.step()
        for ch in key:
            self.counted_cut(ord(ch) - ORD_A + 1)
            self.step()

    def move(self, card, distance):
        fm = self.index(card)
        to = fm + distance
        if to >= N_CARDS:
            to -= (N_CARDS-1)

        self.insert(to, self.pop(fm))

    def triple_cut(self):
        a = self.index(A)
        b = self.index(B)
        s, e = (a, b+1) if a < b else (b, a+1)
        self[:] = self[e:] + self[s:e] + self[:s]

    def counted_cut(self, count):
        self[:] = self[count:-1] + self[:count] + self[-1:]

    def step(self):
        self.move(A, 1)
        self.move(B, 2)
        self.triple_cut()
        self.counted_cut(min(self[-1], A))

    def __iter__(self):
        while True:
            ndx = min(self[0], A)

            if self[ndx] < A:
                yield self[ndx]

            self.step()

def encrypt(text, key=''):
    encode = lambda c,k: chr((ord(c) - ORD_A + k)%26 + ORD_A)

    padded_text = chain(text, ['X']*(-len(text)%5))

    cyphertext =  imap(encode, padded_text, Deck(key))

    groups = imap(''.join, izip(*[cyphertext]*5))

    return ' '.join(groups)

def decrypt(text, key=''):
    decode = lambda c,k: chr((ord(c) - ORD_A - k)%26 + ORD_A)

    letters = (c for c in text if c.isupper())

    return ''.join(imap(decode, letters, Deck(key)))

Posts: 1,301

The following probably will not be helpful unless you know perl, but here is an implementation of the playfair cipher:

$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$"ush@c,$_,$"push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c)$ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}

Super User
Posts: 10,041

Yes. I am not familiar with cryptography and ciphers, so I am afraid I will need time to study these and more patience to think them through. It looks like it will take much more time.

Super User
Posts: 5,433

I don't eat ciphers for breakfast either ...

But I think it would be interesting to have this discussion relate to existing data masking and cryptography techniques in SAS, such as MD5, pwencode, the encrypt= data set option, the offering in SAS/Secure and others.

/Linus

Data never sleeps
Posts: 1,301

Hi LinusH,

Here is a very quick writeup on the differences between the methods you have mentioned.

The MD5 function as part of base SAS:

-- MD5 is a one-way hash function.  A string of any length is passed to the hashing function, which returns a fixed-length 128-bit digest, usually displayed as 32 hexadecimal characters.  The digest is not guaranteed to be unique, and it cannot be reversed to produce the original string.  SAS/Secure offers some additional hashing algorithms such as SHA-1.
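
To make the one-way property concrete, here is a minimal sketch in Python using the standard `hashlib` module rather than the SAS MD5 function. The helper name `md5_hex` and the sample password are made up for illustration.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

short_digest = md5_hex("secret")            # hypothetical sample password
long_digest = md5_hex("secret" * 1000)      # much longer input, same digest size

print(len(short_digest), len(long_digest))  # both digests are 32 hex characters
print(short_digest == md5_hex("secret"))    # the same input always hashes the same
```

Nothing in the digest lets you compute the input back; the only way to "reverse" it is to guess inputs and compare digests, which is why it suits verification but not the plain-text reuse discussed in this thread.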

The PWENCODE Procedure:

--The pwencode procedure available in SAS has three encoding methods (two with base SAS and one additional with SAS/Secure).  The first is a basic base64 encoding, the second is a 32-bit encryption that SAS reports to be a proprietary methodology, and the third uses a 256-bit variant of AES.

--The pwencode procedure basically produces an encoded string that SAS will decode back to plain text when the string is used.  It is useful in situations where, for example, you wish to store an encrypted password for a connect statement.  Basically, the task at hand is to produce your own version of this functionality with a different methodology.
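
As a rough illustration of why a plain base64 encoding only masks a value, here is a small Python sketch using the standard `base64` module. It does not reproduce the exact string format that pwencode emits; the function names and the sample password are assumptions made for the example.

```python
import base64

def encode(password):
    """Base64-encode a password; this is an encoding, not encryption."""
    return base64.b64encode(password.encode("utf-8")).decode("ascii")

def decode(encoded):
    """Reverse the encoding; no key or secret is needed."""
    return base64.b64decode(encoded).decode("utf-8")

masked = encode("MyPassw0rd")   # hypothetical sample password
print(masked)
print(decode(masked))           # recovers the original text
```

Anyone who recognises the encoding can undo it, so it hides the value from a casual glance and nothing more.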

The encrypt dataset option:

--This is a data set option that lets you obfuscate the data stored within a dataset by encrypting it with a provided key (password), which is then needed to decrypt it.  In base SAS the method used is a SAS proprietary algorithm, and SAS/Secure adds industry standards such as RC4, DES, AES and more.  In my experience this is a useful technique when sharing datasets between companies over the internet where the information stored is sensitive.  It provides an additional layer of security at the data layer itself.

There is of course a lot more to each one of these items.  There are also a few other encryption-related topics in SAS that have not been mentioned here, but they do not necessarily pertain to the topic at hand.

--This is a good document which outlines a lot about the methods mentioned here and the pwencode procedure.

http://support.sas.com/documentation/cdl/en/secref/62092/PDF/default/secref.pdf

As a final thought to tie this back to the original topic: by implementing your own cipher, what do you gain?  All of these methods provided by SAS are great and work as expected.  The one issue I have with the pwencode option is as follows: anyone with SAS can decrypt the string, simply, and the outputs make the methodology used easy to identify.  For example, I will share an encrypted string produced with the procedure here:

{sas002}F1CB951853BCC0632380806E1603DC310868278F

Now you can take that string and, even without me telling you how I produced it, pretty easily identify how to decrypt it or use it to exploit what it represents.

If, however, I were to use a non-standard cipher, the method by which the string is used becomes part of the security it offers.  So with a self-implemented cipher, unless you share the method of encryption and/or the code with which to decrypt the message, it is to some degree safer from being misused outside the originating codebase, or at least this is my thinking at the moment.

Really I post this just for fun.  Implementing a cipher is a nice exercise in both computing and mathematics.

Posts: 1,301

Re: Data masking. Implement a cipher.

Here is a functional implementation of the simplest cipher I know, the Caesar cipher.  Note that this code is very similar to the macro implementation available in the pdf I linked in the original post.  I am still continuing work on additional implementations of more complex algorithms.

proc fcmp outlib=work.func.cipher;
  function caesar(var $, shift, mode $) $;
    length cipher $26;
    alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    /* letters shift+1..26 of the alphabet form the start of the cipher alphabet */
    cipher=substr(alphabet,shift+1);
    array c[100] $ c1-c100;
    /* the first shift letters wrap around to the end of the cipher alphabet */
    do i=1 to shift;
      c[i]=substr(alphabet,i,1);
    end;
    cipher=cats(of cipher c1-c100);
    if upcase(mode) in ('E','ENCRYPT') then
    do;
      /* translate(source,to,from): letters found in cipher map to alphabet */
      encrypt=translate(strip(var),strip(alphabet),strip(cipher));
      return(encrypt);
    end;
    else if upcase(mode) in ('D','DECRYPT') then
    do;
      decrypt=translate(strip(var),cipher,alphabet);
      return(decrypt);
    end;
    else return('***ERROR***');
  endsub;
quit;

%let cmplib=%sysfunc(getoption(cmplib));
options cmplib=(work.func &cmplib);

data test;
  input t $;
  e=caesar(t,5,'e');
  d=caesar(e,5,'d');
  cards;
TEST
REST
BEST
;
run;

options cmplib=(&cmplib);
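
For comparison, the same idea can be sketched in a few lines of Python. This is an illustration only, not a port of the function above: it shifts letters forward when encrypting, it assumes the text contains only the letters A-Z, and the names `caesar`, `shift` and `mode` are chosen here simply to mirror the SAS signature.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text, shift, mode="E"):
    """Shift A-Z forward (mode 'E') or backward (mode 'D') by shift places."""
    shift %= 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    if mode.upper() in ("E", "ENCRYPT"):
        table = str.maketrans(ALPHABET, shifted)
    elif mode.upper() in ("D", "DECRYPT"):
        table = str.maketrans(shifted, ALPHABET)
    else:
        raise ValueError("mode must be 'E' or 'D'")
    return text.upper().translate(table)

for word in ("TEST", "REST", "BEST"):
    e = caesar(word, 5, "e")
    print(word, e, caesar(e, 5, "d"))
```

Building both translation tables from the same rotated alphabet keeps encryption and decryption exact inverses of each other, which is the same trick the TRANSLATE calls rely on.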

Occasional Contributor
Posts: 7

FriedEgg,

Keep posting these kinds of puzzles and solutions.

These are real brain teasers - out of the day to day SAS stuff we do.

Thanks,

Chendhil

Frequent Contributor
Posts: 94

First, let me echo what's been said already a few times - I really like these puzzles you've been posting.  It's been far too long since I challenged myself to do coding like this!

This code needs some clearing up, but I left much of the working in separate variables so that it's a bit easier to follow.

This implements the Playfair cypher that you linked, both to encrypt and decrypt.  I have not added a reversal for the additional "X" insertions on duplicate characters, as I don't believe this can be done programmatically in a way that's completely reliable.

Following Wikipedia's example of encrypting "Hide the gold in the tree stump", I also get the matching string "BMODZBXDNABEKUDMUIXMMOUVIF".  The same string then decrypted successfully, given the above limitation.

It's written in SAS 9.1.3, although I don't think I've used anything that's different in later versions.

%let keyword = 'playfair example';
%let plaintext = 'Hide the gold in the tree stump';
%let cyphertext_length = %eval(2*(%length(&plaintext.)));
%put &cyphertext_length.;

data cypher;
  /*convert keyword to uppercase, append alphabet, and remove all non uppercase letter characters*/
  keyword = translate(compress(upcase("&keyword." || 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),,'kU'),'I','J');
  /*output;*/
  do i = 1 to length(keyword)-1;
    substr(keyword,i+1) = compress(substr(keyword,i+1),substr(keyword,i,1));
    /*    output;*/
  end;
  format key1-key25 $1.;
  array key {25} key1-key25;
  do i = 1 to 25;
    key{i} = substr(keyword,i,1);
  end;
  drop i;
run;

data encrypt;
  set cypher;
  format plaintext  $&cyphertext_length..;
  format cyphertext $&cyphertext_length..;
  plaintext = compress(upcase("&plaintext."),,'kU');
  if _n_ > 1 then stop;
  array key {0:4,0:4} key1-key25;

  /*scan for pairs*/
  i = 1;
  do until (1=0);
    if i = length(plaintext) then do; /*last character, and has no pair*/
      plaintext=cats(plaintext,"X");
    end;
    if (i >= length(plaintext)) then leave;
    if substr(plaintext,i,1) = substr(plaintext,i+1,1) then do;
      plaintext = cats(substr(plaintext,1,i),"X",substr(plaintext,i+1));
    end;
    i = i + 2;
  end;

  do i = 1 to length(plaintext) by 2; /*can only have an even number of characters now*/
    c1 = substr(plaintext,i,1);
    c2 = substr(plaintext,i+1,1);

    pos1 = whichc(substr(plaintext,i,1), of key[*]) - 1;
    row1 = int(pos1/5);
    col1 = mod(pos1,5);
    test1 = key[row1,col1];

    pos2 = whichc(substr(plaintext,i+1,1), of key[*]) - 1;
    row2 = int(pos2/5);
    col2 = mod(pos2,5);
    test2 = key[row2,col2];

    if row1 = row2 then do; /*same row*/
      newrow1 = row1;
      newrow2 = row2;
      newcol1 = mod(col1+1,5); /*move one to the right*/
      newcol2 = mod(col2+1,5);
    end;
    else if col1 = col2 then do; /*same column*/
      newrow1 = mod(row1+1,5);
      newrow2 = mod(row2+1,5);
      newcol1 = col1;
      newcol2 = col2;
    end;
    else do; /*different row and column*/
      newrow1 = row1; /*keep rows*/
      newrow2 = row2;
      newcol1 = col2; /*swap columns*/
      newcol2 = col1;
    end;

    newc1 = key[newrow1,newcol1];
    newc2 = key[newrow2,newcol2];
    substr(cyphertext,i,2) = cats(key[newrow1,newcol1],key[newrow2,newcol2]);
    /*    output;*/
  end;

  keep keyword plaintext cyphertext key1-key25;
run;

data decrypt;
  set encrypt (keep = key1-key25 keyword cyphertext);
  format plaintext  $&cyphertext_length..;
  array key {0:4,0:4} key1-key25;

  do i = 1 to length(cyphertext) by 2; /*can only have an even number of characters now*/
    c1 = substr(cyphertext,i,1);
    c2 = substr(cyphertext,i+1,1);

    pos1 = whichc(substr(cyphertext,i,1), of key[*]) - 1;
    row1 = int(pos1/5);
    col1 = mod(pos1,5);
    test1 = key[row1,col1];

    pos2 = whichc(substr(cyphertext,i+1,1), of key[*]) - 1;
    row2 = int(pos2/5);
    col2 = mod(pos2,5);
    test2 = key[row2,col2];

    if row1 = row2 then do; /*same row*/
      newrow1 = row1;
      newrow2 = row2;
      newcol1 = mod(col1+4,5); /*move one to the left*/
      newcol2 = mod(col2+4,5);
    end;
    else if col1 = col2 then do; /*same column*/
      newrow1 = mod(row1+4,5);
      newrow2 = mod(row2+4,5);
      newcol1 = col1;
      newcol2 = col2;
    end;
    else do; /*different row and column*/
      newrow1 = row1; /*keep rows*/
      newrow2 = row2;
      newcol1 = col2; /*swap columns*/
      newcol2 = col1;
    end;

    newc1 = key[newrow1,newcol1];
    newc2 = key[newrow2,newcol2];
    substr(plaintext,i,2) = cats(key[newrow1,newcol1],key[newrow2,newcol2]);
    /*    output;*/
  end;

  keep keyword plaintext cyphertext;
run;
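
As a cross-check on the Wikipedia example quoted above, here is a compact Python sketch of the same Playfair rules (same row: step sideways, same column: step down or up, otherwise swap columns). It is not a translation of the DATA steps; the function names are invented for the example, and it assumes the input contains no doubled "XX" pair.

```python
def build_square(keyword):
    """Return the 25-letter key square as one string, row by row (J folded into I)."""
    letters = (keyword.upper() + "ABCDEFGHIJKLMNOPQRSTUVWXYZ").replace("J", "I")
    square = ""
    for ch in letters:
        if "A" <= ch <= "Z" and ch not in square:
            square += ch
    return square

def make_pairs(text):
    """Split text into digraphs, adding X after a doubled letter or a lone last letter."""
    chars = [c for c in text.upper().replace("J", "I") if "A" <= c <= "Z"]
    pairs, i = [], 0
    while i < len(chars):
        if i + 1 < len(chars) and chars[i + 1] != chars[i]:
            pairs.append((chars[i], chars[i + 1]))
            i += 2
        else:
            pairs.append((chars[i], "X"))
            i += 1
    return pairs

def playfair(text, keyword, decrypt=False):
    """Encrypt (or decrypt) text with the Playfair square built from keyword."""
    square = build_square(keyword)
    step = 4 if decrypt else 1          # +4 mod 5 is the same as -1 mod 5
    out = []
    for a, b in make_pairs(text):
        r1, c1 = divmod(square.index(a), 5)
        r2, c2 = divmod(square.index(b), 5)
        if r1 == r2:                    # same row: move along the row
            c1, c2 = (c1 + step) % 5, (c2 + step) % 5
        elif c1 == c2:                  # same column: move along the column
            r1, r2 = (r1 + step) % 5, (r2 + step) % 5
        else:                           # rectangle: keep rows, swap columns
            c1, c2 = c2, c1
        out.append(square[r1 * 5 + c1] + square[r2 * 5 + c2])
    return "".join(out)

secret = playfair("Hide the gold in the tree stump", "playfair example")
print(secret)
print(playfair(secret, "playfair example", decrypt=True))
```

As with the SAS version, decryption leaves the inserted "X" in place ("TREXESTUMP"), since there is no reliable way to tell a padding X from a real one.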

Posts: 1,301

Nice work DF and thank you for your comment.  Thank you Chendhil as well.  I am glad people are enjoying these posts.

Posts: 1,301

Here is another implementation of a cipher using proc fcmp.  Right now it requires two generated formats, but I would like to change it to not require them later.  I would also like to add the ability to encrypt strings that contain multiple words; right now it will not work properly with spaces in the string.  One thing to note is that this function uses a nice feature of proc fcmp, the dynamic_array.  Too bad it only works for numeric arrays and not character arrays as well.

data v1;
  alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  retain fmtname '$vigenere';
  do label=0 to length(alphabet)-1;
    start=substr(alphabet,label+1,1);
    output;
  end;
  drop alphabet;
run;

data v2;
  alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  retain fmtname 'vigenere';
  do start=0 to length(alphabet)-1;
    label=substr(alphabet,start+1,1);
    output;
  end;
  drop alphabet;
run;

data _null_;
  do i=1 to 2;
    call execute('proc format library=work cntlin=v'|| strip(i) || '; run;');
  end;
run;

proc fcmp outlib=work.func.cipher;
  function vigenere(string $, mode $, okey $) $32000;
    length key want $32000;

    /* construct key */
    key=okey;
    do until(length(key) ge length(string));
      key=cats(of key key);
    end;

    array k[1] /nosymbols;
    dims=length(string);
    call dynamic_array(k,dims);
    do i=1 to dims;
      k[i]=put(substr(key,i,1),$vigenere.)*1;
    end;

    /* split input string*/
    array s[1] /nosymbols;
    array c[1] /nosymbols;
    call dynamic_array(s,dims);
    call dynamic_array(c,dims);
    do ii=1 to dims;
      s[ii]=put(substr(string,ii,1),$vigenere.)*1;
      /* calculate enciphered letters */
      if upcase(mode) in ('E','ENCRYPT') then
        c[ii]=mod((k[ii]+s[ii]),26);
      else if upcase(mode) in ('D','DECRYPT') then
        c[ii]=ifn(mod((s[ii]-k[ii]),26)<0,mod((s[ii]-k[ii]),26)+26,mod((s[ii]-k[ii]),26));
    end;

    want='';
    do j=1 to dims;
      want=strip(want) || strip(put(c[j],vigenere.));
    end;
    return(want);
  endsub;
run;

%let cmplib=%sysfunc(getoption(cmplib));
options cmplib=(work.func &cmplib);

data test;
  input t $;
  e1=vigenere(t,'e','FRIEDEGG');
  d1=vigenere(e1,'d','FRIEDEGG');
  cards;
TEST
REST
BEST
;
run;

options cmplib=(&cmplib);

EDIT:  By the way, I forgot to mention, this is the Vigenère cipher: http://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher
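
For anyone who wants to check results outside SAS, the arithmetic of the Vigenère cipher (add the repeating key letter to each text letter, modulo 26) can be sketched in Python as below. This is only an illustration under the same restriction as the function above: letters A-Z only, no spaces. The function name and the `decrypt` flag are choices made for this example.

```python
from itertools import cycle

ORD_A = ord("A")

def vigenere(text, key, decrypt=False):
    """Add (or subtract) the repeating key letters to the text, modulo 26."""
    sign = -1 if decrypt else 1
    out = []
    # cycle() repeats the key for as long as the text lasts
    for ch, k in zip(text.upper(), cycle(key.upper())):
        shift = sign * (ord(k) - ORD_A)
        out.append(chr((ord(ch) - ORD_A + shift) % 26 + ORD_A))
    return "".join(out)

for word in ("TEST", "REST", "BEST"):
    e1 = vigenere(word, "FRIEDEGG")
    print(word, e1, vigenere(e1, "FRIEDEGG", decrypt=True))
```

Python's `%` already returns a non-negative result for a negative left operand, so the extra +26 correction needed with the SAS MOD function does not appear here.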
