03-06-2012 12:34 PM
We seem to have some people here who will go to any lengths to bypass any rules regarding data entry.
Someone has been entering invalid characters into a mainframe CICS based system from which we extract data and FTP it to a local server. The FTP is now failing because they have entered a "Carriage Return" as data into a text field (hex '25'). It seems that they are being clever and cutting-and-pasting data into this field and there is no field level data validation.
The easy way for me to fix this is to go beat that user about the head and shoulders and make them stop it right now! But, alas, I am limited to how much injury I am allowed to inflict.
Therefore, I will need to remove the offending characters in the code that reads the mainframe table and prepares the data for download.
Question - what is the most efficient way to remove the offending characters? The "Translate" function seems to be the best option, but I would like to remove ANY "non display" character and I have a boat load of data to process (7+ million really big records), so I don't want the batch job to run much longer than it does now.
I can limit what needs to be done to a handfull of text fields (about 120 bytes) out of a 337 byte record.
Any suggestions, or should I just go with brute force?
Thanks in advance.
03-06-2012 12:39 PM
It has been a long time since I dealt with FTP but it may be possible to set switches in your FTP code. I remember a similar issue when transferring "text" files that was fixed by changing a setting to use a "binary" transport option.
I would also document time used to fix this problem and let management know about wasted assets caused by folks not following prototcol assuming the entry procedures are documented.
03-06-2012 01:36 PM
I'm afraid this is approach is in the "brute force" arena ...
On the SAS side, you might set up a set of macro variables that contain allowable characters. Just a couple of examples:
%let caps_only = ABCDEFGHIJKLMNOPQRSTUVWXYZ;
%let caps_plus_digits = ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789;
That would let you apply the COMPRESS function down the road:
var1 = compress(var1, "&caps_only", 'Keep');
var2 = compress(var2, "&caps_plus_digits", 'Keep');
It's clumsy, but at least it gives you a way to apply different rules to different fields. And unfortunately, it will take some CPU time.
03-06-2012 02:09 PM
OK, these's much more to the COMPRESS function than I realized. Sometimes I guess it pays to read the documentation before you post.
"Keep" is illegal as a third parameter. Use "K" instead. But all of these macro variables may not be necessary. Varying the third parameter may give you all that you need.
Don't forget to read about the "o" as well, for speeding up the processing.