BookmarkSubscribeRSS Feed
OS2Rules
Obsidian | Level 7

Hi All:

We seem to have some people here who will go to any lengths to bypass any rules regarding data entry. 

Someone has been entering invalid characters into a mainframe CICS based system from which we extract data and FTP it to a local server.  The FTP is now failing because they have entered a "Carriage Return" as data into a text field (hex '25').  It seems that they are being clever and cutting-and-pasting data into this field and there is no field level data validation.

The easy way for me to fix this is to go beat that user about the head and shoulders and make them stop it right now!  But, alas, I am limited to how much injury I am allowed to inflict.

Therefore, I will need to remove the offending characters in the code that reads the mainframe table and prepares the data for download.

Question - what is the most efficient way to remove the offending characters?  The "Translate" function seems to be the best option, but I would like to remove ANY "non display" character and I have a boat load of data to process (7+ million really big records), so I don't want the batch job to run much longer than it does now.

I can limit what needs to be done to a handfull of text fields (about 120 bytes) out of a 337 byte record.

Any suggestions, or should I just go with brute force?

Thanks in advance.

3 REPLIES 3
ballardw
Super User

It has been a long time since I dealt with FTP but it may be possible to set switches in your FTP code. I remember a similar issue when transferring "text" files that was fixed by changing a setting to use a "binary" transport option.

I would also document time used to fix this problem and let management know about wasted assets caused by folks not following prototcol assuming the entry procedures are documented.

Astounding
PROC Star

I'm afraid this is approach is in the "brute force" arena ...

On the SAS side, you might set up a set of macro variables that contain allowable characters.  Just a couple of examples:

%let caps_only = ABCDEFGHIJKLMNOPQRSTUVWXYZ;

%let caps_plus_digits = ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789;

That would let you apply the COMPRESS function down the road:

var1 = compress(var1, "&caps_only", 'Keep');

var2 = compress(var2, "&caps_plus_digits", 'Keep');

It's clumsy, but at least it gives you a way to apply different rules to different fields.  And unfortunately, it will take some CPU time.

Astounding
PROC Star

OK, these's much more to the COMPRESS function than I realized.  Sometimes I guess it pays to read the documentation before you post.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

"Keep" is illegal as a third parameter.  Use "K" instead.  But all of these macro variables may not be necessary.  Varying the third parameter may give you all that you need.

Don't forget to read about the "o" as well, for speeding up the processing.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 635 views
  • 0 likes
  • 3 in conversation