nickb
Calcite | Level 5

I'm parsing a text file and need a way to check whether the last value on a line is numeric.  The file is large, but the data of interest is always in the same position.

Below is an example of the data.  The alignment got out of whack when I copied and pasted it.  The goal is to look at the last value and check whether it is numeric; I don't want the records that have _________ in the line.  I will use this check in my IF statement.

Data example:

C) D.C.CIRCUITS

          EER-126 D.C. CIRCUITS.............. 98/WI                 A                    2.67

      N) FUNDAMENTAL MECHANICAL SYSTEMS

          EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed

      C) ROBOTICS IN CIM SYSTEMS

          EGR-128 ROBOTICS IN CIM SYSTEMS.... 99/SP    A                        2

      C) COMPUTER PROGRAMMING APPLICATIONS IN EGR TECHNOLOGY

          IET-198 COMP PRGM APP IN ENG TECH.. 00/SP    B                   1.33

LarryWorley
Fluorite | Level 6

Nick,

Check out the ANYDIGIT string function in SAS.  You might need to combine it with other functions to check exactly what you need.  For example, if you expect a digit in the last column of the input string, this would work:

if anydigit(in_string,length(in_string)) > 0 then do; /* whatever you need */ end;

This works in 9.2 and 9.3.  Here is link: http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0gq8bwobt4...
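
A minimal sketch of what that test actually checks, using a few made-up strings (the sample values and the variable names last_pos and last_is_digit are illustrative only). Because the search starts at the last non-blank position, it only reports whether the final character is a digit, which is not the same as the last word being a number: a string ending in 7B fails, while one ending in a code such as EGR-128 would pass.

```sas
data _null_;
  length in_string $40;
  /* hypothetical sample strings, not taken from the real file */
  do in_string = 'A 2.67', '1 course needed', 'ROOM 7B', 'SEE EGR-128';
    last_pos = length(in_string);                        /* position of last non-blank character */
    last_is_digit = anydigit(in_string, last_pos) > 0;   /* 1 if that character is a digit */
    put in_string= last_pos= last_is_digit=;
  end;
run;
```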

Larry

PGStats
Opal | Level 21

Use scan(x,-1) to get the last field and anyalpha to detect non-numbers :

data have;
length line1 line2 $128;
infile datalines truncover;
/* each record spans two lines: a heading line, then the course line */
input line1 $char128. / line2 $char128.;
/* keep the record only if the last word contains no letters */
if not anyalpha(scan(line2,-1));
datalines;
C) D.C.CIRCUITS
          EER-126 D.C. CIRCUITS.............. 98/WI                 A                    2.67
      N) FUNDAMENTAL MECHANICAL SYSTEMS
          EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed
      C) ROBOTICS IN CIM SYSTEMS
          EGR-128 ROBOTICS IN CIM SYSTEMS.... 99/SP    A                        2
      C) COMPUTER PROGRAMMING APPLICATIONS IN EGR TECHNOLOGY
          IET-198 COMP PRGM APP IN ENG TECH.. 00/SP    B                   1.33
;

PG
Astounding
PROC Star

I like the use of SCAN with -1 to get the last word.  But with the ANYthis and ANYthat functions it is tricky to be sure you have accounted for every form a numeric value might take.  Should -.5 be numeric?  How about scientific notation, such as 314E-2?  Should #5 be character?  If the answer to these is yes, here is another way to examine the last word:

if input(scan(line2,-1), ??16.) > .;

It takes the first 16 characters of the last word and tries to read them as a number.  If it can, the IF condition is true.  If it can't, the ?? suppresses the messages about invalid numeric data and the result is missing, so the IF condition is false.  A last word shorter than 16 characters is not a problem.
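
As a rough illustration of the INPUT approach, the step below feeds it the candidate values mentioned above plus one plain word. The variable names word, value and is_numeric are made up for the sketch. The first three values should read as numbers, while #5 and needed should come back missing without any invalid-data notes in the log.

```sas
data _null_;
  length word $16;
  /* candidate "last words" taken from the discussion above */
  do word = '2.67', '-.5', '314E-2', '#5', 'needed';
    value = input(word, ?? 16.);      /* ?? suppresses invalid-data messages */
    is_numeric = not missing(value);  /* 1 when the word could be read as a number */
    put word= value= is_numeric=;
  end;
run;
```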

PGStats
Opal | Level 21

That's indeed better. Better still: replace the > . comparison with not missing().

Cheers!

PG
Haikuo
Onyx | Level 15

To echo Astounding's sentiment: since scan() has a bunch of default delimiters, such as . < ! - etc., it is better to explicitly spell out the wanted delimiter ' ' (blank):

scan(line2,-1, ' ')

so that something like "T-2" or "hai.2" is not taken as a number.
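
A small sketch of the difference, using made-up lines (the sample strings and variable names are assumptions for illustration). With the default delimiters, the period and hyphen split the last word, so only the trailing digits reach INPUT; with blank as the only delimiter, the whole token is tested.

```sas
data _null_;
  length line $40 default_word blank_word $16;
  /* hypothetical lines whose last token contains a default delimiter */
  do line = 'ABC 2.67', 'XYZ T-2', 'XYZ hai.2';
    default_word = scan(line, -1);        /* default delimiters include . and - */
    blank_word   = scan(line, -1, ' ');   /* blank is the only delimiter */
    default_num  = input(default_word, ?? 16.);
    blank_num    = input(blank_word, ?? 16.);
    put line= default_word= blank_word= default_num= blank_num=;
  end;
run;
```

Note that the default call also reduces 2.67 to 67, so the test still passes there but the value read is not the one on the line.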

Haikuo

PGStats
Opal | Level 21

Further tweaking... short of Regex :

if not missing(input(scan(line2,-1,":=","s"),??16.));

PG
Linlin
Lapis Lazuli | Level 10

Hi PG,

I'm too lazy to look it up online. What do the ":=","s" arguments do in

input(scan(line2,-1,":=","s"),??16.)?

Thank you!

PGStats
Opal | Level 21

It adds the characters '=' and ':' to spacing characters (space, tab, etc.) as valid SCAN separators. I figured colon and equal characters could appear before a number.

PG
Linlin
Lapis Lazuli | Level 10

Thank you!

When you say spacing characters, do you mean "delimiters"?

Would you please show me an example?

PGStats
Opal | Level 21

Otherwise called word dividers, interword separators, whitespace... From the SCAN documentation :

s or S adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).

So that lines ending with :

A     125

A=125

A:125

A 125

will be treated appropriately.

PG
Linlin
Lapis Lazuli | Level 10

Thank you PG!

After running the code below, I know why you added ":=","s". Sometimes my mind doesn't work at all.

data have1;
input line2 $1-20;
if not missing(input(scan(line2,-1,":=","s"),??16.));
cards;
asdfgfhrfren  1
vdsjwheffffffff
dnfejfudsdhjn:2
vfdgriegrgggg=3
;
proc print;run;

data have2;
input line2 $1-20;
if not missing(input(scan(line2,-1),??16.));
cards;
asdfgfhrfren  1
vdsjwheffffffff
dnfejfudsdhjn:2
vfdgriegrgggg=3
;
proc print;run;

nickb
Calcite | Level 5

This worked great.  Thanks for the reply.

Nick

nickb
Calcite | Level 5

I thought this solution would work until I found a few records that are being omitted.  This is a line that I want.

INT-270 INT INTERNSHIP............. 09/SUB   A  3.33  *F
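
One possible way to handle a line like this, offered only as a sketch: it assumes that at most one trailing flag such as *F can follow the value, so it tests the second-to-last blank-delimited word as well as the last one. The names want, last_num and prev_num are made up, and the rule would need adjusting if other kinds of trailing text can appear.

```sas
data want;
  infile datalines truncover;
  input line $char128.;
  /* numeric value of the last and second-to-last blank-delimited words */
  last_num = input(scan(line, -1, ' '), ?? 16.);
  prev_num = input(scan(line, -2, ' '), ?? 16.);
  /* keep the line if either of those words reads as a number */
  if not missing(last_num) or not missing(prev_num);
datalines;
INT-270 INT INTERNSHIP............. 09/SUB   A  3.33  *F
EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed
EGR-128 ROBOTICS IN CIM SYSTEMS.... 99/SP    A                        2
;
```

Under that assumption the "1 course needed" line is still dropped, because neither "needed" nor "course" reads as a number.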

robertrao
Quartz | Level 8

Hi,

Could you please explain this concept to me?

"It examines the first 16 characters of the last word"

What is meant by "last word"?

In the example below, what if I have more than 16 letters in the word?

asdfgfhrfren  1

vdsjwheffffffff

dnfejfudsdhjn:2

vfdgriegrgggg=3

Thanks
