I'm parsing a text file and need a way to look at the last value in a string and check whether it is numeric. The file is large, but the data of interest is always in the same position.
Below is an example of the data. The alignment got out of whack when I copied and pasted it. The goal is to look at the last value on each line and check whether it is numeric. I don't want records that have _________ in the line. I will use this check in my IF statement.
Data example:
C) D.C.CIRCUITS
EER-126 D.C. CIRCUITS.............. 98/WI A 2.67
N) FUNDAMENTAL MECHANICAL SYSTEMS
EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed
C) ROBOTICS IN CIM SYSTEMS
EGR-128 ROBOTICS IN CIM SYSTEMS.... 99/SP A 2
C) COMPUTER PROGRAMMING APPLICATIONS IN EGR TECHNOLOGY
IET-198 COMP PRGM APP IN ENG TECH.. 00/SP B 1.33
Nick,
Check out the ANYDIGIT string function in SAS. To get it to work, you might need to use other functions to check what you need. For example, if you expect a digit in the last column of the input string, this would work:
if anydigit(in_string, length(in_string)) > 0 then do; /* whatever */ end;
This works in 9.2 and 9.3. Here is the link: http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0gq8bwobt4...
Larry
Use scan(x,-1) to get the last field and anyalpha to detect non-numbers :
data have;
length line1 line2 $128;
infile datalines truncover;
input line1 $char. / line2 $char.;
if not anyalpha(scan(line2,-1));
datalines;
C) D.C.CIRCUITS
EER-126 D.C. CIRCUITS.............. 98/WI A 2.67
N) FUNDAMENTAL MECHANICAL SYSTEMS
EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed
C) ROBOTICS IN CIM SYSTEMS
EGR-128 ROBOTICS IN CIM SYSTEMS.... 99/SP A 2
C) COMPUTER PROGRAMMING APPLICATIONS IN EGR TECHNOLOGY
IET-198 COMP PRGM APP IN ENG TECH.. 00/SP B 1.33
;
PG
I like the use of scan and -1 to get the last word. But the ANYthis and ANYthat functions are tricky to determine if they have accounted for all the possibilities that might be numeric. Should -.5 be numeric? How about scientific notation, such as 314E-2? Should #5 be character? If the answer is Yes, here is another way to examine the last word:
if input(scan(line2,-1), ??16.) > .;
It examines the first 16 characters of the last word, and tries to read them to a number. If it can, the IF condition is true. If it can't, the ?? will suppress messages about invalid numeric data (and the IF condition will be false). And if the last word is shorter than 16 characters, that doesn't present a problem.
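To make the questions above concrete, here is a minimal sketch that feeds a few candidate "last words" straight to INPUT with the ?? modifier. The variable names (word, value, is_numeric) are made up for illustration. The words are passed in directly rather than through SCAN, because SCAN's default delimiter list in ASCII environments includes the period and the hyphen, so scan(line2,-1) would hand back 67 rather than 2.67.

```sas
data _null_;
  length word $32;
  /* Candidate last words, including the edge cases raised above */
  do word = '2.67', '-.5', '314E-2', '#5', 'needed';
    /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0 */
    value = input(word, ?? 16.);
    is_numeric = not missing(value);
    put word= value= is_numeric=;
  end;
run;
```

The first three words should come back as numbers (the 16. informat reads signs, leading decimal points and E notation), while #5 and needed should come back missing.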
That's indeed better. Better still : change the > . to not missing().
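The reason is not spelled out in the post, but one likely motivation is special missing values: .A through .Z sort above the ordinary missing value, so a "> ." test treats them as valid numbers while MISSING() does not. As I understand it, INPUT could return such a value if, for instance, a MISSING statement is in effect. A minimal sketch of the difference, with hypothetical variable names:

```sas
data _null_;
  x = .A;                       /* a special missing value */
  gt_dot   = (x > .);           /* 1: .A sorts above the ordinary missing value */
  not_miss = not missing(x);    /* 0: .A is still a missing value */
  put gt_dot= not_miss=;
run;
```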
Cheers!
PG
Further tweaking... short of Regex :
if not missing(input(scan(line2,-1,":=","s"),??16.));
PG
Hi PG,
I am too lazy to look it up online. What do ":=","s" do in input(scan(line2,-1,":=","s"),??16.)?
Thank you!
It adds the characters '=' and ':' to spacing characters (space, tab, etc.) as valid SCAN separators. I figured colon and equal characters could appear before a number.
PG
Thank you!
When you say spacing characters, do you mean "delimiters"?
Would you please show me an example?
Otherwise called word dividers, interword separators, whitespace... From the SCAN documentation :
s or S adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
So that lines ending with :
A 125
A=125
A:125
A 125
will be treated appropriately.
PG
Thank you PG!
After running the code below, I know why you added ":=","s". Sometimes my mind doesn't work at all.
data have1;
input line2 $1-20;
If not missing(input(scan(line2,-1,":=","s"),??16.));
cards;
asdfgfhrfren 1
vdsjwheffffffff
dnfejfudsdhjn:2
vfdgriegrgggg=3
;
proc print;run;
data have2;
input line2 $1-20;
If not missing(input(scan(line2,-1),??16.));
cards;
asdfgfhrfren 1
vdsjwheffffffff
dnfejfudsdhjn:2
vfdgriegrgggg=3
;
proc print;run;
This worked great. Thanks for the reply.
Nick
I thought this solution would work until I found a few records that are being omitted. This is a line that I want.
INT-270 INT INTERNSHIP............. 09/SUB A | 3.33 *F |
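A likely reason this line is dropped: with the default delimiter list, SCAN treats "|" and "*" as separators, so the last word is F, which does not read as a number. The sketch below shows that, then tries one possible workaround: walk backward over blank-separated words, skip anything that is "|" or starts with "*", and test the first remaining word. The skip rule is an assumption based on this single example line, not something confirmed for the whole file, and the variable names are hypothetical.

```sas
data _null_;
  length line $128 word last_default $32;
  do line = 'INT-270 INT INTERNSHIP............. 09/SUB A | 3.33 *F |',
            'EGR-100 FUNDMNTL MECHNCL SKILLS______________________ 1 course needed';
    /* What the original test sees as the last word */
    last_default = scan(line, -1);
    found = 0;
    /* Walk backward over blank-separated words */
    do i = -1 to -countw(line, ' ') by -1 until (found);
      word = scan(line, i, ' ');
      if not missing(input(word, ?? 16.)) then found = 1;
      /* Assumed rule: only "|" and "*..." flags may trail the number */
      else if word ne '|' and char(word, 1) ne '*' then leave;
    end;
    put last_default= found= word=;
  end;
run;
```

For the internship line this should stop on 3.33 with found=1; for the "1 course needed" line it should stop immediately on needed with found=0, so that record is still excluded.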
Hi,
Could you please explain this concept to me?
"It examines the first 16 characters of the last word"
What does "last word" mean here?
In the example below, what happens if the word has more than 16 characters?
asdfgfhrfren 1
vdsjwheffffffff
dnfejfudsdhjn:2
vfdgriegrgggg=3
Thanks
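As a rough illustration of the question: the "last word" is whatever scan(line2,-1) returns, that is, the final chunk of the line after splitting on delimiters. The 16. informat then reads only the first 16 characters of that chunk and ignores the rest. The sketch below uses a made-up 19-character word to show what that means when the word is longer than 16; the variable names are hypothetical.

```sas
data _null_;
  length word $40;
  word = '1234567890123456XYZ';      /* 16 digits followed by letters */
  first16 = input(word, ?? 16.);     /* reads only '1234567890123456' */
  first32 = input(word, ?? 32.);     /* reads the whole word, letters included */
  put first16= first32=;
run;
```

Here first16 should be non-missing even though the full word is not a number, while first32 should be missing. If long last words are possible in the data, a wider informat such as 32. is the safer check.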