Hi all,
Lets say the data is as below
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
if I want to read the text before the word "School:" how to go about it.
@n is useful to move to a column, but I want to identify a column @'School:' and move back from there.
Alternatively can we track the @n column where the dataline is held and use it to a calculation
*being a noob, pls be kind ;
You can read the whole thing, then extract the text that you need. Here is an example:
@skt wrote:
@n is useful to move to a column, but I want to identify a column @'School:' and move back from there.
Alternatively can we track the @n column where the dataline is held and use it to a calculation
Hi @skt,
I guess many people would indeed read the whole data line (e.g. using an "empty" INPUT statement so that the text is captured in automatic variable _INFILE_), then search for 'School' (e.g. using the FIND function to determine the position) and finally extract the appropriate substring (e.g. name=substr(_infile_,1,pos-2)), as suggested by Norman21.
But more specifically to your question: Yes, we can move back from the column we have found using the @'character-string' syntax and yes, we can track the position where the input pointer is currently located and use that in a calculation:
data want;
infile cards col=c;
input @'School' @;
c=c-8;
input @1 name $varying40. c;
cards;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
;
Note that the @'character-string' syntax lets SAS go to the next line if 'character-string' is not found. So, records not containing the word "School" would be skipped and a note in the log "NOTE: SAS went to a new line ..." would indicate this. The other suggested method would be more flexible regarding this situation. For example, if "School" was not found, a different assignment statement could be executed (e.g. name=scan(_infile_,1,':')).
Edit: Changed all occurrences of "schoolname" and "school name" to "name" after realizing that CSULA, ASU etc. are the real "school" (i.e. university) names. Thanks @Tom for clarifying this for the non-native speakers. 🙂
If you want a more arcane solution, you could use a regular expression:
data work.want;
length
SchoolName $ 50
rx 8
;
drop rx;
input;
rx = prxparse('/(.+) school.*/i');
if prxmatch(rx, _infile_) then do;
SchoolName = prxposn(rx, 1, _infile_);
end;
datalines;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
;
run;
Use the COL= option on the INFILE statement to have SAS tell you where the current pointer is.
You can use @ and + in INPUT statement to move the pointer if you want.
For your particular example I wouldn't bother. Just read the data using ':' as the delimiter. Read the last field using a format so that it will read the colon's into the value. Then remove the "School" and "Time" from the values and convert the time string into a format that SAS can interpret.
data want;
length name $50 school $20 time 8;
infile datalines dsd dlm=':' truncover ;
input x1 :$100. x2 :$100. x3 $100. ;
drop x1-x3;
name=tranwrd(x1,'School ',' ');
school=tranwrd(x2,'Time ',' ');
time=input(cats('00:',x3),time20.);
format time time11.2;
datalines;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
;
proc print; run;
Obs name school time 1 Bellatorum CSULA 0:01:40.50 2 The Kraken ASU 0:01:45.35 3 Black Widow UoA 0:01:33.70 4 Koicrete CSUF 0:01:40.25 5 Khaos UNLV 0:02:03.45 6 Max UCSD 0:01:26.47 7 Hakuna Matata UCLA 0:01:20.64 8 Prospector CPSLO 0:01:12.08 9 Andromeda CPP 0:01:25.10 10 Kekoapohaku UHM 0:01:24.49
Having dealt with a similar issue I ask "Does every single one of your schools have "School" as the last word when present?"
I had to deal with "Idaho School for the Deaf and Blind" and "The School at <place name" constructs which meant more than a simple replacement was needed for a few cases.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.