Hi all,
Lets say the data is as below
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
if I want to read the text before the word "School:" how to go about it.
@n is useful to move to a column, but I want to identify a column @'School:' and move back from there.
Alternatively can we track the @n column where the dataline is held and use it to a calculation
*being a noob, pls be kind ;
You can read the whole thing, then extract the text that you need. Here is an example:
@skt wrote:
@n is useful to move to a column, but I want to identify a column @'School:' and move back from there.
Alternatively can we track the @n column where the dataline is held and use it to a calculation
Hi @skt,
I guess many people would indeed read the whole data line (e.g. using an "empty" INPUT statement so that the text is captured in automatic variable _INFILE_), then search for 'School' (e.g. using the FIND function to determine the position) and finally extract the appropriate substring (e.g. name=substr(_infile_,1,pos-2)), as suggested by Norman21.
But more specifically to your question: Yes, we can move back from the column we have found using the @'character-string' syntax and yes, we can track the position where the input pointer is currently located and use that in a calculation:
data want;
infile cards col=c;
input @'School' @;
c=c-8;
input @1 name $varying40. c;
cards;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
;
Note that the @'character-string' syntax lets SAS go to the next line if 'character-string' is not found. So, records not containing the word "School" would be skipped and a note in the log "NOTE: SAS went to a new line ..." would indicate this. The other suggested method would be more flexible regarding this situation. For example, if "School" was not found, a different assignment statement could be executed (e.g. name=scan(_infile_,1,':')).
Edit: Changed all occurrences of "schoolname" and "school name" to "name" after realizing that CSULA, ASU etc. are the real "school" (i.e. university) names. Thanks @Tom for clarifying this for the non-native speakers. 🙂
If you want a more arcane solution, you could use a regular expression:
data work.want;
length
SchoolName $ 50
rx 8
;
drop rx;
input;
rx = prxparse('/(.+) school.*/i');
if prxmatch(rx, _infile_) then do;
SchoolName = prxposn(rx, 1, _infile_);
end;
datalines;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
;
run;
Use the COL= option on the INFILE statement to have SAS tell you where the current pointer is.
You can use @ and + in INPUT statement to move the pointer if you want.
For your particular example I wouldn't bother. Just read the data using ':' as the delimiter. Read the last field using a format so that it will read the colon's into the value. Then remove the "School" and "Time" from the values and convert the time string into a format that SAS can interpret.
data want;
length name $50 school $20 time 8;
infile datalines dsd dlm=':' truncover ;
input x1 :$100. x2 :$100. x3 $100. ;
drop x1-x3;
name=tranwrd(x1,'School ',' ');
school=tranwrd(x2,'Time ',' ');
time=input(cats('00:',x3),time20.);
format time time11.2;
datalines;
Bellatorum School: CSULA Time: 1:40.5
The Kraken School: ASU Time: 1:45.35
Black Widow School: UoA Time: 1:33.7
Koicrete School: CSUF Time: 1:40.25
Khaos School: UNLV Time: 2:03.45
Max School: UCSD Time: 1:26.47
Hakuna Matata School: UCLA Time: 1:20.64
Prospector School: CPSLO Time: 1:12.08
Andromeda School: CPP Time: 1:25.1
Kekoapohaku School: UHM Time: 1:24.49
;
proc print; run;
Obs name school time 1 Bellatorum CSULA 0:01:40.50 2 The Kraken ASU 0:01:45.35 3 Black Widow UoA 0:01:33.70 4 Koicrete CSUF 0:01:40.25 5 Khaos UNLV 0:02:03.45 6 Max UCSD 0:01:26.47 7 Hakuna Matata UCLA 0:01:20.64 8 Prospector CPSLO 0:01:12.08 9 Andromeda CPP 0:01:25.10 10 Kekoapohaku UHM 0:01:24.49
Having dealt with a similar issue I ask "Does every single one of your schools have "School" as the last word when present?"
I had to deal with "Idaho School for the Deaf and Blind" and "The School at <place name" constructs which meant more than a simple replacement was needed for a few cases.
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.