SAS functions are incredibly powerful in a Data step. This Juletip will look at the strength of functions on text strings, and since it’s December it would be fun to look at the famous Christmas song Jingle Bells.
So, first we will use the DATALINES statement with an INPUT statement to read the two first stanza of the song.
options linesize=256;
data jingle_bells;
infile datalines delimiter=',' dsd truncover;
input stanza:1. Text:$500.;
datalines4;
1,Dashing through the snow - In a one horse open sleigh - O'er the fields we go - Laughing all the way - Bells on bob tails ring - Making spirits bright - What fun it is to laugh and sing - A sleighing song tonight
2,Oh jingle bells - jingle bells - Jingle all the way - Oh what fun it is to ride - In a one horse open sleigh - Jingle bells jingle bells - Jingle all the way - Oh what fun it is to ride - In a one horse open sleigh
;;;;
Data Jingle_BellsN(drop=Text_o);
set Jingle_Bells(rename=(Text=Text_o));
Text=upcase(Text_o);
run;
proc print data=Jingle_BellsN;
run;
The word Bells appears a lot of times in this Christmas song. It would be great to count the number of times it appears in each of the two stanza, together with the starting positions of the word Bells.
Also it’s interesting to find the word after Bells, to see if there is a pattern.
data Multiple_BELLS(drop=PrevBELLSWordNum);
set Jingle_BellsN;
/*Counts the number of times that the word BELLS appears within the text*/
NumBELLS=count(Text,'BELLS');
/* Returns the starting position where the word BELLS is found the text, counting delimeters also*/
BELLSWordPos=find(Text,'BELLS');
/*The e modifier counts the words that are scanned until BELLS is found,
instead of the starting point, without counting delimeters='- .,'*/
BELLSWordNum=findw(Text,'BELLS','- .,','e');
/*find the word after the key word BELLS*/
if BELLSWordNum>0 then
AfterBELLS=scan(Text,BELLSWordNum+1,'- .,');
output;
/*Multiple Occurences*/
do while(BELLSWordPos>0);
PrevBELLSWordNum=BELLSWordNum;
BELLSWordNum=findw(Text,'BELLS','- ., ',BELLSWordPos+1,'e')+PrevBELLSWordNum;
BELLSWordPos=find(Text,'BELLS',BELLSWordPos+1);
if BELLSWordPos>0 then
do;
AfterBELLS=scan(Text,BELLSWordNum+1,'- .,');
output;
end;
end;
run;
proc print data=Multiple_BELLS;
run;
But what if we want to split the stanza into different parts as verse, so we get one row for each verse in the Christmas song.
This could be done by the following code:
data Dividing_Text;
set Jingle_BellsN;
/* counts the number of times the ',-' appears the character string Common_names */
SpecCharNum=countc(Text,',-');
if SpecCharNum=0 then
do;
Name=Text;
output;
end;
/* writes first part out before the delimeters, then second part, and so on...*/
else
do i=1 to SpecCharNum+1;
Name=scan(Text,i,',-');
output;
end;
run;
proc print data=Dividing_Text;
run;
Maybe, I am a song writer and want to compose a new version of the Christmas song.
I can easily substitute words in the song by using PRXCHANGE functions:
data BELLS_substitution(drop=Text);
format Name_New $500.;
set Jingle_BellsN;
/*S=Specifies a substitution in regular expression, PRXCHANGE function performs a substitution for a pattern match*/
Name_New=prxchange('s/BELLS/ STAR/',-1,Text);
Name_New=prxchange('s/ RIDE/ SLIDE/',-1,Name_New);
run;
proc print data=BELLS_substitution;
run;
So, in my family we are singing the new ‘hit’ song Jingle Star this Year. 😉
I wish you all a Merry Christmas and Happy New Year! 🌲🎁🌲
Take a look at the autumn/winter 2024 program and events for FANS and other SAS users!
www.sas.com/fans | #SASFANS #sasnordicusers