Hi everyone, I'm having trouble with a very basic thing (I'm new to SAS) and I'm hoping there is an easy answer.
Here is how the dataset looks:
var1 var2
CA09|CA10 CA09
CA06|CA09 CA09
CA06 CA09
I am attempting to create a variable called "ismatch" which I intended to help identify rows where var2 is found within var1.
I use proc sql:
proc sql;
create table newtable as
select *, index(var1, var2) as ismatch;
quit;
What this produces, based on the above table is:
var1 var2 ismatch
CA09|CA10 CA09 0
CA06|CA09 CA09 6
CA06 CA09 0
...
2nd and 3rd row are OK. But the first row is coming up 0 despite CA09 being at the beginning of the string. I've tested it more, and it seems that the index function will only identify CA09 when it is the rightmost part of var1 (i.e., it would not find CA09 within CA06|CA09|CA11).
Yet, if I hardcode "CA09" in place of var2, the code works perfectly and finds CA09 anywhere in the string (as intended). I've also tried it in a data step and it seems to behave the same way. All examples of the INDEX function I've found use a hardcoded string, but is it possible to use a variable there?
Or is there a better way to find one string in another when both are variables?
Thank you!
Try trimming the blanks off the end of VAR2 when using the index function.
891 data have ;
892 length var1 $20 var2 $10 ;
893 input var1 var2;
894 loc1 = index(var1,var2);
895 loc2 = index(var1,trim(var2));
896 put (loc1 loc2 var1 var2) (=) ;
897 cards;
loc1=0 loc2=1 var1=CA09|CA10 var2=CA09
loc1=6 loc2=6 var1=CA06|CA09 var2=CA09
loc1=0 loc2=0 var1=CA06 var2=CA09
NOTE: The data set WORK.HAVE has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Try trimming the blanks off the end of VAR2 when using the index function.
891 data have ;
892 length var1 $20 var2 $10 ;
893 input var1 var2;
894 loc1 = index(var1,var2);
895 loc2 = index(var1,trim(var2));
896 put (loc1 loc2 var1 var2) (=) ;
897 cards;
loc1=0 loc2=1 var1=CA09|CA10 var2=CA09
loc1=6 loc2=6 var1=CA06|CA09 var2=CA09
loc1=0 loc2=0 var1=CA06 var2=CA09
NOTE: The data set WORK.HAVE has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thank you Tom!!
Tom, if you wouldn't mind, I'm trying to understand this. Clearly it works. But when I create another variable in proc sql,
LENGTH(var2) as lenvar
It returns 4 for each of the VAR2 values (all of them look like CA09) which makes sense.
But if TRIM is removing something, what can possibly be removed if the LENGTH of CA09 was already 4?
Thanks,
Dave
All character variables in SAS are fixed length fields. The LENGTH() function is showing you the length of the trimmed value, not the defined length of the variable. To see the defined length of the variable run PROC CONTENTS on your dataset (or use the DESCRIBE TABLE statement in SQL). You seem to be more familiar with SQL syntax so to define the length of variable created by a SQL select statement you can add the LENGTH keyword.
create table new as
select 'A' as newvar length=5
, *
from sashelp.class
;
For most operations SAS will automatically trim the trailing spaces. INDEX() is an exception where the spaces are preserved.
For example:
929 data _null_;
930 if 'A ' = 'A' then put 'yes';
931 run;
yes
Thank you again sir!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.