BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ucdcrush
Obsidian | Level 7

Hi everyone, I'm having trouble with a very basic thing (I'm new to SAS) and I'm hoping there is an easy answer.

Here is how the dataset looks:

var1           var2
CA09|CA10      CA09
CA06|CA09      CA09
CA06           CA09

I am attempting to create a variable called "ismatch" which I intended to help identify rows where var2 is found within var1.

I use proc sql:

proc sql;
create table newtable as
select *, index(var1, var2) as ismatch;
quit;

What this produces, based on the above table is:


var1           var2        ismatch
CA09|CA10      CA09           0
CA06|CA09      CA09           6
CA06           CA09           0

...

2nd and 3rd row are OK. But the first row is coming up 0 despite CA09 being at the beginning of the string. I've tested it more, and it seems that the index function will only identify CA09 when it is the rightmost part of var1 (i.e., it would not find CA09 within CA06|CA09|CA11).

Yet, if I hardcode "CA09" in place of var2, the code works perfectly and finds CA09 anywhere in the string (as intended). I've also tried it in a data step and it seems to behave the same way. All examples of the INDEX function I've found use a hardcoded string, but is it possible to use a variable there?

Or is there a better way to find one string in another when both are variables?

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Try trimming the blanks off the end of VAR2 when using the index function.

891  data have ;

892    length var1 $20 var2 $10 ;

893    input var1 var2;

894    loc1 = index(var1,var2);

895    loc2 = index(var1,trim(var2));

896    put (loc1 loc2 var1 var2) (=) ;

897  cards;

loc1=0 loc2=1 var1=CA09|CA10 var2=CA09

loc1=6 loc2=6 var1=CA06|CA09 var2=CA09

loc1=0 loc2=0 var1=CA06 var2=CA09

NOTE: The data set WORK.HAVE has 3 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Try trimming the blanks off the end of VAR2 when using the index function.

891  data have ;

892    length var1 $20 var2 $10 ;

893    input var1 var2;

894    loc1 = index(var1,var2);

895    loc2 = index(var1,trim(var2));

896    put (loc1 loc2 var1 var2) (=) ;

897  cards;

loc1=0 loc2=1 var1=CA09|CA10 var2=CA09

loc1=6 loc2=6 var1=CA06|CA09 var2=CA09

loc1=0 loc2=0 var1=CA06 var2=CA09

NOTE: The data set WORK.HAVE has 3 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

ucdcrush
Obsidian | Level 7

Thank you Tom!!

ucdcrush
Obsidian | Level 7

Tom, if you wouldn't mind, I'm trying to understand this. Clearly it works. But when I create another variable in proc sql,

LENGTH(var2) as lenvar

It returns 4 for each of the VAR2 values (all of them look like CA09) which makes sense.

But if TRIM is removing something, what can possibly be removed if the LENGTH of CA09 was already 4?

Thanks,

Dave

Tom
Super User Tom
Super User

All character variables in SAS are fixed length fields.  The LENGTH() function is showing you the length of the trimmed value, not the defined length of the variable.  To see the defined length of the variable run PROC CONTENTS on your dataset  (or use the DESCRIBE TABLE statement in SQL).  You seem to be more familiar with SQL syntax so to define the length of variable created by a SQL select statement you can add the LENGTH keyword.

create table new as

   select 'A'  as newvar length=5

        , *

   from sashelp.class

;

For most operations SAS will automatically trim the trailing spaces.  INDEX() is an exception where the spaces are preserved.

For example:

929  data _null_;

930    if 'A   ' = 'A' then put 'yes';

931  run;

yes



ucdcrush
Obsidian | Level 7

Thank you again sir!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1408 views
  • 0 likes
  • 2 in conversation