BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
rmacarthur
Pyrite | Level 9

Hi SAS friends, 

Am trying to use the index function to locate a string variable "MrgMT" in a longer string variable "Merge_term ", but it is not working as expected. 

Have checked the source datasets for hidden characters and "purged" them with a compress function (used the "KAS" option to keep only alphanumeric characters and some symbols), and viewed them in note pad looking for bad characters, but still no luck.   I've uploaded a sample SAS file with these thee variables. 

Here is the code 

data	lcl.Index_Prob		; set	lcl.&FFN._TMP	;
Where MrgMT in ("ALBUTEROL", "ACAMPROSATE") 	
and	
Merge_term in ("4MG OF ALBUTEROL", "ALBUTEROL", "ACAMPROSATE", "ACAMPROSATE (CAMPRAL)");

Merge_No	=	index(Merge_term, MrgMT)	;

keep MrgMT	Merge_term Merge_No	;

run	;

Here is an example of the output, with comments:

 

Merge_termMrgMTMerge_NoShould be
4MG OF ALBUTEROLALBUTEROL08
ACAMPROSATEACAMPROSATE1OK !
ACAMPROSATEALBUTEROL0OK !
ACAMPROSATE (CAMPRAL)ACAMPROSATE01
ALBUTEROLALBUTEROL1OK!

 

Any suggestions on why INDEX function is not working?  

Thank you !

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Nothing wrong with the INDEX function.  Be sure to use the TRIM function as in:

Merge_No	=	index(Merge_term, trim(MrgMT))	;

Here's an illustration:

 

30   data have;
31    text='aaa bbb ccc';
32    length longsrch $4;
33    length shrtsrch $3;
34    longsrch='ccc';
35    shrtsrch='ccc';
36    ixlong=index(text,longsrch);
37    ixshrt=index(text,shrtsrch);
38    ixtrim=index(text,trim(longsrch));
39    put (_all_) (=);
40   run;

text=aaa bbb ccc longsrch=ccc shrtsrch=ccc ixlong=0 ixshrt=9 ixtrim=9
NOTE: The data set WORK.HAVE has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Nothing wrong with the INDEX function.  Be sure to use the TRIM function as in:

Merge_No	=	index(Merge_term, trim(MrgMT))	;

Here's an illustration:

 

30   data have;
31    text='aaa bbb ccc';
32    length longsrch $4;
33    length shrtsrch $3;
34    longsrch='ccc';
35    shrtsrch='ccc';
36    ixlong=index(text,longsrch);
37    ixshrt=index(text,shrtsrch);
38    ixtrim=index(text,trim(longsrch));
39    put (_all_) (=);
40   run;

text=aaa bbb ccc longsrch=ccc shrtsrch=ccc ixlong=0 ixshrt=9 ixtrim=9
NOTE: The data set WORK.HAVE has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rmacarthur
Pyrite | Level 9

Thank you very much , and especially appreciate the explanation !

AVO339
Obsidian | Level 7

Hello.

The documentation say:

INDEX(source, excerpt)

Both leading and trailing spaces are considered part of the excerpt argument. To remove trailing spaces, include the TRIM function with the excerpt variable inside the INDEX function.

 

Maybe you can try:

Merge_No = index(Merge_term, trim(MrgMT));

rmacarthur
Pyrite | Level 9

Thank you AVO, much appreciated !

Tom
Super User Tom
Super User

SAS uses fixed length character variables.  So short values are padded with spaces to fill out the variable to its complete length.

 

Normal equality comparisons (like in your WHERE statement) will ignore the trailing spaces.  But the INDEX function does not.  If you ask it to look for 'FRED    ' in will not match strings that don't have all four spaces following the D.

 

Use TRIM() to remove the trailing spaces from the string you are asking INDEX to look for.

data lcl.Index_Prob;
  set lcl.&FFN._TMP;
  where MrgMT in ("ALBUTEROL", "ACAMPROSATE")  
    and Merge_term in ("4MG OF ALBUTEROL", "ALBUTEROL", "ACAMPROSATE", "ACAMPROSATE (CAMPRAL)")
  ;
  Merge_No = index(Merge_term, trim(MrgMT));
  keep MrgMT Merge_term Merge_No ;
run ;
rmacarthur
Pyrite | Level 9

Thank you Tom, and the explanation is very helpful, much appreciated !

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 578 views
  • 4 likes
  • 4 in conversation