DATA Step, Macro, Functions and more

Handling multi-character word delimiters ?

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

Handling multi-character word delimiters ?

I have a string containing multiple words.  Each word needs to go into a macro variable.  I'm sure I'm missing something, but how do I handle a word delimiters that consists of multiple characters?

Here is the string I have to work with.  Is there a way to use the scan function to pull out the words without first doing a whole lot of substringing to replace "_" with a single character like a comma?

CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD

John


Accepted Solutions
Solution
‎05-27-2014 04:34 PM
Respected Advisor
Posts: 3,777

Re: Handling multi-character word delimiters ?

Another alternative that does not change the input uses INFILE magic and the DLMSTR infile statement option.  I do not see where/if the SCAN function has similar option.

%let string=CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD;

filename FT15F001 temp;
data _null_;           
  
infile FT15F001 dlmstr='"_"' missover;
  
input @;
   _infile_ = symget('string');
   length word $64;
  
do i = 1 by 1 until(missing(word));
      input word:$64. @;
      put word=;
      end;
  
stop;
  
parmcards;
Necessary evil
;;;;
   run;

word=CELL_TRGT_SEGMENT_CD
word=CELL_STATUS_CD
word=CELL_CREATIVE_CD
word=CELL_RPT_CD
word=CELL_FULFILL_GRP_CD
word=

View solution in original post


All Replies
Super User
Posts: 17,840

Re: Handling multi-character word delimiters ?

Scan can take multiple delimiters. You can also look at the modifiers section to add another delimiter.

Use single quote to include the quotation marks and _ in the list.

scan(word, i, '"_')

EDIT: I think I misunderstood your question. You only have 4 words separated by "_" each time?

If so I guess I'd go old school and use findw and substr.

Or PRX functions which I've successfully managed to avoid Smiley Happy

Contributor
Posts: 63

Re: Handling multi-character word delimiters ?

When multiple delimiters are passed to the scan function it uses any one of them as the delimiter.

Super User
Posts: 5,084

Re: Handling multi-character word delimiters ?

So you have data that uses _ as both text and as part of a multi-character delimiter?

Like Reeza, I have avoided PRX functions and they may be a candidate.  But here is some DATA step code that could be used.  The 4 quote marks are a single quote, a double quote, and a single quote.

data _null_;

length word $ 32;

i=0;

do until (word=' ');

   i + 1;

   word = scan(string, i, '"');

   if word not in (' ', '_') then do;

      mv_counter + 1;

     call symputx ('mv' || left(put(mv_counter,3.)), word);

   end;

end;

run;

Respected Advisor
Posts: 3,777

Re: Handling multi-character word delimiters ?

It looks to me more like you have delimiters in quoted sub-strings of string.  I think Q option on SCAN function will suffice.

%let string="CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD";

data _null_;
   string=symget(
'string');
   length word $64;
  
do i = 1 by 1 until(missing(word));
      word = scan(string,i,'_','Q');
      put word=;
      end;
  
stop;
  
run;

word=
"CELL_TRGT_SEGMENT_CD"
word=
"CELL_STATUS_CD "
word=
"CELL_CREATIVE_CD"
word=
"CELL_RPT_CD"
word=
"CELL_FULFILL_GRP_CD"
word=
Super User
Posts: 5,084

Re: Handling multi-character word delimiters ?

Nice approach.  However, I think the problem starts out a little differently, using delimeters of:

"_"

The original string doesn't start or end with a quote.  But that can be rectified in your program by adding the double quotes around strip(string).  Might be necessary to remove quotes from WORD at the end, though.

Respected Advisor
Posts: 3,777

Re: Handling multi-character word delimiters ?

I realized I've changed the input slightly but I think that can all be accommodated as you have outlined.

Solution
‎05-27-2014 04:34 PM
Respected Advisor
Posts: 3,777

Re: Handling multi-character word delimiters ?

Another alternative that does not change the input uses INFILE magic and the DLMSTR infile statement option.  I do not see where/if the SCAN function has similar option.

%let string=CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD;

filename FT15F001 temp;
data _null_;           
  
infile FT15F001 dlmstr='"_"' missover;
  
input @;
   _infile_ = symget('string');
   length word $64;
  
do i = 1 by 1 until(missing(word));
      input word:$64. @;
      put word=;
      end;
  
stop;
  
parmcards;
Necessary evil
;;;;
   run;

word=CELL_TRGT_SEGMENT_CD
word=CELL_STATUS_CD
word=CELL_CREATIVE_CD
word=CELL_RPT_CD
word=CELL_FULFILL_GRP_CD
word=
Contributor
Posts: 63

Re: Handling multi-character word delimiters ?

Brilliant.   I wasn't aware of the infile statement's dlmstr= option. (it's been a long time since I've used INFILE.)  The way you use a temporary fileref and reset the _infile_ variable is vey clever.  Thanks for your help, and thanks to the other folks who took time to provide suggestions.

Respected Advisor
Posts: 3,777

Re: Handling multi-character word delimiters ?

Search for INFILE MAGIC at lexjansen.com

http://www2.sas.com/proceedings/sugi28/086-28.pdf

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1523 views
  • 1 like
  • 4 in conversation