Handling multi-character word delimiters ?

Contributor
Posts: 69

I have a string containing multiple words.  Each word needs to go into a macro variable.  I'm sure I'm missing something, but how do I handle a word delimiter that consists of multiple characters?

Here is the string I have to work with.  Is there a way to use the scan function to pull out the words without first doing a whole lot of substringing to replace "_" with a single character like a comma?

CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD

John


Accepted Solutions
Solution
‎05-27-2014 04:34 PM
Respected Advisor
Posts: 3,799

Re: Handling multi-character word delimiters ?

Posted in reply to Astounding

Another alternative that does not change the input uses INFILE magic and the DLMSTR= option of the INFILE statement.  I do not see that the SCAN function has a similar option.

%let string=CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD;

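filename FT15F001 temp;
data _null_;
   /* Split records on the full three-character string "_" */
   infile FT15F001 dlmstr='"_"' missover;
   /* Load the dummy PARMCARDS record and hold it */
   input @;
   /* Overwrite the input buffer with the macro variable's value */
   _infile_ = symget('string');
   length word $64;
   /* Read one word per pass until INPUT returns a missing value */
   do i = 1 by 1 until(missing(word));
      input word :$64. @;
      put word=;
   end;
   stop;
   parmcards;
Necessary evil
;;;;
run;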

word=CELL_TRGT_SEGMENT_CD
word=CELL_STATUS_CD
word=CELL_CREATIVE_CD
word=CELL_RPT_CD
word=CELL_FULFILL_GRP_CD
word=



All Replies
Super User
Posts: 19,861

Re: Handling multi-character word delimiters ?

Posted in reply to bentleyj1

SCAN can take multiple delimiter characters. You can also look at the modifiers section of the documentation for ways to add further delimiters.

Enclose the delimiter list in single quotes so that it can include the double quotation mark and _.

scan(word, i, '"_')

EDIT: I think I misunderstood your question. You only have 4 words separated by "_" each time?

If so I guess I'd go old school and use findw and substr.

Or PRX functions, which I've successfully managed to avoid :)
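
For anyone who does want to try the PRX route, here is a minimal sketch. It assumes the words themselves never contain a comma, so each "_" (plus any blanks around it) can be swapped for a single comma before calling SCAN. The variable CSV and the macro variable names MV1, MV2, ... are made up for illustration.

```sas
%let string=CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD;

data _null_;
   length string csv $200 word $64;
   string = symget('string');
   /* Replace every "_" and any surrounding blanks with one comma. */
   /* Assumption: no word contains a comma.                        */
   csv = prxchange('s/\s*"_"\s*/,/', -1, strip(string));
   do i = 1 to countw(csv, ',');
      word = scan(csv, i, ',');
      /* Hypothetical naming scheme: MV1, MV2, ... */
      call symputx(cats('mv', i), word);
   end;
run;

%put mv1=&mv1 mv2=&mv2;
```

CALL SYMPUTX strips leading and trailing blanks, so the blank after CELL_STATUS_CD should not end up in the macro variable.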

Contributor
Posts: 69

Re: Handling multi-character word delimiters ?

When multiple delimiters are passed to the scan function it uses any one of them as the delimiter.
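
A small sketch of what I mean, using a shortened version of my string. Because each character in the list is treated as its own delimiter, every underscore splits a word, including the ones inside the names.

```sas
data _null_;
   length string $200 word $64;
   string = 'CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD';
   /* Both " and _ act as single-character delimiters here */
   do i = 1 to countw(string, '"_');
      word = scan(string, i, '"_');
      put i= word=;
   end;
run;
```

This writes seven words to the log (CELL, TRGT, SEGMENT, CD, CELL, STATUS, CD) rather than the two I want.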

Super User
Posts: 5,516

Re: Handling multi-character word delimiters ?

Posted in reply to bentleyj1

So you have data that uses _ as both text and as part of a multi-character delimiter?

Like Reeza, I have avoided PRX functions, though they may be a candidate.  Here is some DATA step code that could be used instead.  The delimiter argument to SCAN is a double quote enclosed in single quotes.

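data _null_;
   length string $ 200 word $ 32;
   /* Assumption: the text is in macro variable STRING, as in the %LET statements in this thread */
   string = symget('string');
   i = 0;
   do until (word = ' ');
      i + 1;
      /* Split on the double quote only, so the pieces alternate between names and a lone _ */
      word = scan(string, i, '"');
      /* Skip the lone underscores and the final blank */
      if word not in (' ', '_') then do;
         mv_counter + 1;
         call symputx('mv' || left(put(mv_counter, 3.)), word);
      end;
   end;
run;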

Respected Advisor
Posts: 3,799

Re: Handling multi-character word delimiters ?

Posted in reply to bentleyj1

It looks to me more like you have delimiters in quoted sub-strings of the string.  I think the Q modifier of the SCAN function will suffice.

%let string="CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD";

data _null_;
   string = symget('string');
   length word $64;
   do i = 1 by 1 until(missing(word));
      word = scan(string, i, '_', 'Q');
      put word=;
   end;
   stop;
run;

word="CELL_TRGT_SEGMENT_CD"
word="CELL_STATUS_CD "
word="CELL_CREATIVE_CD"
word="CELL_RPT_CD"
word="CELL_FULFILL_GRP_CD"
word=

Super User
Posts: 5,516

Re: Handling multi-character word delimiters ?

Posted in reply to data_null__

Nice approach.  However, I think the problem starts out a little differently, with this delimiter:

"_"

The original string doesn't start or end with a quote.  That can be rectified in your program by adding double quotes around strip(string).  It might be necessary to remove the quotes from WORD at the end, though.
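
A rough sketch of that adjustment, assuming the unquoted string from the original question. CATS adds the outer double quotes, and DEQUOTE is my choice for stripping the quotes from WORD afterward. The macro variable names MV1, MV2, ... are made up for illustration.

```sas
%let string=CELL_TRGT_SEGMENT_CD"_"CELL_STATUS_CD "_"CELL_CREATIVE_CD"_"CELL_RPT_CD"_"CELL_FULFILL_GRP_CD;

data _null_;
   length string $200 word $64;
   /* Wrap the whole value in double quotes so every name sits inside a quoted substring */
   string = cats('"', symget('string'), '"');
   do i = 1 by 1 until(missing(word));
      /* Q modifier: underscores inside quoted substrings are not treated as delimiters */
      word = dequote(scan(string, i, '_', 'Q'));
      /* Hypothetical naming scheme: MV1, MV2, ... */
      if not missing(word) then call symputx(cats('mv', i), word);
   end;
run;

%put mv1=&mv1 mv5=&mv5;
```

CATS removes the trailing blanks that STRIP would have handled, and CALL SYMPUTX trims the blank after CELL_STATUS_CD.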

Respected Advisor
Posts: 3,799

Re: Handling multi-character word delimiters ?

Posted in reply to Astounding

I realize I changed the input slightly, but I think that can all be accommodated as you have outlined.

Contributor
Posts: 69

Re: Handling multi-character word delimiters ?

Posted in reply to data_null__

Brilliant.  I wasn't aware of the INFILE statement's DLMSTR= option (it's been a long time since I've used INFILE).  The way you use a temporary fileref and reset the _infile_ variable is very clever.  Thanks for your help, and thanks to the other folks who took time to provide suggestions.

Respected Advisor
Posts: 3,799

Re: Handling multi-character word delimiters ?

Posted in reply to bentleyj1

Search for INFILE MAGIC at lexjansen.com

http://www2.sas.com/proceedings/sugi28/086-28.pdf
