
How to retrieve the multiple instances of a word between two special characters

Contributor
Posts: 26


Hi Folks,

 

I have one string and I need to extract multiple words based on two special characters.

 

data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
inner_str = SCAN(SUBSTR(test,INDEX(test,',')+1),1,',');
run;

In the above example I need both STRMONTH_START and STRMONTH_END in the output. At the moment I only get STRMONTH_START in the inner_str variable.

Please help me retrieve the words, either in the same variable or in multiple variables, one per occurrence.

 

Super Contributor
Posts: 500

Re: How to retrieve the multiple instances of a word between two special characters

Contributor
Posts: 26

Re: How to retrieve the multiple instances of a word between two special characters

Posted in reply to andreas_lds

Hi Andreas,

 

Thanks for replying. The earlier thread was about finding a piece of a sentence between two words; this time it is about finding a word based on a keyword, INTNX, so I raised another thread.

Yes, I raised that thread with a similar heading, which may have caused the confusion, but I think the requirement is different this time. I wrote the code above based on my own understanding. I also wanted to know whether this is possible with PRXCHANGE, that is, finding the two keywords based on a regular expression or a pattern.

 

Hope this will help. 
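On the regular-expression question, here is a minimal sketch of one way to do it. It uses CALL PRXNEXT in a loop rather than PRXCHANGE, because PRXNEXT steps through successive matches. The pattern, the output data set name macro_refs and the variable lengths are my own assumptions, and the pattern only covers calls shaped like the one in the test string (a first argument, a comma, then a macro variable reference). The test string is written as a single-quoted literal with the inner quotes doubled so the & references stay as text.

```sas
data macro_refs;   /* hypothetical output data set name */
   length test $300 word $32;
   /* single-quoted literal: the macro processor leaves & references alone */
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   /* intnx( first-argument , &name   (case-insensitive); group 1 is the name */
   re = prxparse('/intnx\s*\([^,]*,\s*&(\w+)/i');

   start = 1;
   stop  = length(test);
   call prxnext(re, start, stop, test, pos, len);
   do while (pos > 0);
      word = prxposn(re, 1, test);   /* text of capture group 1, without the & */
      output;                        /* one observation per match */
      call prxnext(re, start, stop, test, pos, len);   /* start has moved past the last match */
   end;
   keep word;
run;
```

The intent is one observation per INTNX call, holding the macro variable name that follows the first argument. Nested calls, quoted commas or a second argument that is not a macro reference would need a different pattern.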

Super User
Posts: 9,227

Re: How to retrieve the multiple instances of a word between two special characters

It is really, really not a good idea to try to parse code from a text string. There are multiple possibilities you would need to check for: upper/lower case, only the first string present, only the last string present, multiple unbalanced occurrences, and so on. I would really question why you are doing this. Just for starters, your test string is incorrect, as the " around it will trigger the macro pre-processor to try to find those macro variables.
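To illustrate the quoting point, a small sketch (the variable name s is just for illustration): inside single quotes the macro processor does not touch & references, and any single quotes that belong to the text are written twice. Inside double quotes SAS tries to resolve &STRMONTH_START. and &i.; if those macro variables do not exist, the log shows an "Apparent symbolic reference ... not resolved" warning and the text is left as typed, which is why such a step can still appear to work.

```sas
data _null_;
   length s $100;
   /* single-quoted literal: &STRMONTH_START. and &i. are kept as plain text; */
   /* the quotes around month and same are doubled because they are part of   */
   /* the string itself                                                       */
   s = 'intnx(''month'',&STRMONTH_START.,&i.,''same'' )';
   put s=;
run;
```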

 

Contributor
Posts: 26

Re: How to retrieve the multiple instances of a word between two special characters

Hi Rw9,

 

Thanks for responding. I ran the same code and it worked fine, giving &STRMONTH_START. as the output. I am just wondering how to retrieve the values when there are multiple instances of the INTNX function. I am doing this only to identify the contributing variables inside a specific query.

 

 

I hope this information helps.

Super User
Posts: 9,227

Re: How to retrieve the multiple instances of a word between two special characters

That's really the point though, isn't it? Programming allows you to code things in many different ways using a variety of techniques and constructs. What you plan may work for one specific example, but for others it will not. Code analysers are not simple things to create. Take a look at this example, which uses the output of PROC SCAPROC (the SAS Code Analyzer) to identify inputs, outputs and so on.

http://support.sas.com/kb/58/047.html

It is really not that straightforward.

Contributor
Posts: 26

Re: How to retrieve the multiple instances of a word between two special characters

Thanks for replying. I think the approach I am describing for retrieving the words will be generic enough for any example. Anyway, I will post the code if something clicks.

 

Thanks and Cheers.

Super User
Posts: 13,084

Re: How to retrieve the multiple instances of a word between two special characters

Perhaps what you need, especially if you are going to keep looking for more of these things, is a tokenizer program. This is a program that finds ALL "words" and creates an output data set that holds each word and the line it was found on.

 

Then you could search for "words" like "intnx" that occur more than once on the same line.

 

Something like:

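data tokens;
   infile datalines dsd  ;
   length word $42.;
   input @;                /* load the line into _infile_ and hold it */
   Line=_n_;
   /* same delimiters and modifiers in COUNTW and SCAN so the word counts agree */
   do i= 1 to (countw(_infile_,' ,()[]{}/=;','OQST'));
      word = scan(_infile_,i,' ,()[]{}/=;','OQST');
      output;
   end;
   drop i;
   input;                  /* release the held line */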

datalines4;
Proc sgplot  data= internal  
        dattrmap=wicenr.ethattrmap  ;
   by agency;
   styleattrs datasymbols = (circle circlefilled );
   reg x=monoffset y=inf2Child/ 
            group=ethnicity   attrid=ethnicity
   ;         
   format  ethnicity enrethnicity. inf2Child percent7.1 monoffset monoffset.;
   yaxis values=(0 to .50 by .1) ;
   xaxis values=(0 to 60 by 6);
run ;
title;
footnote;
;;;;
run;

If you use many long string literals in your code you would have to increase the length of the word variable.

 

You might also want to consider upcase or lowcase to get consistent case for frequencies or such.
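As a rough sketch of those last two suggestions, assuming the tokens data set created by the step above: upcase the words first, then count each word per line and keep only the ones that appear more than once on the same line. The data set names tokens_uc and word_counts are just illustrative.

```sas
/* assumes the TOKENS data set from the tokenizer step above */
data tokens_uc;
   set tokens;
   word = upcase(word);        /* consistent case before counting */
run;

proc freq data=tokens_uc noprint;
   /* one output row per line/word pair; keep pairs seen more than once */
   tables line*word / out=word_counts(where=(count > 1));
run;
```

Looking in word_counts for a word such as INTNX would then show the lines where it is used more than once.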

Contributor
Posts: 26

Re: How to retrieve the multiple instances of a word between two special characters

Hi Ballard,

 

Thanks for the reply. I think the code you gave tokenizes each and every word, which I don't need. Also, each time the loop iterates, the search starts again from the beginning.

I have tried to adapt your code to my case, but I am getting STRMONTH_START twice in the output. It does not capture STRMONTH_END, although it does identify the total number of INTNX functions used in the query.

 

Below is the code.

 

data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
do i= 1 to (count(test,'intnx'));
      word = scan(substr(test,index(UPCASE(test),"INTNX")),2,',');
      output;
   end;
   drop i;
run;

Can anyone please help?
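One possible reason for the repeated STRMONTH_START: INDEX always returns the position of the first INTNX, so every pass of the loop scans the same substring. A minimal sketch of one way around it, assuming the value you want is always the second comma-separated item after each INTNX, is to let FIND resume from just past the previous hit. The pos variable and the lengths are my own additions, and the test string is written with single quotes so the & references stay as text.

```sas
data ttest;
   length test $300 word $40;
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   pos = find(test, 'intnx', 'i');               /* first INTNX, ignoring case */
   do while (pos > 0);
      /* second comma-delimited item, counted from the current INTNX */
      word = scan(substr(test, pos), 2, ',');
      output;
      pos = find(test, 'intnx', 'i', pos + 1);   /* next INTNX after this one */
   end;
   drop pos;
run;
```

The 'i' modifier also avoids the mismatch in the original loop, where COUNT was case-sensitive but the INDEX search was done on the upcased string.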
