rajdeep
Pyrite | Level 9

Hi Folks,

 

I have one string and I need to extract multiple words based on two special characters.

 

data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
inner_str = SCAN(SUBSTR(test,INDEX(test,',')+1),1,',');
run;

So, in the above example I need both STRMONTH_START and STRMONTH_END as output. Right now I am getting only STRMONTH_START in the inner_str variable.

 

Please help me retrieve the words, either in the same variable or in multiple variables, one per occurrence.

 

8 REPLIES
rajdeep
Pyrite | Level 9

Hi Andreas,

 

Thanks for replying. The earlier question was about finding a piece of a sentence based on two words. This time it is about finding a word based on a keyword, INTNX, so I raised another thread.

I raised this thread with a similar heading, which may have caused the confusion, but I think the requirement is different this time. I wrote the code above based on my own understanding. I also wanted to know whether this is possible with PRXCHANGE, that is, finding the two keywords with a regular expression or a pattern.

 

Hope this will help. 
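
On the regular-expression question, here is a minimal sketch that uses PRXPARSE with CALL PRXNEXT rather than PRXCHANGE, since PRXNEXT is the routine that steps through repeated matches. The pattern, the data set name found, and the variables re, start, stop, pos and len are illustrative assumptions, not something from the thread. It also assumes the macro reference is always the second argument of INTNX.

```sas
data found;
   length test $300 word $60;
   /* Single quotes (inner quotes doubled) keep the macro references as plain text */
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   /* Capture the name that follows & in the second argument of each INTNX call */
   re    = prxparse('/intnx\s*\(\s*[^,]*,\s*&(\w+)\.?/i');
   start = 1;
   stop  = length(test);

   call prxnext(re, start, stop, test, pos, len);
   do while (pos > 0);
      word = prxposn(re, 1, test);   /* capture buffer 1 of the latest match */
      output;
      call prxnext(re, start, stop, test, pos, len);
   end;
   keep word;
run;
```

With the test string above this should write one row per INTNX call. RW9's caveats below still apply: a pattern like this only covers the coding style it was written for.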

RW9
Diamond | Level 26

Really, really not a good idea to be trying to parse code from a text string. There are multiple possibilities you would need to check for: upper/lower case, only the first string present, only the last string, multiples, unbalanced calls, etc. I would really question why you are doing this. Just for starters, your test string is incorrect, as the double quotes around it will cause the macro processor to try to resolve those macro variables.
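
To illustrate the quoting point, here is a minimal sketch. The %LET value and the variable names in_double and in_single are made up for the illustration. The only claim is the standard behaviour: macro references are resolved inside double quotes and left alone inside single quotes.

```sas
%let STRMONTH_START = 21915;   /* hypothetical value, for illustration only */

data _null_;
   length in_double in_single $60;
   /* Double quotes: &STRMONTH_START. is resolved before the step runs */
   in_double = "intnx('month',&STRMONTH_START.,1,'same')";
   /* Single quotes (inner quotes doubled): the reference stays as literal text */
   in_single = 'intnx(''month'',&STRMONTH_START.,1,''same'')';
   put in_double= / in_single=;
run;
```

If the macro variable does not exist, the double-quoted version leaves the text unchanged and writes an "Apparent symbolic reference not resolved" warning to the log, which may be why the original test appeared to work.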

 

rajdeep
Pyrite | Level 9

Hi RW9,

Thanks for responding. I ran the same code and it worked fine, giving &STRMONTH_START. as output, but I am wondering how to retrieve the values when there are multiple instances of the INTNX function. I am doing this just to identify the contributing variables inside a specific query.

 

 

Hope this much of information helps.

RW9
Diamond | Level 26

That's really the point though, isn't it? Programming allows you to code things in many different ways using a variety of techniques and constructs. What you plan may work for one specific example, but for others it will not. Code analysers are not simple things to create. Take a look at this example, which uses the output of PROC SCAPROC (the SAS Code Analyzer) to identify inputs, outputs, etc.

http://support.sas.com/kb/58/047.html

Really not that straightforward.

rajdeep
Pyrite | Level 9

Thanks for replying. I think the approach I have in mind for retrieving the words will be generic enough for any example. Anyway, I will post the code if something clicks.

 

Thanks and Cheers.

ballardw
Super User

Perhaps what you need, especially if you are going to keep looking for more of these things, is a tokenizer program. This is a program that finds ALL "words" and creates an output data set that has word and the line the word was found on.

 

Then you could search for "words" like "intnx" that occur more than once on the same line.

 

Something like:

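data tokens;
   infile datalines dsd;
   length word $42;
   input @;                /* load the line into _infile_ and hold it */
   Line = _n_;             /* source line number */
   /* same delimiters and modifiers in COUNTW and SCAN so the counts agree */
   do i = 1 to countw(_infile_, ' ,()[]{}/=;', 'OQST');
      word = scan(_infile_, i, ' ,()[]{}/=;', 'OQST');
      output;
   end;
   drop i;
   input;                  /* release the held line */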

datalines4;
Proc sgplot  data= internal  
        dattrmap=wicenr.ethattrmap  ;
   by agency;
   styleattrs datasymbols = (circle circlefilled );
   reg x=monoffset y=inf2Child/ 
            group=ethnicity   attrid=ethnicity
   ;         
   format  ethnicity enrethnicity. inf2Child percent7.1 monoffset monoffset.;
   yaxis values=(0 to .50 by .1) ;
   xaxis values=(0 to 60 by 6);
run ;
title;
footnote;
;;;;
run;

If you use many long string literals in your code you would have to increase the length of the word variable.

 

You might also want to consider upcase or lowcase to get consistent case for frequencies or such.
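
As a rough sketch of that follow-up step, assuming the tokens data set created above (the names tokens_uc, repeated_words and n_times are illustrative), the words could be upcased and then checked for repeats on the same line:

```sas
/* Normalise case so INTNX, intnx and Intnx are counted together */
data tokens_uc;
   set tokens;
   word = upcase(word);
run;

/* Keep only the words that occur more than once on a given line */
proc sql;
   create table repeated_words as
   select line, word, count(*) as n_times
   from tokens_uc
   group by line, word
   having count(*) > 1
   order by line, word;
quit;
```

Filtering repeated_words with where word = 'INTNX' would then list the lines that call the function more than once.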

rajdeep
Pyrite | Level 9

Hi Ballard,

 

Thanks for the reply. The code you gave tokenizes each and every word, which I don't need. Also, every time the loop runs, the search starts again from the beginning.

 

I have tried to adapt your code to my case, but I am getting STRMONTH_START twice in the output. It does not capture STRMONTH_END, even though it correctly identifies the total number of INTNX functions used in the query.

 

Below is the code.

 

data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
do i= 1 to (count(test,'intnx'));
      word = scan(substr(test,index(UPCASE(test),"INTNX")),2,',');
      output;
   end;
   drop i;
run;

Can anyone please help?
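
One possible fix, offered as a sketch rather than a general parser: INDEX always returns the position of the first INTNX, so every pass of the loop scans the same substring. Tracking a start position with FIND lets each pass resume after the previous match. The pos variable and the COMPRESS cleanup are illustrative additions. The code assumes the macro reference is the second comma-separated argument of every INTNX call.

```sas
data ttest;
   length test $300 word $60;
   /* Single quotes (inner quotes doubled) keep the macro references as plain text */
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   pos = find(test, 'intnx', 'i');               /* first INTNX, ignoring case */
   do while (pos > 0);
      /* second comma-delimited item counted from this INTNX */
      word = scan(substr(test, pos), 2, ',');
      word = compress(word, '&. ');              /* drop &, the period and blanks */
      output;
      pos = find(test, 'intnx', 'i', pos + 5);   /* resume after this match */
   end;
   drop pos;
run;
```

For the test string above this should produce two rows, STRMONTH_START and STRMONTH_END. It will not cope with nested function calls or commas inside quoted arguments.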


Discussion stats
  • 8 replies
  • 2301 views
  • 0 likes
  • 4 in conversation