Hi Folks,
I have one string and I need to extract multiple words based on two special characters.
data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
inner_str = SCAN(SUBSTR(test,INDEX(test,',')+1),1,',');
run;
So, in the above example I need both STRMONTH_START and STRMONTH_END as output. At the moment I am getting only STRMONTH_START in the inner_str variable.
Please help me retrieve the words, either in the same variable or in multiple variables, one per occurrence.
The question seems to be connected to https://communities.sas.com/t5/Base-SAS-Programming/How-to-extract-a-string-based-on-a-word-inside-t...
Hi Andreas,
Thanks for replying. The earlier thread was about finding a piece of a sentence based on two words; this time it is about finding a word based on a keyword, namely INTNX, so I raised another thread.
Yes, I raised that thread with a similar heading, which could be the source of the confusion, but I believe the requirement is different this time. I wrote the code above based on my own understanding. I also wanted to know whether this is possible with PRXCHANGE, that is, finding the two keywords based on a regular expression or a pattern.
Hope this will help.
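On the regular-expression question, here is a minimal sketch rather than a definitive answer. PRXCHANGE substitutes text, so listing every match is usually done with PRXPARSE, CALL PRXNEXT and PRXPOSN instead. The data set name prx_words, the pattern itself, and the assumption that the wanted word is always a macro variable in the second INTNX argument are mine, not part of the original post. The string is written in single quotes (with the embedded quotes doubled) so the macro processor leaves the macro references alone.

```sas
data prx_words;
   length test $300 word $32;
   /* Single-quoted literal: macro references are kept as plain text */
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   /* Assumed pattern: INTNX( first-argument , &name. and capture name */
   re = prxparse('/intnx\s*\(\s*[^,]+,\s*&(\w+)\.?/i');

   start = 1;
   stop  = length(test);

   /* Each CALL PRXNEXT resumes the search after the previous match */
   call prxnext(re, start, stop, test, pos, len);
   do while (pos > 0);
      word = prxposn(re, 1, test);   /* text of capture group 1 */
      output;
      call prxnext(re, start, stop, test, pos, len);
   end;

   drop re start stop pos len;
run;
```

With the test string above this should write one row per INTNX call, with word holding STRMONTH_START and then STRMONTH_END. It only handles the layout shown here; a second argument that is not a macro variable is skipped.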
Really, really not a good idea to be trying to parse code from a text string. There are multiple possibilities you would need to check for: upper/lower case, only the first string present, only the last string, multiples, unbalanced parentheses, etc. I would really question why you are doing this. Just for starters, your test string is incorrect, as the " around it will trigger the macro pre-processor to try to find those macro variables.
Hi Rw9,
Thanks for responding. I had run the same code and it was working fine, giving the output as &STRMONTH_START., but I am wondering how to retrieve the values when there are multiple instances of the INTNX function. I am doing this just to identify the contributing variables inside a specific query.
Hope this much of information helps.
That's really the point though, isn't it? Programming allows you to code things in many different ways using a variety of techniques and constructs. What you plan may work for one specific example, but for others it will not. Code analysers are not simple things to create. Take a look at this example, which uses the output of SAS proc scaproc (source code analyser) to identify inputs/outputs etc.
http://support.sas.com/kb/58/047.html
Really not that straightforward.
Thanks for replying. I think the approach I am describing for retrieving the words will be generic enough for any example. Anyway, I will post the code if something clicks.
Thanks and Cheers.
Perhaps what you need, especially if you are going to keep looking for more of these things, is a tokenizer program. This is a program that finds ALL "words" and creates an output data set that has word and the line the word was found on.
Then you could search for "words" like "intnx" that occur more than once on the same line.
Something like:
data tokens;
   infile datalines dsd ;
   length word $42.;
   input @;
   Line=_n_;
   do i= 1 to (countw(_infile_,' ,()[]{}/=;','OQST'));
      word = scan(_infile_,i,' ,()[]{}/=;');
      output;
   end;
   drop i;
   input;
datalines4;
Proc sgplot data= internal dattrmap=wicenr.ethattrmap ;
by agency;
styleattrs datasymbols = (circle circlefilled );
reg x=monoffset y=inf2Child/ group=ethnicity attrid=ethnicity ;
format ethnicity enrethnicity. inf2Child percent7.1 monoffset monoffset.;
yaxis values=(0 to .50 by .1) ;
xaxis values=(0 to 60 by 6);
run ;
title; footnote;
;;;;
run;
If you use many long string literals in your code you would have to increase the length of the word variable.
You might also want to consider upcase or lowcase to get consistent case for frequencies or such.
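As a rough sketch of the search step described above, assuming the tokens data set created by the step above (variables word and Line), a query along these lines would list the lines on which a given word occurs more than once. The column alias n_intnx is just an illustrative name.

```sas
proc sql;
   /* Lines where the word INTNX appears more than once, in any case */
   select Line, count(*) as n_intnx
   from tokens
   where upcase(word) = 'INTNX'
   group by Line
   having count(*) > 1;
quit;
```

The sample program read in above contains no INTNX calls, so this returns no rows for it; it only becomes useful once the code of interest is fed into the tokenizer.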
Hi Ballard,
Thanks for the reply. I think the code you gave tokenizes each and every word, which I don't need. Also, whenever the loop is initiated it will start the search from the beginning.
I have tried to adapt your code to my case, but I am getting STRMONTH_START twice in the output. It does not capture STRMONTH_END, even though it correctly identifies the total number of intnx functions used in the query.
Below is the code.
data ttest;
test="case when date_account_opened >= intnx('month',&STRMONTH_START.,&i.,'same' ) and date_account_closed <= intnx('month',&STRMONTH_END.,&i.,'same' ) then 1 else 0 end as Corp_New_aggregate";
do i= 1 to (count(test,'intnx'));
word = scan(substr(test,index(UPCASE(test),"INTNX")),2,',');
output;
end;
drop i;
run;
Please help anyone.
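One likely reason for the duplicate is that index(UPCASE(test),"INTNX") does not depend on the loop counter, so every iteration starts from the first INTNX again. A minimal sketch of one way around this, assuming the wanted word is always the second comma-separated argument of each INTNX call, is to let FIND continue from just past the previous hit. The variable pos and the COMPRESS clean-up are my additions, and the string is single-quoted (embedded quotes doubled) so the macro references are not resolved.

```sas
data ttest;
   length test $300 word $32;
   /* Single-quoted literal: macro references are kept as plain text */
   test = 'case when date_account_opened >= intnx(''month'',&STRMONTH_START.,&i.,''same'' ) and date_account_closed <= intnx(''month'',&STRMONTH_END.,&i.,''same'' ) then 1 else 0 end as Corp_New_aggregate';

   /* First occurrence; the i modifier ignores case */
   pos = find(test, 'intnx(', 'i');
   do while (pos > 0);
      /* Second comma-delimited item counted from this INTNX call */
      word = scan(substr(test, pos), 2, ',');
      /* Remove the ampersand, the dot and any blanks */
      word = compress(word, '&. ');
      output;
      /* Continue the search just past the current occurrence */
      pos = find(test, 'intnx(', 'i', pos + 1);
   end;

   drop pos;
run;
```

For the test string above this should give two rows, STRMONTH_START and STRMONTH_END. The caveats raised earlier in the thread still apply: nested function calls or commas inside the first argument would break the simple comma count.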