DATA Step, Macro, Functions and more

How to extract a string based on a word inside the string

Reply
Contributor
Posts: 26

How to extract a string based on a word inside the string

Hi folks,

 

I am using SAS 9.4. I have the below string stored in a variable called Test.

 

Data test;
input Test $1000;
Test="case when date_account_opened >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_New_aggregate ,case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_Closed_aggregate,case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) else . end as Month_End_Utilisation";

Test1=Findw(Test,"Corp_Closed_aggregate");
Test2=Findw(Test,"Case When");
test3=Substr(Test,Test1,Test2); Run;

 

Now I want to extract the complete the case when statement based on the as variable name.

 

Like if I want to search the "Corp_Closed_aggregate", then it should give me the complete "case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_Closed_aggregate" as result into another variable.

 

Could anyone please let me know if it is possible in SAS?

 

Please help me if possible.

 

Thanks in advance.

 

 

 

Super User
Posts: 2,518

Re: How to extract a string based on a word inside the string

A regular expression would do this easily, but you can just use 

Test2=Findw(Test,"Case When");

as you did 

and then 

Test2=Findw(Test,"End As",'i');

Anything  between is the case test and anything after this the variable name 

Respected Advisor
Posts: 4,802

Re: How to extract a string based on a word inside the string

[ Edited ]

@ChrisNZ

I believe it's actually quite tricky if you want to build something which also works for multiple case statements in a string especially if you also allow for nested case statements.

 

The following should work as long as there are no nested case statements.

%let STRMONTH='29Aug2017'd;
%let i=1;

data test;
  test="case when date_account_opened >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_New_aggregate ,case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_Closed_aggregate,case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) else . end as Month_End_Utilisation";

  found=prxmatch('/end\s+as\s+Corp_Closed_aggregate/oi',test);
  if found>0 then
    do;
      start=find(test,'case',-found,'i');
      want=catx(' ',substrn(test,start,found-start),'end as Corp_Closed_aggregate');
    end;
run;
Super User
Posts: 2,518

Re: How to extract a string based on a word inside the string

@Patrick Not really unless you have nested cases (and then regex probably should be used).

Otherwise this works:

 

data TEST;
  TEST='case when date_account_opened >= intnx(''month'',&STRMONTH.,&i.,''same'') then 1 
   else 0 end as Corp_New_aggregate ,
   case when closure_date >= intnx(''month'',&STRMONTH.,&i.,''same'') then 1 
   else 0 end as Corp_Closed_aggregate,
   case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) 
   else . end as Month_End_Utilisation';

  VAR_TO_FIND = 'Corp_Closed_aggregate';
  POSVAR      = find(TEST,' as '||trim(VAR_TO_FIND),'i');
  POSCASE     = find(TEST,'case when ',-POSVAR,'i');
  FORMULA     = substr(TEST,POSCASE,POSVAR-POSCASE);
  putlog FORMULA=;
run;


FORMULA=case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end

 

 

Super User
Super User
Posts: 8,290

Re: How to extract a string based on a word inside the string

This problem is that your example string includes three CASE statements separated by commas. Easier to see if you refomart the code to show this.

Test=cats
("case when date_account_opened >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_New_aggregate"
,','
,"case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_Closed_aggregate"
,','
,"case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) else . end as Month_End_Utilisation"
);

Since the case statements themselves could contain commas you will need to use some type of regular expression to split the string into individual parts.  Most likely you will want to use regular expressions for this.

Super User
Posts: 10,860

Re: How to extract a string based on a word inside the string

Data test;
infile cards truncover;
input Test $1000.;
temp=substr(test,1,findw(test,'Corp_Closed_aggregate',' ,''i')-1);
want=substr(temp,findw(temp,'case',' ,','ib'));
cards4;
case when date_account_opened >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_New_aggregate ,case when closure_date >= intnx('month',&STRMONTH.,&i.,'same') then 1 else 0 end as Corp_Closed_aggregate,case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) else . end as Month_End_Utilisation
;;;;
Run;
Super User
Posts: 13,950

Re: How to extract a string based on a word inside the string

One thing that helps: when providing a data step test that it will run. You had an input statement and nothing to read. Also because you

have macro variables referenced inside the string you cannot use the " (double quote) or else the macro variables such as &strmonth will attempt to resolve. If you have actually read a text file to contain your "test" variable then that shouldn't be an issue. Note the use of the doubled single quotes around the strings month and same so we can get around that in the example code.

I truly detest lines of code over 200 characters long displayed on a single line in the editor as they aren't readable and get even worse here so I put in some line breaks.

 

Please see:

Data test;
Test='case when date_account_opened >= intnx(''month'',&STRMONTH.,&i.,''same'') then 1 
 else 0 end as Corp_New_aggregate ,
 case when closure_date >= intnx(''month'',&STRMONTH.,&i.,''same'') then 1 
 else 0 end as Corp_Closed_aggregate,
 case when a.credit_limit > 0 then (bl.current_balance/a.credit_limit) 
 else . end as Month_End_Utilisation';

Test1=Findw(Test,"Corp_Closed_aggregate");
Test2=Findw(Test,"Case When",' ','bi',test1);
test3=Substr(Test,Test2,(test1-test2 +length("Corp_Closed_aggregate")));
Run;

Key pieces: when searching for the "case when" associated with the varaible you search backward from the starting position, so "b" in the findw for test2 and test1 as a start position, the 'i' is included because "Case When" does not exist in your search line so use the 'i' to ignore case. I suggest using the 'i' option in all of the findw calls unless someone is real good about consistent case in that source file.

 

You also had the substring for test 3 in the wrong order: Test1 should be > test2. And since test1 and test2 are the positions of the start of the respective found words you need to find the overall number of characters to select, so test1-test2 gives the length of the string from "case" to the start of "Corp_closed_aggregate. So you need to add the length of that variable to get enough characters for the entire value of test3.

Ask a Question
Discussion stats
  • 6 replies
  • 730 views
  • 1 like
  • 6 in conversation