Hi All,
I have a character variable in my dataset and i need to create the macro which will pass some value as parameter then in that variable, i need to lookback to find first occurance of keyword called "This" and then go and find the last occurance of keyword "mortgage" whatever is between these two key variables , i need to output in another dataset.
Input
data have;
column $30.;
datalines;
welcome
to
my
system
this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
What I want in output is
Want
Output (column name)
This
is
new
variable
and
its called
delq_status
in
mortgage
Start point and end point (This and Mortgage ) is always common but we need to be case insensitive and there has to be exact match.
I am trying really hard and think that regular expressions can be used for such scenarios but somehow not getting it right
Please please please suggest ideas
Do not use a macro.
data want;
length column $ 30;
keep column;
retain write 'N';
set have;
if upcase(column) = 'THIS' then write = 'Y';
if write = 'Y' then output;
if upcase(column) = 'MORTGAGE' then write = 'N';
run;
Hi DKB and slchen,
If the keyword "this" appears in between then the output is not getting generated correctly.
Lets say below input , where this is replaced by " and this ". Its getting empty results.
I have tried using contains but its not working
data have;
input column $30.;
datalines;
welcome
to
my
system
and this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
You could change it to index(column,"this")>0
That would find it even if there is other text. However what is it your trying to do, whay have you split a comment field into sometimes seperate words, sometimes not? What happens for tHis, THIs, this mortgage is this etc.
If you used one variable:
data have;
column="welcome to my system and this is new variable and its called delq_status in mortgage and we are assigning it to zero";
extract=substr(column,index(column,"this"),(index(column,"mortgage") - index(column,"this"))+8);
run;
My code still work if 'this' replace with 'and this'.
data temp(keep=_string);
set have end=last;
length string $200.;
retain string;
string=catx(' ',string,column);
if last then do;
_string=prxchange('s/.*(this .* mortgage).*/$1/i',-1,string);
output;
end;
run;
data want;
set temp;
num=countw(_string);
do i=1 to num;
col=scan(_string,i);
output;
end;
keep col;
run;
data have;
input column $30.;
datalines;
welcome
to
my
system
this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
data temp;
set have;
retain a;
n+1;
if column='this' then a=1;
run;
proc sort data=temp;by descending n;run;
data want;
set temp;
retain b;
if column='mortgage' then b=1;
if a and b;
drop a b;
run;
proc sort data=want;by n;run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.