Hi All,
I have a character variable in my dataset and i need to create the macro which will pass some value as parameter then in that variable, i need to lookback to find first occurance of keyword called "This" and then go and find the last occurance of keyword "mortgage" whatever is between these two key variables , i need to output in another dataset.
Input
data have;
column $30.;
datalines;
welcome
to
my
system
this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
What I want in output is
Want
Output (column name)
This
is
new
variable
and
its called
delq_status
in
mortgage
Start point and end point (This and Mortgage ) is always common but we need to be case insensitive and there has to be exact match.
I am trying really hard and think that regular expressions can be used for such scenarios but somehow not getting it right
Please please please suggest ideas
Do not use a macro.
data want;
length column $ 30;
keep column;
retain write 'N';
set have;
if upcase(column) = 'THIS' then write = 'Y';
if write = 'Y' then output;
if upcase(column) = 'MORTGAGE' then write = 'N';
run;
Hi DKB and slchen,
If the keyword "this" appears in between then the output is not getting generated correctly.
Lets say below input , where this is replaced by " and this ". Its getting empty results.
I have tried using contains but its not working
data have;
input column $30.;
datalines;
welcome
to
my
system
and this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
You could change it to index(column,"this")>0
That would find it even if there is other text. However what is it your trying to do, whay have you split a comment field into sometimes seperate words, sometimes not? What happens for tHis, THIs, this mortgage is this etc.
If you used one variable:
data have;
column="welcome to my system and this is new variable and its called delq_status in mortgage and we are assigning it to zero";
extract=substr(column,index(column,"this"),(index(column,"mortgage") - index(column,"this"))+8);
run;
My code still work if 'this' replace with 'and this'.
data temp(keep=_string);
set have end=last;
length string $200.;
retain string;
string=catx(' ',string,column);
if last then do;
_string=prxchange('s/.*(this .* mortgage).*/$1/i',-1,string);
output;
end;
run;
data want;
set temp;
num=countw(_string);
do i=1 to num;
col=scan(_string,i);
output;
end;
keep col;
run;
data have;
input column $30.;
datalines;
welcome
to
my
system
this
is
new
variable
and
its called
delq_status
in
mortgage
and
we
are assigning
it to zero
;
run;
data temp;
set have;
retain a;
n+1;
if column='this' then a=1;
run;
proc sort data=temp;by descending n;run;
data want;
set temp;
retain b;
if column='mortgage' then b=1;
if a and b;
drop a b;
run;
proc sort data=want;by n;run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.