Is there any way to extract country names/codes from a free-text column? There is no pattern in the free text: the country name or code can appear anywhere, at the start, in the middle, or at the end.
Below are sample values from the free-text column:
Further Credit to Bank ABC IRAN
Forward Credit to India from Salman Ali
For further credit from Bank XYZ USA to Account of Rakesh Roy
Forward Credit to Bank in United States of America from Salman Ali
From the free text above, I have to find all the country names/codes present and store them in a new field. Can you suggest a way to do this? I have a table with all country names and their codes.
Hello,
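data have;
/* TRUNCOVER stops INPUT from flowing onto the next line when a record is shorter than 100 characters */
infile cards truncover;
input text $100.;
cards;
Further Credit to Bank ABC IRAN
Forward Credit to India from Salman Ali
For further credit from Bank XYZ USA to Account of Rakesh Roy
Forward Credit to Bank in United States of America from Salman Ali
;
run;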
data countries;
input country $30.;
cards;
USA
India
United States of America
Iran
;
run;
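data want;
set have;
found=0;
/* scan the whole lookup table for each row of HAVE */
do i=1 to nrows;
set countries point=i nobs=nrows;
/* 'i' ignores case, 't' trims trailing blanks from both arguments */
if find(text, country, 'it') then do;
found=1;
leave; /* keep the first matching country */
end;
end;
/* without a match, COUNTRY still holds the last lookup row, so clear it */
if not found then call missing(country);
run;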
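The step above stops at the first match. If a row can mention more than one country, a variant along the following lines could collect every match into one delimited field. This is only a sketch: the output name `want_all`, the variable `all_countries`, its length of 200 and the `|` delimiter are assumptions, not something from the original post.

```sas
/* Sketch: gather all matching countries per row (names are assumptions) */
data want_all;
  set have;
  length all_countries $200;
  call missing(all_countries);
  do i = 1 to nrows;
    set countries point=i nobs=nrows;
    /* same case-insensitive substring test as above, but no LEAVE */
    if find(text, country, 'it') then
      all_countries = catx('|', all_countries, country);
  end;
  drop country;  /* COUNTRY only holds the last lookup row at this point */
run;
```

Because this is a plain substring test, a short code can also match inside a longer word, so the result may need checking against real data.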
If you have the list, you can use it to generate the code. I have no test data for either dataset (in the form of a data step), so this is untested sample code:
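data _null_;
set countries end=last;
/* open the generated step; the LENGTH keeps longer names from being truncated */
if _n_=1 then call execute('data want; set have; length country $30;');
/* CALL EXECUTE takes a single argument, so build each IF statement with CATS */
call execute(cats('if index(free_text,"', country, '") then country="', country, '";'));
if last then call execute('run;');
run;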
This generates a data step with one IF statement for each country in your list. The generated data step then runs once the DATA _NULL_ step has finished.
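To make that concrete, here is roughly what the generated step would look like if the countries table held the four rows USA, India, United States of America and Iran. That four-row table and the `$30` length are assumptions for illustration, and `free_text` is the placeholder name used above for the text column.

```sas
/* Illustration only: approximately the code CALL EXECUTE would submit */
data want;
  set have;
  length country $30;
  if index(free_text, "USA") then country = "USA";
  if index(free_text, "India") then country = "India";
  if index(free_text, "United States of America") then country = "United States of America";
  if index(free_text, "Iran") then country = "Iran";
run;
```

Two things to watch: INDEX is case-sensitive, so "Iran" in the list would not match "IRAN" in the text, and when several countries match, the last IF that fires overwrites the earlier ones.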
Another way I can think of is to use a lookup table and PRXCHANGE.
/*have a lookuptable with all your values*/
data lookup;
value ="IRAN|INDIA|USA|United States of America";
run;
/*this is your table*/
data mytable;
length col1 $200.;
col1= "Further Credit to Bank ABC IRAN";
output;
col1= "Forward Credit to India from Salman Ali";
output;
col1= "For further credit from Bank XYZ USA to Account of Rakesh Roy";
output;
col1= "Forward Credit to Bank in United States of America from Salman Ali";
output;
run;
/*create macrovariable to use this in your prxchange query */
data _null_;
set lookup;
call symputx("value", value);
run;
%put &value;
data final;
set mytable;
suffix = prxchange("s/(.*)(&value)(.*)/$2/i", -1, col1);
run;
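One caveat with PRXCHANGE here: when none of the alternatives match, it returns the input string unchanged, so a row without a country would end up with the whole text in the new column. A possible alternative, sketched below, uses PRXMATCH and PRXPOSN so that unmatched rows stay blank, and adds `\b` word boundaries so a code is not picked up inside a longer word. The dataset name `final_checked`, the variable `country` and its length are assumptions, and the macro variable `&value` is the one created above.

```sas
/* Sketch: leave COUNTRY blank when nothing matches (names are assumptions) */
data final_checked;
  set mytable;
  length country $50;
  retain re;
  /* compile the pattern once; &value resolves inside double quotes */
  if _n_ = 1 then re = prxparse("/\b(&value)\b/i");
  if prxmatch(re, col1) then country = prxposn(re, 1, col1);
  else call missing(country);
  drop re;
run;
```

This returns only the first match in each row, in the case it has in the text rather than the case used in the lookup list.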