Hello,
I have a dataset similar to the following that contains a text variable (a single word or phrase). Each string is either English or French.
Is there a way to flag the English words?
data list;
input name $20.;
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
Thank you!
This may not be possible with isolated words out of context, but you could try incorporating Python. Take a look at: https://www.probytes.net/blog/python-language-detection/
Art, CEO, AnalystFinder.com
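As a rough illustration of the Python route, here is a minimal sketch that does not rely on any detection library. It assumes a small, hypothetical English word list (`ENGLISH_WORDS`) standing in for a real dictionary file, and it flags a value as English only when every word in it is found in that list. The function name `flag_english` is also made up for this example.

```python
import re

# Hypothetical, tiny word list for illustration only; a real run would
# load a full English dictionary instead.
ENGLISH_WORDS = {"cats", "how", "to", "register", "morning", "hello"}


def flag_english(value: str) -> int:
    """Return 1 if every word in value is in ENGLISH_WORDS, else 0."""
    # \w is Unicode-aware in Python 3, so accented letters stay inside tokens.
    tokens = re.findall(r"\w+", value.lower())
    return int(bool(tokens) and all(t in ENGLISH_WORDS for t in tokens))


names = ["Côté", "Boucher", "Fournier", "Cats", "how to register",
         "morning", "Thibeault", "Martin", "Vaudron", "Girard", "Hello"]
flags = {name: flag_english(name) for name in names}
for name, flag in flags.items():
    print(f"{name}: {flag}")
```

With this placeholder list, surnames such as Martin stay unflagged simply because they are not in the list, so the quality of the result depends entirely on the dictionary used.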
data list;
input name $20.;
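/* COMPRESS with 'ka' keeps only alphabetic characters. flag=1 when a letter */
/* outside A-Z (for example an accented letter) remains, flag=0 otherwise.   */
flag=prxmatch('/[^a-z]/i',compress(name,,'ka'))>0;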
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
My French is pretty rusty, but I do remember that a moderate number of nouns are the same in French and English.
So without an article (the / a, or le / la / les / un / une) or a similar clue, those are going to be very problematic.
Some adjectives, grand for example, are going to be worse.
I would hesitate to assign any name to a specific language: French and English have been interacting for so long that names go both ways (and spellings get butchered).
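The article clue described above can be sketched in Python as follows. This is only an illustration: the article sets and the function name `guess_by_article` are assumptions made for the example, elided forms such as l' are not handled, and a value without a leading article returns `None` because it stays ambiguous.

```python
from typing import Optional

# Assumed article lists for the sketch; not exhaustive.
FRENCH_ARTICLES = {"le", "la", "les", "un", "une"}
ENGLISH_ARTICLES = {"the", "a", "an"}


def guess_by_article(phrase: str) -> Optional[str]:
    """Guess 'fr' or 'en' from a leading article; None when there is no clue."""
    words = phrase.lower().split()
    # A single word carries no article, so it stays undecided.
    if len(words) < 2:
        return None
    if words[0] in FRENCH_ARTICLES:
        return "fr"
    if words[0] in ENGLISH_ARTICLES:
        return "en"
    return None


for phrase in ["le grand", "the grand", "grand", "how to register"]:
    print(phrase, "->", guess_by_article(phrase))
```

Most values in a list of single words or names would come back as `None`, which is the point made above: without context, many of them cannot be assigned to one language.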
Hi @parmis ,
I know this answer comes 2 years late :), but I felt you might still benefit from it, if only for the knowledge. In January of this year, SAS released a language identification action as part of its Viya platform. Here are details on how it works:
regards,
Sundaresh