Hello,
I have a dataset similar to the following that contains a text variable (a single word or phrase). The strings are either in English or French.
Is there a way to flag the English words?
data list;
input name $20.;
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
Thank you!
May not be possible with just words out of context, but you could try incorporating Python. Take a look at: https://www.probytes.net/blog/python-language-detection/
Art, CEO, AnalystFinder.com
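As a rough illustration of the Python route suggested above, here is a minimal standard-library sketch. It is not taken from the linked article and it is not a real language detector: the hint lists and the function names (has_accent, guess_language) are hypothetical, and the rule "accented letter means French" is an assumption that will misfire on English borrowings such as "café". It mostly shows why single words out of context are hard to classify.

```python
import unicodedata

# Hypothetical, deliberately tiny hint lists (function words only).
# A real project would use a full dictionary or a trained detector.
ENGLISH_HINTS = {"the", "an", "to", "how", "of", "and", "is"}
FRENCH_HINTS = {"le", "la", "les", "un", "une", "de", "du", "des", "et"}


def has_accent(text: str) -> bool:
    """Return True if any character carries a combining mark (e.g. é, ô)."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in decomposed)


def guess_language(text: str) -> str:
    """Return 'fr', 'en', or 'unknown' for a short word or phrase."""
    # Assumption: an accented letter is treated as a sign of French.
    if has_accent(text):
        return "fr"
    words = text.lower().split()
    en_hits = sum(w in ENGLISH_HINTS for w in words)
    fr_hits = sum(w in FRENCH_HINTS for w in words)
    if en_hits > fr_hits:
        return "en"
    if fr_hits > en_hits:
        return "fr"
    # No clue either way: do not guess.
    return "unknown"


if __name__ == "__main__":
    for name in ["Côté", "Boucher", "Cats", "how to register", "Martin"]:
        print(name, "->", guess_language(name))
```

With these tiny lists, isolated words such as "Cats" or "Boucher" come back as "unknown", which matches the caveat that words without context may not be classifiable.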
data list;
input name $20.;
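/* Keep only the alphabetic characters of name, then set flag=1 if any   */
/* of them falls outside a-z (e.g. an accented letter, hinting at French) */
flag=prxmatch('/[^a-z]/i',compress(name,,'ka'))>0;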
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
My French is pretty rusty, but I do remember that a moderate number of nouns are the same in both French and English.
So without the articles (the / a, or le / la / les / un / une) or a similar clue, those are going to be very problematic.
Some adjectives, "grand" for example, are going to be worse.
I would hesitate to assign any name to a specific language: French and English have been interacting for so long that names go both ways (and spellings get butchered).
Hi @parmis,
I know this answer comes 2 years late :), but I felt you might still get some benefit from it, or at least some knowledge. In Jan of this year, SAS released a language identification action as part of its Viya platform. Here are details on how it works:
regards,
Sundaresh