Hello,
I have a dataset similar to the following that contains a text variable (a single word or phrase). The strings are in either English or French.
Is there a way to flag the English words?
data list;
input name $20.;
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
Thank you!
This may not be possible with isolated words and no context, but you could try incorporating Python. Take a look at: https://www.probytes.net/blog/python-language-detection/
Art, CEO, AnalystFinder.com
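As a minimal sketch of the Python route, here is a plain word-list lookup. It does not reproduce the linked article, which discusses language-detection tooling. An entry is flagged as English only when every word in it appears in an English vocabulary. `ENGLISH_WORDS` and `is_english` are hypothetical names. The set is seeded with the sample's own English words purely for illustration; a real run would load a full English word list. Isolated words that exist in both languages stay ambiguous.

```python
import re

# Hypothetical placeholder vocabulary seeded from the sample data;
# in practice, load a complete English word list from a file.
ENGLISH_WORDS = {"cat", "cats", "how", "to", "register", "morning", "hello"}


def is_english(text: str) -> bool:
    """Return True when `text` has at least one word and every word is in ENGLISH_WORDS."""
    # [^\W\d_]+ matches runs of Unicode letters, so "côté" stays one word.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return bool(words) and all(w in ENGLISH_WORDS for w in words)


names = ["Côté", "Boucher", "Fournier", "Cats", "how to register",
         "morning", "Thibeault", "Martin", "Vaudron", "Girard", "Hello"]
for name in names:
    print(f"{name:<20} {int(is_english(name))}")
```

The quality of this approach depends entirely on the word list. Names that also happen to be ordinary words in either language will still be misclassified.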
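data list;
input name $20.;
/* COMPRESS with the 'k' and 'a' modifiers keeps only alphabetic characters (drops spaces). */
/* PRXMATCH then looks for any character outside a-z (case-insensitive), e.g. ô or é.      */
/* flag=1 marks a probable French entry; flag=0 does not prove the entry is English.       */
flag=prxmatch('/[^a-z]/i',compress(name,,'ka'))>0;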
datalines;
Côté
Boucher
Fournier
Cats
how to register
morning
Thibeault
Martin
Vaudron
Girard
Hello
;
run;
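The SAS step above flags entries containing a letter outside A to Z, which in this data means an accented, most likely French, name. Unaccented names such as Boucher or Fournier get flag=0, so the flag marks probable French rather than English. For comparison, here is a rough Python analogue of the same check. `has_accented_letter` is a hypothetical helper name. The NFC normalisation step is an addition the SAS code does not need to make explicit.

```python
import unicodedata


def has_accented_letter(text: str) -> bool:
    """True when `text` contains a letter outside A-Z/a-z (analogue of the SAS flag)."""
    # NFC-normalise so 'o' + combining circumflex becomes the single letter 'ô'.
    text = unicodedata.normalize("NFC", text)
    # Only letters are examined; spaces and punctuation are ignored,
    # much like COMPRESS(name,,'ka') in the SAS version.
    return any(ch.isalpha() and not ch.isascii() for ch in text)


for name in ["Côté", "Boucher", "Cats", "how to register", "Hello"]:
    print(f"{name:<20} {int(has_accented_letter(name))}")
```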
My French is pretty rusty, but I do remember that a moderate number of nouns are the same in French and English.
So without articles (the/a or le/la/les/un/une) or a similar clue, those words are going to be very problematic.
Some adjectives, such as grand, are going to be even worse.
I would hesitate to assign any name to a specific language: French and English have been interacting for so long that names go both ways (and the spelling gets butchered).
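The point about articles can be made concrete with a minimal sketch. A phrase is labelled English or French only when it contains more articles from one language than the other. A bare word such as grand or Martin comes back as unknown because there is nothing to go on. The article sets come from the examples in the post above, plus English "an". `guess_from_articles` is a hypothetical helper. Elided French forms such as l' are deliberately not handled.

```python
# Article lists taken from the examples in the post (plus English "an").
ENGLISH_ARTICLES = {"the", "a", "an"}
FRENCH_ARTICLES = {"le", "la", "les", "un", "une"}


def guess_from_articles(phrase: str) -> str:
    """Return 'en', 'fr', or 'unknown' based only on the articles in `phrase`."""
    # Whitespace split only: elided forms like "l'hôtel" are not recognised.
    words = phrase.lower().split()
    en = sum(w in ENGLISH_ARTICLES for w in words)
    fr = sum(w in FRENCH_ARTICLES for w in words)
    if en > fr:
        return "en"
    if fr > en:
        return "fr"
    return "unknown"


for phrase in ["the grand hotel", "le grand hôtel", "grand", "Martin"]:
    print(phrase, "->", guess_from_articles(phrase))
```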
Hi @parmis,
I know this answer comes two years late :), but I felt you might still benefit from it, if only for the knowledge. In January of this year, SAS released a language identification action as part of its Viya platform. Here are details on how it works:
Regards,
Sundaresh