Dear SAS users,
I have a list 32K articles and want to have first author's country based on their domain of email information.
For example,
| Country | |
| xxxx@lumc.nl | Netherlands |
| xxxx@hku.hk | Hong Kong |
Am I able to use any SAS to have column "Country" from "email" column?
Thanks for your suggestions in advance,
BR,
Epi
Yes within some limits:
proc format lib=work;
value $country
'nl' = 'Netherland'
'hk' = 'Hong Kong'
....
; run;
data want;
set have;
code = scan(email,-1,'.');
country = putc(code,$country.);
run;
Pay attention that in some emails there may be no country code,
as in xxxxxx@gmail.com or xxxxxx@yahoo.com etc.
Thanks Shmuel, So, I have to list all countries like 'nl'='Netherlands'.
What if there are more than one email addresses in one cell? I only need to use the first email address. Please advise me how I can deal in that case.
Thanks again,
Epi
1) Create the format with as many countries as you can add a line with
proc format lib=work;
value $country
'nl' = 'Netherland'
'hk' = 'Hong Kong'
....
other = "*unKnown*" /* choose any text you prefer */
; run;
2) In case there are more than one email, you need to select which is preferred or add a country per email (email1-country1 email2-country2 ...)
If you post a sample of your data in a form of a data step with input from datalines. I'll be able
to propose some available code to deal with.
Dear Shmuel,
Sorry for my delayed response. Thanks a lot.
I have 32K data like this. I wanna add one column with country name based on domain of first email address. If domain is like .edu; .com etc then leave as empty.
Thanks again,
| Type | Authors | Name | Source | Language | Author Keywords | Email Addresses |
| J | Jo, S; Kim, S; Shin, DH; Kim, MS | Jo, Seri; Kim, Suwon; Shin, Dong Hae; Kim, Mi-Sun | JOURNAL OF ENZYME INHIBITION AND MEDICINAL CHEMISTRY | English | SARS-CoV; SARS-CoV 3CLpro; flavonoid; FRET; inhibitory compounds | dhshin55@ewha.ac.kr; shfwk31@ewha.ac.kr |
| J | Beury, D; Flechon, L; Maurier, F; Caboche, S; Varre, JS; Touzet, H; Faure, K; Dubuisson, J; Hot, D; Guery, B; Goffard, A | Beury, Delphine; Flechon, Lea; Maurier, Florence; Caboche, Segolene; Varre, Jean-Stephane; Touzet, Helene; Faure, Karine; Dubuisson, Jean; Hot, David; Guery, Benoit; Goffard, Anne | JOURNAL OF CLINICAL VIROLOGY | English | Human coronavirus; Whole genome sequencing; Hematopoietic cell transplant; Care-associated infections; Phylogenetic analyses | delphine.beury@pasteur-lille.fr; lea.flechon.etu@univ-lille.fr; florence.maurier@pasteur-lille.fr; segolene.caboche@pasteur-lille.fr; jean-stephane.varre@lifl.fr; helene.touzet@lifl.fr; karine.faure@univ-lille.fr; jean.dubuisson@ibl.cnrs.fr; david.hot@pasteur-lille.fr; Benoit.Guery@unil.ch; anne.goffard@univ-lille.fr |
| J | Checovich, MM; Barlow, S; Shult, P; Reisdorf, E; Temte, JL | Checovich, Mary M.; Barlow, Shari; Shult, Peter; Reisdorf, Erik; Temte, Jonathan L. | JOURNAL OF THE AMERICAN MEDICAL DIRECTORS ASSOCIATION | English | Long-term care; acute respiratory infection; rapid influenza diagnostic testing; influenza | mary.checovich@fammed.wisc.edu |
@Epid wrote:
Thanks Shmuel, So, I have to list all countries like 'nl'='Netherlands'.
What if there are more than one email addresses in one cell? I only need to use the first email address. Please advise me how I can deal in that case.
Thanks again,
Epi
I am sure that icann (or maybe mozilla) has a usable list that you can import and use to create the format.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.