Dear SAS users,
I have a list 32K articles and want to have first author's country based on their domain of email information.
For example,
Country | |
xxxx@lumc.nl | Netherlands |
xxxx@hku.hk | Hong Kong |
Am I able to use any SAS to have column "Country" from "email" column?
Thanks for your suggestions in advance,
BR,
Epi
Yes within some limits:
proc format lib=work;
value $country
'nl' = 'Netherland'
'hk' = 'Hong Kong'
....
; run;
data want;
set have;
code = scan(email,-1,'.');
country = putc(code,$country.);
run;
Pay attention that in some emails there may be no country code,
as in xxxxxx@gmail.com or xxxxxx@yahoo.com etc.
Thanks Shmuel, So, I have to list all countries like 'nl'='Netherlands'.
What if there are more than one email addresses in one cell? I only need to use the first email address. Please advise me how I can deal in that case.
Thanks again,
Epi
1) Create the format with as many countries as you can add a line with
proc format lib=work;
value $country
'nl' = 'Netherland'
'hk' = 'Hong Kong'
....
other = "*unKnown*" /* choose any text you prefer */
; run;
2) In case there are more than one email, you need to select which is preferred or add a country per email (email1-country1 email2-country2 ...)
If you post a sample of your data in a form of a data step with input from datalines. I'll be able
to propose some available code to deal with.
Dear Shmuel,
Sorry for my delayed response. Thanks a lot.
I have 32K data like this. I wanna add one column with country name based on domain of first email address. If domain is like .edu; .com etc then leave as empty.
Thanks again,
Type | Authors | Name | Source | Language | Author Keywords | Email Addresses |
J | Jo, S; Kim, S; Shin, DH; Kim, MS | Jo, Seri; Kim, Suwon; Shin, Dong Hae; Kim, Mi-Sun | JOURNAL OF ENZYME INHIBITION AND MEDICINAL CHEMISTRY | English | SARS-CoV; SARS-CoV 3CLpro; flavonoid; FRET; inhibitory compounds | dhshin55@ewha.ac.kr; shfwk31@ewha.ac.kr |
J | Beury, D; Flechon, L; Maurier, F; Caboche, S; Varre, JS; Touzet, H; Faure, K; Dubuisson, J; Hot, D; Guery, B; Goffard, A | Beury, Delphine; Flechon, Lea; Maurier, Florence; Caboche, Segolene; Varre, Jean-Stephane; Touzet, Helene; Faure, Karine; Dubuisson, Jean; Hot, David; Guery, Benoit; Goffard, Anne | JOURNAL OF CLINICAL VIROLOGY | English | Human coronavirus; Whole genome sequencing; Hematopoietic cell transplant; Care-associated infections; Phylogenetic analyses | delphine.beury@pasteur-lille.fr; lea.flechon.etu@univ-lille.fr; florence.maurier@pasteur-lille.fr; segolene.caboche@pasteur-lille.fr; jean-stephane.varre@lifl.fr; helene.touzet@lifl.fr; karine.faure@univ-lille.fr; jean.dubuisson@ibl.cnrs.fr; david.hot@pasteur-lille.fr; Benoit.Guery@unil.ch; anne.goffard@univ-lille.fr |
J | Checovich, MM; Barlow, S; Shult, P; Reisdorf, E; Temte, JL | Checovich, Mary M.; Barlow, Shari; Shult, Peter; Reisdorf, Erik; Temte, Jonathan L. | JOURNAL OF THE AMERICAN MEDICAL DIRECTORS ASSOCIATION | English | Long-term care; acute respiratory infection; rapid influenza diagnostic testing; influenza | mary.checovich@fammed.wisc.edu |
@Epid wrote:
Thanks Shmuel, So, I have to list all countries like 'nl'='Netherlands'.
What if there are more than one email addresses in one cell? I only need to use the first email address. Please advise me how I can deal in that case.
Thanks again,
Epi
I am sure that icann (or maybe mozilla) has a usable list that you can import and use to create the format.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.