Rahim_221
Calcite | Level 5

Hi,

I have a character variable named "Title and Authors" that contains long text. From each value I want to extract the last author's name, which sits between the last comma of the author list and the period that ends it. In the two examples below, that would be "Thomas PG" and "Throm RE":

 

Title and Authors

1- Minervina AA, Pogorelyy MV, Kirk AM, Crawford JC, Allen EK, Chou CH, Mettelman RC, Allison KJ, Lin CY, Brice DC, Zhu X, Vegesana K, Wu G, Trivedi S, Kottapalli P, Darnell D, McNeely S, Olsen SR, Schultz-Cherry S, McGargill MA, Wolf J, Thomas PG. SARS-CoV-2 antigen exposure history shapes phenotypes and specificity of memory CD8(+) T cells.

 

2- Bauler M, Roberts JK, Wu CC, Fan B, Ferrara F, Yip BH, Diao S, Kim YI, Moore J, Zhou S, Wielgosz MM, Ryu B, Throm RE. Production of lentiviral vectors using suspension cells grown in serum-free media.

 

Thanks

Patrick
Opal | Level 21

The code below should work as long as the author list is always the second-to-last period-delimited segment of your source string, that is, the text after the authors contains no other periods.

data _null_;
  string='2- Bauler M, Roberts JK, Wu CC, Fan B, Ferrara F, Yip BH, Diao S, Kim YI, Moore J, Zhou S, Wielgosz MM, Ryu B, Throm RE. Production of lentiviral vectors using suspension cells grown in serum-free media.';
  length lastAuthor $40 listAuthors $200;
  /* second-to-last period-delimited segment: the author list */
  listAuthors=scan(string,-2,'.');
  /* last comma-delimited item of that list: the last author */
  lastAuthor=scan(listAuthors,-1,',');
  put lastAuthor=;
run;
Rahim_221
Calcite | Level 5

Thank you. The code you provided worked on only the second observation, however I have about 1900 rows/observations that I want to extract the last author's name from. Is there a way to make this code work on all observations?

Patrick
Opal | Level 21

The length for listAuthors was too short for your first sample. If you increase the length then the code as posted works for both rows of your sample data.
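
To run the same logic over every observation rather than one hard-coded string, the two SCAN calls can be applied to a data set. The following is a minimal sketch, not a definitive implementation: the data set names `citations` and `last_authors` and the variable name `line` are placeholders, and the lengths of 500 and 80 are guesses that should be sized to the real data. The `left()` call is an addition that drops the blank following the last comma.

```sas
/* Placeholder input: one citation per row in a variable called LINE */
data citations;
  infile cards truncover;
  input line $500.;
cards4;
1- Minervina AA, Pogorelyy MV, Kirk AM, Crawford JC, Allen EK, Chou CH, Mettelman RC, Allison KJ, Lin CY, Brice DC, Zhu X, Vegesana K, Wu G, Trivedi S, Kottapalli P, Darnell D, McNeely S, Olsen SR, Schultz-Cherry S, McGargill MA, Wolf J, Thomas PG. SARS-CoV-2 antigen exposure history shapes phenotypes and specificity of memory CD8(+) T cells.
2- Bauler M, Roberts JK, Wu CC, Fan B, Ferrara F, Yip BH, Diao S, Kim YI, Moore J, Zhou S, Wielgosz MM, Ryu B, Throm RE. Production of lentiviral vectors using suspension cells grown in serum-free media.
;;;;

data last_authors;
  set citations;
  /* listAuthors is as wide as LINE, so a long author list is not truncated */
  length listAuthors $500 lastAuthor $80;
  /* second-to-last period-delimited segment: the author list */
  listAuthors=scan(line,-2,'.');
  /* last comma-delimited name, with the leading blank removed */
  lastAuthor=left(scan(listAuthors,-1,','));
  put lastAuthor=;
run;
```

With the two sample rows, the log should show `Thomas PG` and `Throm RE`.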


Whether it works for all your data depends on the assumption that the text after the authors contains no "embedded" periods.
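
To see what an embedded period does, here is a small sketch using a made-up citation (the authors and title are hypothetical, not from the real data). It compares counting segments from the end of the string with counting from the start, and uses COUNTW to count the period-delimited segments so that suspicious rows can be flagged.

```sas
data _null_;
  length string $200 fromEnd fromStart $80;
  /* Hypothetical citation: the title contains an embedded period ("E. coli") */
  string='3- Doe J, Roe AB. Growth of E. coli in serum-free media.';
  /* Counting from the end lands on a fragment of the title */
  fromEnd=left(scan(scan(string,-2,'.'),-1,','));
  /* Counting from the start still lands on the author list */
  fromStart=left(scan(scan(string,1,'.'),-1,','));
  /* More than two period-delimited segments signals an embedded period */
  nSegments=countw(string,'.');
  put fromEnd= fromStart= nSegments=;
run;
```

For this string, `fromEnd` holds `Growth of E`, `fromStart` holds `Roe AB`, and `nSegments` is 3. Filtering the real data on `countw(line,'.') > 2` is one way to list the rows that need a closer look.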

Tom
Super User (accepted solution)

Since the citation style does not seem to put periods after author initials, the first period in each line ends the author list, so the last author is pretty simple to find.

data have;
  infile cards truncover;
  input line $500.;
cards4;
1- Minervina AA, Pogorelyy MV, Kirk AM, Crawford JC, Allen EK, Chou CH, Mettelman RC, Allison KJ, Lin CY, Brice DC, Zhu X, Vegesana K, Wu G, Trivedi S, Kottapalli P, Darnell D, McNeely S, Olsen SR, Schultz-Cherry S, McGargill MA, Wolf J, Thomas PG. SARS-CoV-2 antigen exposure history shapes phenotypes and specificity of memory CD8(+) T cells.
2- Bauler M, Roberts JK, Wu CC, Fan B, Ferrara F, Yip BH, Diao S, Kim YI, Moore J, Zhou S, Wielgosz MM, Ryu B, Throm RE. Production of lentiviral vectors using suspension cells grown in serum-free media.
;;;;

data want;
  length author $80;
  set have;
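  /* text before the first period is the author list; its last comma-delimited item is the last author */
  author=left(scan(scan(line,1,'.'),-1,','));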
run;
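
One case the sample data does not cover is a citation with a single author. With no comma in the author list, the inner SCAN result is returned whole, including the leading item number. The sketch below is one possible way to handle that, assuming every line starts with a number followed by a hyphen as in the samples. The rows, the data set names `have_demo` and `want_demo`, and the variable `authors` are hypothetical.

```sas
/* Hypothetical rows: a multi-author citation and a single-author citation */
data have_demo;
  infile cards truncover;
  input line $200.;
cards4;
1- Doe J, Roe AB. A first made-up title.
2- Smith J. A second made-up title.
;;;;

data want_demo;
  length authors $200 author $80;
  set have_demo;
  /* Text before the first period: item number plus author list */
  authors=scan(line,1,'.');
  /* Remove a leading item number such as "2- " */
  authors=prxchange('s/^\s*\d+-\s*//',1,authors);
  /* Last comma-delimited name; for a single author this is the whole list */
  author=left(scan(authors,-1,','));
  put author=;
run;
```

For these two rows, `author` comes out as `Roe AB` and `Smith J`. Without the PRXCHANGE step, the second row would keep its `2- ` prefix.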


Rahim_221
Calcite | Level 5
Thank you so much. It worked.
