BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PavSingh
Calcite | Level 5

Hi colleagues,

 

I have a text variable that has multiple commas -- sometimes two commas, sometimes three commas. I'd like to extract the text between the commas but can't seem to find the right code to do that efficiently. I'm including a few examples of the variables below. Any suggestions will be greatly appreciated!

 

Have:

Press Attaché and Country Information Officer, Office of the U.S. Ambassador to Pakistan, U.S. Embassy in Pakistan, United States Department of State

 

Want

Press Attache and Country Information Officer | Office of the U.S. Ambassador to Pakistan | U.S. Embassy in Pakistan | United States Department of State

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@PavSingh wrote:

My mistake in detailing my challenge. I'd like to extract the text between the commas, so they land in individual column variables. 


One way as an example:

data example;
   x="Press Attaché and Country Information Officer, Office of the U.S. Ambassador to Pakistan, U.S. Embassy in Pakistan, United States Department of State";
   Array v (15) $ 50;
   do i=1 to countw(x,',');
      v[i]=scan(x,i,',');
   end;
   drop i;
run;

The array is to create a number of variables to store each value in. In this case I have picked an arbitrary number of 15 variables, which would be named v1, v1, ... v15. Each can hold a maximum of 50 characters, the $ 50. If you need more or fewer variables adjust the number in parentheses on the Array statement. If the values need to be longer increase the 50 or reduce.

The Countw function in this use only counts commas as the delimiter be between "words", so we can use that to extract each "word" using the SCAN function, which again has been told to only use the comma as delimiter.

 

View solution in original post

6 REPLIES 6
andreas_lds
Jade | Level 19

So, you don't want to extract anything, but replace a comma followed by a blank by " | ". Have a look at the function tranwrd, it exists for such tasks.

PavSingh
Calcite | Level 5

My mistake in detailing my challenge. I'd like to extract the text between the commas, so they land in individual column variables. 

ballardw
Super User

@PavSingh wrote:

My mistake in detailing my challenge. I'd like to extract the text between the commas, so they land in individual column variables. 


One way as an example:

data example;
   x="Press Attaché and Country Information Officer, Office of the U.S. Ambassador to Pakistan, U.S. Embassy in Pakistan, United States Department of State";
   Array v (15) $ 50;
   do i=1 to countw(x,',');
      v[i]=scan(x,i,',');
   end;
   drop i;
run;

The array is to create a number of variables to store each value in. In this case I have picked an arbitrary number of 15 variables, which would be named v1, v1, ... v15. Each can hold a maximum of 50 characters, the $ 50. If you need more or fewer variables adjust the number in parentheses on the Array statement. If the values need to be longer increase the 50 or reduce.

The Countw function in this use only counts commas as the delimiter be between "words", so we can use that to extract each "word" using the SCAN function, which again has been told to only use the comma as delimiter.

 

PavSingh
Calcite | Level 5

Thank you, Ballardw.

 

As I've studied the idiosyncrasies of my data, I'd love to get your advice. Basically I have a column of data that includes a person's career history. The jobs are separated by semicolons. Generally, all the jobs have Titles. Some of the Jobs have an Office (or multiple offices) and an Organization (all have an organization). Some have the years of service, contained in parenthesis, some don't have the dates of service. Below is an example of the text that is in the column that I'm trying to distill.

 

Example Text:

"United States Army Officer, Office of Strategy, United States Department of the Army (1980-2009); Highly Qualified Expert, United States Department of the Army (2009-2011); Director, U.S. Army Heritage and Education Center, United States Department of the Army, United States Department of Defense; Assistant Chief, Museum Support Center, U.S. Army Center of Military History, United States Department of Defense; Member, Department of Defense Historical Advisory Committee, Office of the Secretary of Defense, United States Department of Defense (2011-2012); Director and Chief of Military History, U.S. Army Center of Military History, Office of the Administrative Assistant to the Secretary, United States Department of the Army (2011-2014); Chair, The Commission, World War I Centennial Commission; Acting Secretary, American Battle Monuments Commission (2017-2018)"

 

 

Ultimately, I'd like to end up with as close an approximation as possible to the following:

 

Title: United States Army Officer

Office 1: Office of Strategy

Organization: United States Department of the Army

Start Year: 1980

End Year: 2009

 

Title: Highly Qualified Expert

Organization: United States Department of the Army

Start Year: 2009

End Year: 2011

 

Title: Director

Office1: U.S. Army Heritage and Education Center

Office2: United States Department of the Army

Organization: United States Department of Defense

Start Year:

End Year:

 

I appreciate your guidance!

ballardw
Super User

If there is ALWAYS a semicolon between the "jobs" then it is the same thing basically. You just need another loop outside of the example that will select the jobs.

 

But a working example of code would require showing what the resulting DATA SET looks like.

I might be tempted to make one observation per JOB for clarity.

 

data example;
   x="United States Army Officer, Office of Strategy, United States Department of the Army (1980-2009); Highly Qualified Expert, United States Department of the Army (2009-2011); Director, U.S. Army Heritage and Education Center, United States Department of the Army, United States Department of Defense; Assistant Chief, Museum Support Center, U.S. Army Center of Military History, United States Department of Defense; Member, Department of Defense Historical Advisory Committee, Office of the Secretary of Defense, United States Department of Defense (2011-2012); Director and Chief of Military History, U.S. Army Center of Military History, Office of the Administrative Assistant to the Secretary, United States Department of the Army (2011-2014); Chair, The Commission, World War I Centennial Commission; Acting Secretary, American Battle Monuments Commission (2017-2018)";
   Array v (15) $ 50;
   length jobtext $ 200;
   do jobnum=1 to countw(x,';');
      jobtext = scan(x,jobnum,';');
      do i=1 to countw(jobtext,',');
         v[i]=scan(jobtext,i,',');
      end;
      output;
   end;
   drop i;
run;

What we cannot do by code because each "job" does not have the the same fields is tell what the ones between "title" and "organization" may be because there are differing numbers of values based on your rules of "separate by comma".

You can try to parse out years from the last "word" in the JOBTEXT (i.e. when i=countw(jobtext,','); IF the data is regular enough it just another application of SCAN with ( - and ) used as the delimiters (no "countw" needed for that). However if there are ever any other uses of ( - or ) character that may not work.

Manoharan141
Calcite | Level 5

I hope this one help, because you want to replace ',' in string into '|' 

 

data dataset1;
input value $ 1-150;
datalines;
Press Attache and Country Information Officer, Office of the U.S. Ambassador to Pakistan, U.S. Embassy in Pakistan, United States Department of State
;
run;

data dataset2;
set dataset1;
value = tranwrd(value,',','|');
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3314 views
  • 0 likes
  • 4 in conversation