BookmarkSubscribeRSS Feed
Dsrountree
Obsidian | Level 7

input fullname indicator;
datalines;
LEX HIBEAULT NM
LEX V TEORGE NM
AEX-SUT ENERGY CO

BOBBY N MAJUK CO
;

 

If I need to parse the following data and indicator = NM put scan 1 into first name and scan 2 into last name.

The problem is when a middle initial appears.  I dont want to put V into last name.

It should put Teorge into last name and ignore the V.

 

Also Bobby N Majuk should be parsed as Bobby first name and Majuk last name.  Even when the indicator is CO

 

I tried to write a data set with if then based on evaluating the full name to see if a character after the first scan was = 1, then treat it as first name last name and move each section from full name to first name then last name.

 

Had no luck.

 

 

6 REPLIES 6
Reeza
Super User

Pull out the last part (CO/NM).

 

Then use COUNTW to see the number of words in the string use SCAN to get the first and third strings and if it's 2 use SCAN to get the first and second names.

 

Dsrountree
Obsidian | Level 7

can you write out the data set?

 

Dsrountree
Obsidian | Level 7

@Reeza - Your suggestion seems more logical, but I never used countw in a data set.

Can you write a mock example of how it would parse the following:

 

John A Smith

John Smith

New York Times

Flex Gym

ABC Constructions

 

A contact (user) will either be two syllables (John Smith) or two syllables with a middle initial (John A Smith)

A Company/Business will show up as:

Two syllable - (Flex Gym, ABC Constructions), or three syllable (New York Times)

 

One parsed into these multiple sets it will have to be reviewed manually to validate where the mixture/blend of two syllable contact and company business took place.

 

I forsee this (hot mess) mixing items like Flex Gym and John Smith in the same column due to using a scan/countw.

 

Thanks in advance.

Astounding
PROC Star

A relatively easy way:

 

firstname  = scan(fullname, 1, ' ');

lastname = scan(fullname, -2, ' ');

 

That way you don't have to check for a middle initial.  You get the first word, and the next-to-last word.

HB
Barite | Level 11 HB
Barite | Level 11

Is Gabriel José de la Concordia García Márquez NM in your dataset?

 

Or Casey's General Store CO?

 

 

 

Dsrountree
Obsidian | Level 7
This data is a hot mess...The problem with scan is a company can also be in the column and be 3 syllables.
Ex:ACME Marketing Company

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2531 views
  • 5 likes
  • 4 in conversation