BookmarkSubscribeRSS Feed
Dsrountree
Obsidian | Level 7

input fullname indicator;
datalines;
LEX HIBEAULT NM
LEX V TEORGE NM
AEX-SUT ENERGY CO

BOBBY N MAJUK CO
;

 

If I need to parse the following data and indicator = NM put scan 1 into first name and scan 2 into last name.

The problem is when a middle initial appears.  I dont want to put V into last name.

It should put Teorge into last name and ignore the V.

 

Also Bobby N Majuk should be parsed as Bobby first name and Majuk last name.  Even when the indicator is CO

 

I tried to write a data set with if then based on evaluating the full name to see if a character after the first scan was = 1, then treat it as first name last name and move each section from full name to first name then last name.

 

Had no luck.

 

 

6 REPLIES 6
Reeza
Super User

Pull out the last part (CO/NM).

 

Then use COUNTW to see the number of words in the string use SCAN to get the first and third strings and if it's 2 use SCAN to get the first and second names.

 

Dsrountree
Obsidian | Level 7

can you write out the data set?

 

Dsrountree
Obsidian | Level 7

@Reeza - Your suggestion seems more logical, but I never used countw in a data set.

Can you write a mock example of how it would parse the following:

 

John A Smith

John Smith

New York Times

Flex Gym

ABC Constructions

 

A contact (user) will either be two syllables (John Smith) or two syllables with a middle initial (John A Smith)

A Company/Business will show up as:

Two syllable - (Flex Gym, ABC Constructions), or three syllable (New York Times)

 

One parsed into these multiple sets it will have to be reviewed manually to validate where the mixture/blend of two syllable contact and company business took place.

 

I forsee this (hot mess) mixing items like Flex Gym and John Smith in the same column due to using a scan/countw.

 

Thanks in advance.

Astounding
PROC Star

A relatively easy way:

 

firstname  = scan(fullname, 1, ' ');

lastname = scan(fullname, -2, ' ');

 

That way you don't have to check for a middle initial.  You get the first word, and the next-to-last word.

HB
Barite | Level 11 HB
Barite | Level 11

Is Gabriel José de la Concordia García Márquez NM in your dataset?

 

Or Casey's General Store CO?

 

 

 

Dsrountree
Obsidian | Level 7
This data is a hot mess...The problem with scan is a company can also be in the column and be 3 syllables.
Ex:ACME Marketing Company

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2692 views
  • 5 likes
  • 4 in conversation