BookmarkSubscribeRSS Feed
Dsrountree
Obsidian | Level 7

input fullname indicator;
datalines;
LEX HIBEAULT NM
LEX V TEORGE NM
AEX-SUT ENERGY CO

BOBBY N MAJUK CO
;

 

If I need to parse the following data and indicator = NM put scan 1 into first name and scan 2 into last name.

The problem is when a middle initial appears.  I dont want to put V into last name.

It should put Teorge into last name and ignore the V.

 

Also Bobby N Majuk should be parsed as Bobby first name and Majuk last name.  Even when the indicator is CO

 

I tried to write a data set with if then based on evaluating the full name to see if a character after the first scan was = 1, then treat it as first name last name and move each section from full name to first name then last name.

 

Had no luck.

 

 

6 REPLIES 6
Reeza
Super User

Pull out the last part (CO/NM).

 

Then use COUNTW to see the number of words in the string use SCAN to get the first and third strings and if it's 2 use SCAN to get the first and second names.

 

Dsrountree
Obsidian | Level 7

can you write out the data set?

 

Dsrountree
Obsidian | Level 7

@Reeza - Your suggestion seems more logical, but I never used countw in a data set.

Can you write a mock example of how it would parse the following:

 

John A Smith

John Smith

New York Times

Flex Gym

ABC Constructions

 

A contact (user) will either be two syllables (John Smith) or two syllables with a middle initial (John A Smith)

A Company/Business will show up as:

Two syllable - (Flex Gym, ABC Constructions), or three syllable (New York Times)

 

One parsed into these multiple sets it will have to be reviewed manually to validate where the mixture/blend of two syllable contact and company business took place.

 

I forsee this (hot mess) mixing items like Flex Gym and John Smith in the same column due to using a scan/countw.

 

Thanks in advance.

Astounding
PROC Star

A relatively easy way:

 

firstname  = scan(fullname, 1, ' ');

lastname = scan(fullname, -2, ' ');

 

That way you don't have to check for a middle initial.  You get the first word, and the next-to-last word.

HB
Barite | Level 11 HB
Barite | Level 11

Is Gabriel José de la Concordia García Márquez NM in your dataset?

 

Or Casey's General Store CO?

 

 

 

Dsrountree
Obsidian | Level 7
This data is a hot mess...The problem with scan is a company can also be in the column and be 3 syllables.
Ex:ACME Marketing Company

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2128 views
  • 5 likes
  • 4 in conversation