BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
peppapig
Calcite | Level 5

How to divide the whole name column into last name, first name, middle name columns? Thanks!

have:

FIGLIULO,MARGARET A
WHEELER,TANYA
WALDRON,VIVIAN A

want:

LAST NAMEFIRST_NAMEMIDDLE NAME
FIGLIULOMARGARETA
WHEELERTANYA
WALDRONVIVIAN A
1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Any exceptions beyond those shown in the code below?

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

LEE,KHALID

LEWIS,G. P

MC NAVARRETE,SAUL

OKOCHA,MICHELLE JR

MC RANKIN,EBONY F

SHIRA,PAUL

SINGH,JASBIR

LEWIS, G. P.

DR. MC DERMIT,JOHN,SR.

MRS. FIGLIULO,MARGARET A

MS WHEELER,TANYA JANE

MS WALDRON,VIVIAN A II

MR LEE,KHALID

MS LEWIS,G. P

MR MC NAVARRETE,SAUL

MRS. OKOCHA,MICHELLE JR

MISS MC RANKIN,EBONY F

DR. SHIRA,PAUL

MR. SINGH,JASBIR

MR. LEWIS, G. P.

DR. MC DERMIT,JOHN,III

;

data want (drop=_:);

  set have;

  _i=1;

  _j=0;

  name=compress(name,".");

  do until (scan(name,_i+_j," ,") eq "");

    select (_i);

      when (1) do;

        last=scan(name,_i+_j," ,");

        if last in ("MC") then do;

       _j+1;

          last=catx(' ',last,scan(name,_i+_j," ,"));

        end;

        else if last in ("MR","MRS","MS","MISS","DR") then do;

       _j+1;

          _i=_i-1;

          prefix=last;

          call missing(last);

        end;

    end;

      when (2) first=scan(name,_i+_j," ,");

      otherwise do;

        if scan(name,_i+_j," ,") in ("JR","SR","II","III","IV") then

          suffix=scan(name,_i+_j," ,");

        else middle=scan(name,_i+_j," ,");

      end;

    end;

    _i+1;

  end;

run;

View solution in original post

9 REPLIES 9
art297
Opal | Level 21

One way would be to parse the string using the scan function.  e.g.:

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

;

data want;

  set have;

  _n_=1;

  do until (scan(name,_n_," ,") eq "");

    select (_n_);

      when (1) last=scan(name,_n_," ,");

      when (2) first=scan(name,_n_," ,");

      otherwise middle=scan(name,_n_," ,");

    end;

    _n_+1;

  end;

run;

yeshwanth
Fluorite | Level 6

One more approach using Substr and Index function.

data y1;

input name $ 1-40;

cards;

FIGLIULO,MARGARET Apple

WHEELER,TANYA

WALDRON,VIVIAN A

;

run;

data y2;

set y1;

firstname= substr(name,1,index(name,',')-1);

lastname=substr(name,index(name,',')+1,index(name,' ')-index(name,',')-1);

middlename=substr(name,index(name,' '),index(name,''));

run;


ballardw
Super User

Have you confirmed that all of your data has a comma between names? Do you have titles like Mr Mrs Ms or things like Junior, II, III or IV?

peppapig
Calcite | Level 5

The data is actually more complicate, many types of names, like the ones with MC NAVARRETE, OR MICHELLE JR...

LEE,KHALID
LEWIS,G. P
MC NAVARRETE,SAUL
OKOCHA,MICHELLE JR
MC RANKIN,EBONY F
SHIRA,PAUL
SINGH,JASBIR
peppapig
Calcite | Level 5

OR LIKE LEWIS,G. P, middle name in the middle, not at last....

art297
Opal | Level 21

Any exceptions beyond those shown in the code below?

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

LEE,KHALID

LEWIS,G. P

MC NAVARRETE,SAUL

OKOCHA,MICHELLE JR

MC RANKIN,EBONY F

SHIRA,PAUL

SINGH,JASBIR

LEWIS, G. P.

DR. MC DERMIT,JOHN,SR.

MRS. FIGLIULO,MARGARET A

MS WHEELER,TANYA JANE

MS WALDRON,VIVIAN A II

MR LEE,KHALID

MS LEWIS,G. P

MR MC NAVARRETE,SAUL

MRS. OKOCHA,MICHELLE JR

MISS MC RANKIN,EBONY F

DR. SHIRA,PAUL

MR. SINGH,JASBIR

MR. LEWIS, G. P.

DR. MC DERMIT,JOHN,III

;

data want (drop=_:);

  set have;

  _i=1;

  _j=0;

  name=compress(name,".");

  do until (scan(name,_i+_j," ,") eq "");

    select (_i);

      when (1) do;

        last=scan(name,_i+_j," ,");

        if last in ("MC") then do;

       _j+1;

          last=catx(' ',last,scan(name,_i+_j," ,"));

        end;

        else if last in ("MR","MRS","MS","MISS","DR") then do;

       _j+1;

          _i=_i-1;

          prefix=last;

          call missing(last);

        end;

    end;

      when (2) first=scan(name,_i+_j," ,");

      otherwise do;

        if scan(name,_i+_j," ,") in ("JR","SR","II","III","IV") then

          suffix=scan(name,_i+_j," ,");

        else middle=scan(name,_i+_j," ,");

      end;

    end;

    _i+1;

  end;

run;

peppapig
Calcite | Level 5

Thank you very much for your great help!

ballardw
Super User

At least my garbage name data didn't split the MC off of names like MCDERMOT.

Any names like O'DONNEL or O'NEIL?

How about VAN DYKE?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1401 views
  • 0 likes
  • 5 in conversation