Help using Base SAS procedures

more name string

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

more name string

How to divide the whole name column into last name, first name, middle name columns? Thanks!

have:

FIGLIULO,MARGARET A
WHEELER,TANYA
WALDRON,VIVIAN A

want:

LAST NAMEFIRST_NAMEMIDDLE NAME
FIGLIULOMARGARETA
WHEELERTANYA
WALDRONVIVIAN A

Accepted Solutions
Solution
‎09-09-2013 01:53 PM
PROC Star
Posts: 7,356

Re: more name string

Any exceptions beyond those shown in the code below?

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

LEE,KHALID

LEWIS,G. P

MC NAVARRETE,SAUL

OKOCHA,MICHELLE JR

MC RANKIN,EBONY F

SHIRA,PAUL

SINGH,JASBIR

LEWIS, G. P.

DR. MC DERMIT,JOHN,SR.

MRS. FIGLIULO,MARGARET A

MS WHEELER,TANYA JANE

MS WALDRON,VIVIAN A II

MR LEE,KHALID

MS LEWIS,G. P

MR MC NAVARRETE,SAUL

MRS. OKOCHA,MICHELLE JR

MISS MC RANKIN,EBONY F

DR. SHIRA,PAUL

MR. SINGH,JASBIR

MR. LEWIS, G. P.

DR. MC DERMIT,JOHN,III

;

data want (drop=_Smiley Happy;

  set have;

  _i=1;

  _j=0;

  name=compress(name,".");

  do until (scan(name,_i+_j," ,") eq "");

    select (_i);

      when (1) do;

        last=scan(name,_i+_j," ,");

        if last in ("MC") then do;

       _j+1;

          last=catx(' ',last,scan(name,_i+_j," ,"));

        end;

        else if last in ("MR","MRS","MS","MISS","DR") then do;

       _j+1;

          _i=_i-1;

          prefix=last;

          call missing(last);

        end;

    end;

      when (2) first=scan(name,_i+_j," ,");

      otherwise do;

        if scan(name,_i+_j," ,") in ("JR","SR","II","III","IV") then

          suffix=scan(name,_i+_j," ,");

        else middle=scan(name,_i+_j," ,");

      end;

    end;

    _i+1;

  end;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,356

Re: more name string

One way would be to parse the string using the scan function.  e.g.:

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

;

data want;

  set have;

  _n_=1;

  do until (scan(name,_n_," ,") eq "");

    select (_n_);

      when (1) last=scan(name,_n_," ,");

      when (2) first=scan(name,_n_," ,");

      otherwise middle=scan(name,_n_," ,");

    end;

    _n_+1;

  end;

run;

Super Contributor
Posts: 307

Re: more name string

Contributor
Posts: 22

Re: more name string

One more approach using Substr and Index function.

data y1;

input name $ 1-40;

cards;

FIGLIULO,MARGARET Apple

WHEELER,TANYA

WALDRON,VIVIAN A

;

run;

data y2;

set y1;

firstname= substr(name,1,index(name,',')-1);

lastname=substr(name,index(name,',')+1,index(name,' ')-index(name,',')-1);

middlename=substr(name,index(name,' '),index(name,''));

run;


Super User
Posts: 10,466

Re: more name string

Have you confirmed that all of your data has a comma between names? Do you have titles like Mr Mrs Ms or things like Junior, II, III or IV?

Contributor
Posts: 42

Re: more name string

The data is actually more complicate, many types of names, like the ones with MC NAVARRETE, OR MICHELLE JR...

LEE,KHALID
LEWIS,G. P
MC NAVARRETE,SAUL
OKOCHA,MICHELLE JR
MC RANKIN,EBONY F
SHIRA,PAUL
SINGH,JASBIR
Contributor
Posts: 42

Re: more name string

OR LIKE LEWIS,G. P, middle name in the middle, not at last....

Solution
‎09-09-2013 01:53 PM
PROC Star
Posts: 7,356

Re: more name string

Any exceptions beyond those shown in the code below?

data have;

  informat name $80.;

  input name &;

  cards;

FIGLIULO,MARGARET A

WHEELER,TANYA

WALDRON,VIVIAN A

LEE,KHALID

LEWIS,G. P

MC NAVARRETE,SAUL

OKOCHA,MICHELLE JR

MC RANKIN,EBONY F

SHIRA,PAUL

SINGH,JASBIR

LEWIS, G. P.

DR. MC DERMIT,JOHN,SR.

MRS. FIGLIULO,MARGARET A

MS WHEELER,TANYA JANE

MS WALDRON,VIVIAN A II

MR LEE,KHALID

MS LEWIS,G. P

MR MC NAVARRETE,SAUL

MRS. OKOCHA,MICHELLE JR

MISS MC RANKIN,EBONY F

DR. SHIRA,PAUL

MR. SINGH,JASBIR

MR. LEWIS, G. P.

DR. MC DERMIT,JOHN,III

;

data want (drop=_Smiley Happy;

  set have;

  _i=1;

  _j=0;

  name=compress(name,".");

  do until (scan(name,_i+_j," ,") eq "");

    select (_i);

      when (1) do;

        last=scan(name,_i+_j," ,");

        if last in ("MC") then do;

       _j+1;

          last=catx(' ',last,scan(name,_i+_j," ,"));

        end;

        else if last in ("MR","MRS","MS","MISS","DR") then do;

       _j+1;

          _i=_i-1;

          prefix=last;

          call missing(last);

        end;

    end;

      when (2) first=scan(name,_i+_j," ,");

      otherwise do;

        if scan(name,_i+_j," ,") in ("JR","SR","II","III","IV") then

          suffix=scan(name,_i+_j," ,");

        else middle=scan(name,_i+_j," ,");

      end;

    end;

    _i+1;

  end;

run;

Contributor
Posts: 42

Re: more name string

Thank you very much for your great help!

Super User
Posts: 10,466

Re: more name string

At least my garbage name data didn't split the MC off of names like MCDERMOT.

Any names like O'DONNEL or O'NEIL?

How about VAN DYKE?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 333 views
  • 0 likes
  • 5 in conversation