Solved: Splitting an Employee_Name (Last Name, First Name) to (First Name Last...

loubams3 · Posted 01-25-2019 01:43 PM

proc sql;
select Manager_Name, Employee_Name, Total_Sales format=COMMA10.2
from (select l.Employee_ID, sum(Total_Retail_Price) as Total_Sales, Employee_Name, Country
from orion.order_fact as l,
orion.employee_addresses as p
where l.Employee_ID=p.Employee_ID and year(Order_Date)=2011
Group by Country, l.Employee_ID, Employee_Name) as l,
(select l.Employee_ID, Manager_ID, Employee_Name as Manager_Name
from orion.employee_organization as l,
orion.employee_addresses as p
where l.Manager_ID=p.Employee_ID) as p
where l.Employee_ID=p.Employee_ID
Order by Country, Manager_Name, Total_Sales desc;
quit;

Employee_Name is shown as Zhou, Tom, but I need it to change it to Tom Zhou

Reeza · Posted 01-25-2019 01:50 PM

If the data is well formatted you can easily use SCAN() as well.

catx(" ", scan(employee_name, 2, ","), scan(employee_name, 1, ",")) as new_name

View solution in original post

PeterClemmensen · Posted 01-25-2019 01:48 PM

Classic use of the PRXCHANGE Function

I stole this example directly from the PRXCHANGE Documentation (Example 2).

data ReversedNames;
   input name & $32.;
   datalines;
Jones, Fred
Kavich, Kate
Turley, Ron
Dulix, Yolanda
Zhou, Tom
;
proc sql;
   create table names as
   select prxchange('s/(\w+), (\w+)/$2 $1/', -1, name) as name
   from ReversedNames;
quit;
proc print data=names;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Reeza · Posted 01-25-2019 01:50 PM

If the data is well formatted you can easily use SCAN() as well.

catx(" ", scan(employee_name, 2, ","), scan(employee_name, 1, ",")) as new_name

novinosrin · Posted 01-25-2019 02:05 PM

@Reeza is brilliant to answer at lightning speed making me jealous while i was testing the same with 50 million records on my machine

here is the log difference. Kudos Reeza

361  proc sql;
362     create table names as
363     select prxchange('s/(\w+), (\w+)/$2 $1/', -1, name) as name
364     from ReversedNames;
NOTE: Table WORK.NAMES created, with 50000000 rows and 1 columns.

365  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           1:39.26
      cpu time            33.20 seconds


366  data names;
367   set ReversedNames;
368   name=catx(', ',scan(name,-1),scan(name,1));
369   run;

NOTE: There were 50000000 observations read from the data set WORK.REVERSEDNAMES.
NOTE: The data set WORK.WANT has 50000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           10.46 seconds
      cpu time            8.90 seconds

@mkeintz mentioned PRX being very slow in another thread and the above is the proof. But Mark if you have a minute, do you know why so? Is there an implementation problem? I am just curious

PaigeMiller · Posted 01-25-2019 02:09 PM

Is it PRX that is slow, or is it doing this operation in SQL is the cause of the slowness? What happens if you try the PRX in a data step?

--
Paige Miller

novinosrin · Posted 01-25-2019 02:11 PM

@PaigeMiller Like I always said, you are full of brains, much better

370  data names;
371   set ReversedNames;
372   name=prxchange('s/(\w+), (\w+)/$2 $1/', -1, name);
373   run;

NOTE: There were 50000000 observations read from the data set WORK.REVERSEDNAMES.
NOTE: The data set WORK.NAMES has 50000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           32.99 seconds
      cpu time            22.29 seconds

PaigeMiller · Posted 01-25-2019 02:19 PM

So PRX is still approximately 3X slower than doing it with the SCAN function.

--
Paige Miller

Reeza · Posted 01-25-2019 02:20 PM

@novinosrin I just use SCAN() because I find it's easier to remember the parameters, has nothing to do with efficiency in terms of computer, but efficiency in terms of typing and my time. I should really just sit down one day and learn them 😄

novinosrin · Posted 01-25-2019 02:25 PM

" it's easier to remember the parameters," sounds like invariably efficient mind. Like @PaigeMiller pointed out "So PRX is still approximately 3X slower than doing it with the SCAN function." plus SQL processor makes it all the more time consuming. Jeez! what if we were to deal with 5 years transaction data as population for customer spends analysis for campaign targeting models. I can only imagine the size of the data would let us have lunch session and coffee session combines and process might still be running 🙂

I learned something from OP, You, draycut and ofcourse Mr. Quantmiller

@Reeza wrote:

@novinosrin I just use SCAN() because I find it's easier to remember the parameters, has nothing to do with efficiency in terms of computer, but efficiency in terms of typing and my time. I should really just sit down one day and learn them 😄

PaigeMiller · Posted 01-25-2019 02:43 PM

I can only imagine the size of the data would let us have lunch session and coffee session ...

The first company to ever allow its employees to have coffee breaks is located about one mile from here in Buffalo, NY. It is the Barcalo Manufacturing Company, now known as Barcalounger (because they also invented the first recliner, the Barcalounger).

--
Paige Miller

novinosrin · Posted 01-25-2019 02:12 PM

Great presence of mind!!!!!!!!

Reeza · Posted 01-25-2019 02:13 PM

https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/162-30.pdf

https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/178-30.pdf

novinosrin · Posted 01-25-2019 02:20 PM

Thank you. Looks interesting. Will take a serious look.

For those who work on large datasets on prod environments, dealing with humongous data

1. SQL processor doesn't seem to compete well with a datastep

2. PRX if not avoidable coz some manipulations are done best with PRX, otherwise stick to SAS functions

Disclaimer: Nonetheless,only extensive testing with rich samples can only confirm

Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Re: Splitting an Employee_Name (Last Name, First Name) to (First Name Last Name)

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away