proc sql;
select Manager_Name, Employee_Name, Total_Sales format=COMMA10.2
from (select l.Employee_ID, sum(Total_Retail_Price) as Total_Sales, Employee_Name, Country
from orion.order_fact as l,
orion.employee_addresses as p
where l.Employee_ID=p.Employee_ID and year(Order_Date)=2011
Group by Country, l.Employee_ID, Employee_Name) as l,
(select l.Employee_ID, Manager_ID, Employee_Name as Manager_Name
from orion.employee_organization as l,
orion.employee_addresses as p
where l.Manager_ID=p.Employee_ID) as p
where l.Employee_ID=p.Employee_ID
Order by Country, Manager_Name, Total_Sales desc;
quit;
Employee_Name is shown as Zhou, Tom, but I need it to change it to Tom Zhou
If the data is well formatted you can easily use SCAN() as well.
catx(" ", scan(employee_name, 2, ","), scan(employee_name, 1, ",")) as new_name
Classic use of the PRXCHANGE Function
I stole this example directly from the PRXCHANGE Documentation (Example 2).
data ReversedNames;
input name & $32.;
datalines;
Jones, Fred
Kavich, Kate
Turley, Ron
Dulix, Yolanda
Zhou, Tom
;
proc sql;
create table names as
select prxchange('s/(\w+), (\w+)/$2 $1/', -1, name) as name
from ReversedNames;
quit;
proc print data=names;
run;
If the data is well formatted you can easily use SCAN() as well.
catx(" ", scan(employee_name, 2, ","), scan(employee_name, 1, ",")) as new_name
@Reeza is brilliant to answer at lightning speed making me jealous while i was testing the same with 50 million records on my machine
here is the log difference. Kudos Reeza
361 proc sql; 362 create table names as 363 select prxchange('s/(\w+), (\w+)/$2 $1/', -1, name) as name 364 from ReversedNames; NOTE: Table WORK.NAMES created, with 50000000 rows and 1 columns. 365 quit; NOTE: PROCEDURE SQL used (Total process time): real time 1:39.26 cpu time 33.20 seconds 366 data names; 367 set ReversedNames; 368 name=catx(', ',scan(name,-1),scan(name,1)); 369 run; NOTE: There were 50000000 observations read from the data set WORK.REVERSEDNAMES. NOTE: The data set WORK.WANT has 50000000 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 10.46 seconds cpu time 8.90 seconds
@mkeintz mentioned PRX being very slow in another thread and the above is the proof. But Mark if you have a minute, do you know why so? Is there an implementation problem? I am just curious
Is it PRX that is slow, or is it doing this operation in SQL is the cause of the slowness? What happens if you try the PRX in a data step?
@PaigeMiller Like I always said, you are full of brains, much better
370 data names; 371 set ReversedNames; 372 name=prxchange('s/(\w+), (\w+)/$2 $1/', -1, name); 373 run; NOTE: There were 50000000 observations read from the data set WORK.REVERSEDNAMES. NOTE: The data set WORK.NAMES has 50000000 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 32.99 seconds cpu time 22.29 seconds
So PRX is still approximately 3X slower than doing it with the SCAN function.
@novinosrin I just use SCAN() because I find it's easier to remember the parameters, has nothing to do with efficiency in terms of computer, but efficiency in terms of typing and my time. I should really just sit down one day and learn them 😄
" it's easier to remember the parameters," sounds like invariably efficient mind. Like @PaigeMiller pointed out "So PRX is still approximately 3X slower than doing it with the SCAN function." plus SQL processor makes it all the more time consuming. Jeez! what if we were to deal with 5 years transaction data as population for customer spends analysis for campaign targeting models. I can only imagine the size of the data would let us have lunch session and coffee session combines and process might still be running 🙂
I learned something from OP, You, draycut and ofcourse Mr. Quantmiller
@Reeza wrote:
@novinosrin I just use SCAN() because I find it's easier to remember the parameters, has nothing to do with efficiency in terms of computer, but efficiency in terms of typing and my time. I should really just sit down one day and learn them 😄
I can only imagine the size of the data would let us have lunch session and coffee session ...
The first company to ever allow its employees to have coffee breaks is located about one mile from here in Buffalo, NY. It is the Barcalo Manufacturing Company, now known as Barcalounger (because they also invented the first recliner, the Barcalounger).
Great presence of mind!!!!!!!!
Thank you. Looks interesting. Will take a serious look.
For those who work on large datasets on prod environments, dealing with humongous data
1. SQL processor doesn't seem to compete well with a datastep
2. PRX if not avoidable coz some manipulations are done best with PRX, otherwise stick to SAS functions
Disclaimer: Nonetheless,only extensive testing with rich samples can only confirm
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.