Currently, I have a datastep that accesses a very large table, and it must later be sorted by three variables, which takes forever.
DATA SAMPLES_DATA ;
SET DATABASE.SAMPLES (RENAME=(SMPL_ID=SAMP_NUM STN_NM=STATION ANLYT_NM=ANALYTE)) ;
IF PROJ_NM ='Project 1 old name' then PROJ_NM =’Project 1 new name' ;
RUN ;
…later on…
PROC SORT DATA=DATABASE.SAMPLES ; BY SAMP_NUM ANALYTE STATION ; RUN ;
I’m trying to come up with a solution using SQL, so I can use an ORDER BY that runs much faster.
PROC SQL NOPRINT ;
CREATE TABLE SAMPLES_SQL
AS SELECT *,
SMPL_ID AS SAMP_NUM,
STN_NM AS STATION,
ANLYT_NM AS ANALYTE
FROM DATABASE.SAMPLES
ORDER BY SAMP_NUM, ANALYTE, STATION ;
QUIT;
However, I’m stumped as how to change the name of Project 1 in PROC SQL. I played with the CASE statement, but if I understand the documentation correctly, all values of PROJ_NM have to be represented. There are lots of projects, with new ones added on a fairly frequent basis, so that’s not a very good solution.
Thanks for any advice you can give me.
Wendy T.