Suppose I have a data set analogous to the following:
DATA TEST;
INFILE DATALINES MISSOVER DSD;
INPUT (KEY PROC1-PROC3) ($);
DATALINES;
1A, SQL, GBAR, FREQ
1A, , , MEANS
1A, , APPEND
2B, , , APPEND
3C, COMPARE, SHEWHART, PWENCODE
3C, IML
4D, , APPEND
4D, APPEND, GLM, REG
5E, SQL
5E, , GPLOT
;
RUN;
I want to generate a macro variable containing all the distinct values of KEY where one of the PROC variables is equal to APPEND. This PROC SQL code does the trick for exactly three PROC variables:
%LET APPEND_KEYS = ;
PROC SQL NOPRINT;
SELECT DISTINCT KEY INTO:APPEND_KEYS SEPARATED BY '" "'
FROM TEST
WHERE FINDW(CATX(' ', PROC1, PROC2, PROC3), 'APPEND');
QUIT;
%PUT "&APPEND_KEYS";
/* On the log appears "1A" "2B" "4D" */
Now suppose that I want to generalize this code to work with an arbitrary number of PROC variables on the TEST data set. I know that I can accomplish this by writing a SAS macro, but I'm hoping that there is a better way. Therefore, I'm not interested in seeing any %MACRO solutions. Here is what I've done so far using the DATA step:
%LET APPEND_KEYS = ;
DATA _NULL_;
SET TEST;
BY KEY;
LENGTH ALL_PROCS $ 32767;
RETAIN ALL_PROCS ' ';
ALL_PROCS = CATX(' ', ALL_PROCS, CATX(' ', OF PROC:));
IF LAST.KEY THEN DO;
IF FINDW(ALL_PROCS, 'APPEND') THEN CALL SYMPUTX("APPEND_KEYS", CATX('" "', RESOLVE('&APPEND_KEYS'), KEY));
CALL MISSING(ALL_PROCS);
END;
RUN;
%PUT "&APPEND_KEYS";
/* On the log appears "1A" "2B" "4D" */
The results are the same for both the DATA step and PROC SQL, but there are some problems with the DATA step:
Thanks in advance for any ideas.
14 data foo;
15 infile datalines missover dsd;
16 input (key proc1-proc3) ($);
17 cards;
1A,SQL,GBAR,FREQ
1A,,,MEANS
1A,,APPEND
2B,,,APPEND
3C,COMPARE,SHEWHART,PWENCODE
3C,IML
4D,,APPEND,
4D,APPEND,GLM,REG
5E,SQL,,
5E,,GPLOT
NOTE: The data set WORK.FOO has 10 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
28 ;
29 run;
30
31 data _null_;
32 length all_procs keys $32000;
33 if done then call symputx("append_keys",keys);
34 set foo end=done;
35 by key;
36 retain keys '';
37 if first.key then flag=0;
38 all_procs=catx(' ',of proc:);
39 if findw(all_procs,'APPEND') then flag+1;
40 if last.key and flag>0 then keys=catx('" "',of keys key);
41 run;
NOTE: There were 10 observations read from the data set WORK.FOO.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
42 %put "&append_keys";
"1A" "2B" "4D"
14 data foo;
15 infile datalines missover dsd;
16 input (key proc1-proc3) ($);
17 cards;
1A,SQL,GBAR,FREQ
1A,,,MEANS
1A,,APPEND
2B,,,APPEND
3C,COMPARE,SHEWHART,PWENCODE
3C,IML
4D,,APPEND,
4D,APPEND,GLM,REG
5E,SQL,,
5E,,GPLOT
NOTE: The data set WORK.FOO has 10 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
28 ;
29 run;
30
31 data _null_;
32 length all_procs keys $32000;
33 if done then call symputx("append_keys",keys);
34 set foo end=done;
35 by key;
36 retain keys '';
37 if first.key then flag=0;
38 all_procs=catx(' ',of proc:);
39 if findw(all_procs,'APPEND') then flag+1;
40 if last.key and flag>0 then keys=catx('" "',of keys key);
41 run;
NOTE: There were 10 observations read from the data set WORK.FOO.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
42 %put "&append_keys";
"1A" "2B" "4D"
Thanks FriedEgg! Your code produced the same result, got rid of all the annoying NOTEs, and actually executed a lot faster.
I would still like to understand why my DATA step produced these NOTEs in the first place. Also, which portion of my DATA step was so grossly inefficient? At first glance, I would have thought that FriedEgg's code and mine would have run in a comparable amount of time, but his code is actually much faster.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.