DATA Step, Macro, Functions and more

Help me to generalize this code

Accepted Solution Solved
Reply
Regular Contributor
Posts: 171
Accepted Solution

Help me to generalize this code

Suppose I have a data set analogous to the following:

DATA TEST;

INFILE DATALINES MISSOVER DSD;

INPUT (KEY PROC1-PROC3) ($);

DATALINES;

1A, SQL, GBAR, FREQ

1A, , , MEANS

1A, , APPEND

2B, , , APPEND

3C, COMPARE, SHEWHART, PWENCODE

3C, IML

4D, , APPEND

4D, APPEND, GLM, REG

5E, SQL

5E, , GPLOT

;

RUN;

I want to generate a macro variable containing all the distinct values of KEY where one of the PROC variables is equal to APPEND.  This PROC SQL code does the trick for exactly three PROC variables:

%LET APPEND_KEYS = ;

PROC SQL NOPRINT;

     SELECT DISTINCT KEY INTO:APPEND_KEYS SEPARATED BY '" "'

     FROM TEST

     WHERE FINDW(CATX(' ', PROC1, PROC2, PROC3), 'APPEND');

QUIT;

%PUT "&APPEND_KEYS";

/* On the log appears "1A" "2B" "4D" */

Now suppose that I want to generalize this code to work with an arbitrary number of PROC variables on the TEST data set.  I know that I can accomplish this by writing a SAS macro, but I'm hoping that there is a better way.  Therefore, I'm not interested in seeing any %MACRO solutions.  Here is what I've done so far using the DATA step:

%LET APPEND_KEYS = ;

DATA _NULL_;

     SET TEST;

     BY KEY;

     LENGTH ALL_PROCS $ 32767;

     RETAIN ALL_PROCS ' ';

     ALL_PROCS = CATX(' ', ALL_PROCS, CATX(' ', OF PROCSmiley Happy);

     IF LAST.KEY THEN DO;

          IF FINDW(ALL_PROCS, 'APPEND') THEN CALL SYMPUTX("APPEND_KEYS", CATX('" "', RESOLVE('&APPEND_KEYS'), KEY));

          CALL MISSING(ALL_PROCS);

     END;

RUN;

%PUT "&APPEND_KEYS";

/* On the log appears "1A" "2B" "4D" */

The results are the same for both the DATA step and PROC SQL, but there are some problems with the DATA step:

  1. On my actual dataset the DATA step takes much longer to execute than the PROC SQL step, probably because of the WHERE clause utilized by PROC SQL.  The problem is that the CATX(' ', OF PROCSmiley Happy syntax is not supported in a WHERE statement.  Can anyone think of an alternate approach that would be faster?
  2. The DATA step above produces these really annoying notes in the log about the meaning of an identifier after a quoted string changing in a future release of SAS.  For the sample data I provided, the note only appears once.  However, using my actual data set, the note appears many times and fills up a sizable chunk of my log window.  I don't understand exactly which part of my code is causing that note to appear.

Thanks in advance for any ideas.


Accepted Solutions
Solution
‎11-30-2011 11:58 AM
Trusted Advisor
Posts: 1,301

Re: Help me to generalize this code

14         data foo;

15          infile datalines missover dsd;

16          input (key proc1-proc3) ($);

17          cards;

1A,SQL,GBAR,FREQ

1A,,,MEANS

1A,,APPEND

2B,,,APPEND

3C,COMPARE,SHEWHART,PWENCODE

3C,IML

4D,,APPEND,

4D,APPEND,GLM,REG

5E,SQL,,

5E,,GPLOT

NOTE: The data set WORK.FOO has 10 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

28         ;

29         run;

30        

31         data _null_;

32          length all_procs keys $32000;

33          if done then call symputx("append_keys",keys);

34          set foo end=done;

35          by key;

36          retain keys '';

37          if first.key then flag=0;

38           all_procs=catx(' ',of procSmiley Happy;

39           if findw(all_procs,'APPEND') then flag+1;

40          if last.key and flag>0 then keys=catx('" "',of keys key);

41         run;

NOTE: There were 10 observations read from the data set WORK.FOO.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

42         %put "&append_keys";

"1A" "2B" "4D"

View solution in original post


All Replies
Solution
‎11-30-2011 11:58 AM
Trusted Advisor
Posts: 1,301

Re: Help me to generalize this code

14         data foo;

15          infile datalines missover dsd;

16          input (key proc1-proc3) ($);

17          cards;

1A,SQL,GBAR,FREQ

1A,,,MEANS

1A,,APPEND

2B,,,APPEND

3C,COMPARE,SHEWHART,PWENCODE

3C,IML

4D,,APPEND,

4D,APPEND,GLM,REG

5E,SQL,,

5E,,GPLOT

NOTE: The data set WORK.FOO has 10 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

28         ;

29         run;

30        

31         data _null_;

32          length all_procs keys $32000;

33          if done then call symputx("append_keys",keys);

34          set foo end=done;

35          by key;

36          retain keys '';

37          if first.key then flag=0;

38           all_procs=catx(' ',of procSmiley Happy;

39           if findw(all_procs,'APPEND') then flag+1;

40          if last.key and flag>0 then keys=catx('" "',of keys key);

41         run;

NOTE: There were 10 observations read from the data set WORK.FOO.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

42         %put "&append_keys";

"1A" "2B" "4D"

Regular Contributor
Posts: 171

Help me to generalize this code

Thanks FriedEgg!  Your code produced the same result, got rid of all the annoying NOTEs, and actually executed a lot faster.  

I would still like to understand why my DATA step produced these NOTEs in the first place.  Also, which portion of my DATA step was so grossly inefficient?  At first glance, I would have thought that FriedEgg's code and mine would have run in a comparable amount of time, but his code is actually much faster.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 129 views
  • 1 like
  • 2 in conversation