DATA Step, Macro, Functions and more

arrays with first.

Reply
Regular Contributor
Posts: 171

arrays with first.

Is there any way to use array variable references in conjunction with first.variable processing? I want to write a datastep that is somewhat analogous to the following:

[pre]
data new;
set old;
by variables:;
array variables{*} variables:;
do i=1 to dim(variables);
if first.variables{i} then do;

end;
end;
run;
[/pre]
[pre]
[/pre]
Of course, this code will produce an error due to the syntax of “first.variables{i}.” Does anyone have any alternatives to make this code work?
Super Contributor
Super Contributor
Posts: 3,174

Re: arrays with first.

Consider using SAS macro code with your BY variable list explicitly referenced as a %LET string, parse the list with %SCAN(...), to generate the specific DATA step DO/END code piece using %DO/%END logic.

As mentioned, you cannot reference an array variable when using FIRST. so you will need to generate the SAS code with explicit BY variables, each mentioned individually as above.

Scott Barry
SBBWorks, Inc.
Regular Contributor
Posts: 171

Re: arrays with first.

Thanks Scott. I had thought about writing a macro to accomplish this task, but I was hoping that there would be an easier alternative. I appreciate your advice.
Occasional Contributor
Posts: 13

Re: arrays with first.

* J.D. Opdyke, President, DataMineIt;
* Below is an actual solution to your question;

options label symbolgen fullstimer yearcutoff=1950 nocenter ls = 256 ps = 51 msymtabmax=max mprint mlogic minoperator mindelimiter=' ' cleanup;

data old(sortedby=a b c);
do a=1 to 2;
do b=11 to 12;
do c = 21 to 22;
do d = 100 to 104;
output;
end;
end;
end;
end;
run;
proc print data=old;
run;

*** The code below assumes that what you want to do is simply "output"
The output statement is just a placehold for whatever code you
want to put there.

*** If you need to extract all the byvars from the old dataset,
just use a proc contents and a proc sql
***;

%macro firstdotbyvars(indata=, outdata=, byvars=);

%let num_byvars = %sysfunc(countw(&byvars.));
%do i=1 %to &num_byvars.;
%let byvar&i. = %scan(&byvars.,&i.);
%end;

%macro iffirst;
if first.&byvar1. then do;
output;
end;
%if &num_byvars.>1 %then %do;
%do i=2 %to &num_byvars.;
else if first.&&byvar&i. then do;
output;
end;
%end;
%end;
%mend;

data new;
set old;
by &byvars.;
%iffirst
run;

proc print data=new;
run;
%mend firstdotbyvars;

%firstdotbyvars(indata=old, outdata=new, byvars=a b c);
Regular Contributor
Posts: 171

Re: arrays with first.

Thanks for posting this solution. I appreciate the help.
Super Contributor
Super Contributor
Posts: 3,174

Re: arrays with first.

I doubt the OP is going to want to use CONTENTS with all variables from a given SAS file in a BY list - more likely a subset list and that's why I would consider the %LET definition (explicitly) to be more appropriate.

Scott Barry
SBBWorks, Inc.
Frequent Contributor
Posts: 101

Re: arrays with first.

I think you misunderstood. You can use the output data set of PROC CONTENTS to get the by variables. The sortedby variable contains the numerical order of the by variables, if any. So you can programatically build the list of by variables in the correct order for any dataset without any prior knowledge of it. Quick example that anyone can run:

proc sort data=sashelp.cars out=cars;
by origin make drivetrain;
run;

proc contents noprint data=cars out=cars_contents;
run;

proc sql noprint;
select name
into :byvars separated by ' '
from cars_contents
where sortedby is not null
order by sortedby
;
quit;

%put byvars=&byvars;


> I doubt the OP is going to want to use CONTENTS with
> all variables from a given SAS file in a BY list -
> more likely a subset list and that's why I would
> consider the %LET definition (explicitly) to be more
> appropriate.
>
> Scott Barry
> SBBWorks, Inc.
Occasional Contributor
Posts: 13

Re: arrays with first.

FYI - Using a proc contents, outputting a dataset, and using a proc sql to get all the variables into a macro string will get around Scott's need to explicitly type out all the by-variable names.
Occasional Contributor
Posts: 13

Re: arrays with first.

My actual solution allows the user to do it either way.

The macro allows the user to type them out Scott.

OR if serious automation is required, then if the user has thought ahead and named the key byvariables similarly, say, with a common prefix, then it is a trivial matter to drop those outputted by the proc contents that don’t matter Scott.

And then it is fully automated, which is typically much more appropriate, unless the task is a small one-off little piece of code that can be banged out in 6 minutes.

J.D. Opdyke, President, DataMineIt
Respected Advisor
Posts: 3,799

Re: arrays with first.

You can't use the exact syntax you suggest because you can't create an array of mixed data types, or reference FIRST.array[index] as you have already determined. You CAN however create an array of FIRST or LAST variables.

The example doesn't really do any thing by illustrate the syntax.

[pre]
proc plan ordered;
factors a=3 first=2 c=3 i=4 / noprint;
output out=test first cvals=('a' 'b');
run;
quit;

options validvarname=any;
data test2;
set test;
by a first c;
array f
  • 'first.'n:;
    put 'NOTE: ' _n_= @;
    do j = 1 to dim(f);
    if f then put f= @;
    end;
    put;
    run;
    options validvarname=v7;
    [/pre]

    Fixed OUTPUT statement in PROC PLAN.


    Message was edited by: data _null_;
  • Regular Contributor
    Posts: 171

    Re: arrays with first.

    Posted in reply to data_null__
    data _null_,

    Wow! I thought that the validvarname option was only for compatibility with DBMS variable names. This is exactly what I was hoping to find!
    Respected Advisor
    Posts: 3,799

    Re: arrays with first.

    Without validvarname=any you can still create an array of first.var

    array f
  • first:;

    But there is more possibility of problems from user variables that begin with FIRST. I "never" use validvarname=any except for very special situations so the technique should be mostly safe.
  • Regular Contributor
    Posts: 241

    Re: arrays with first.

    Posted in reply to data_null__
    I think data _null_'s solution naturally leads to a simpler solution below. It makes a number of assumptions, of course, but may work out when you have a reasonable idea about what the variable names are in the input dataset.



       /* test data */


       proc plan ordered;


         factors var1=2 var2=2 var3=3 / noprint;


         output out=old;


       run;


       quit;


     


       /* do something when we hit any first.var */


       data new;


         set old;


         by var:;


         array firsts(*) first:;


     


         drop i;


         do i = 1 to dim(firsts);


           put firsts(i) 2. @;


         end;


         put;


       run;


       /* on log


       1 1 1


       0 0 1


       0 0 1


       0 1 1


       0 0 1


       0 0 1


       1 1 1


       0 0 1


       0 0 1


       0 1 1


       0 0 1


       0 0 1


       */




    p.s. Well, ... did not know that data _null_ has already posted a follow up with exactly the same idea moments earlier... :-(

    Message was edited by: chang_y_chung@hotmail.com
    Respected Advisor
    Posts: 3,799

    Re: arrays with first.

    Posted in reply to chang_y_chung_hotmail_com
    I had written it your way first, as that was what I remembered from our conversations on SAS-L on this subject. The problem with the simplier approach is user define variables that begin with FIRST using 'FIRST.'n is a bit safer.
    Occasional Contributor
    Posts: 13

    Re: arrays with first.

    In fairness, polingjw did state, “Of course, this code will produce an error due to the syntax of “first.variables{i}.” ”

    J.D.’s code above works, as does data _null_’s (although purists (I’d say prudes) might complain about the WARNING: note in data _null_’s SAS .log).
    Ask a Question
    Discussion stats
    • 24 replies
    • 261 views
    • 0 likes
    • 6 in conversation