Macro help for data step automation

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Macro help for data step automation

Can someone please help me create an automation/macro for this data step. Right now, I get data on a daily basis and I have to create fileid every day which is a pain when I get more than 1000 file ids. Is there a way it can automatically pick the total file ids from my stats data to create the %fileid(num=n) datasets, inN datasets and newN datasets. And is there a way to reduce the macro to few lines without filling the editor window. Please help. Many thanks in advance!

/*Comparing for unique VehicleIds*/ /*Update everyday*/

%macro fileid (num=);

proc sort data=tripstats;

by fileid vehicleid;

data fileid#

set stats;

by fileid vehicleid;

if fileid = #

if first.vehicleid then output;

run;

proc sort data=fileid#

by vehicleid;

run;

%mend fileid;

%fileid (num=6);%fileid (num=7);%fileid (num=8);%fileid (num=9);%fileid (num=10);%fileid (num=11);%fileid (num=12);%fileid (num=13);

%fileid (num=14);%fileid (num=15);%fileid (num=16);%fileid (num=17);%fileid (num=18);%fileid (num=19);%fileid (num=20);

proc sort data=stats;

by fileid vehicleid;

data fileid5;

set stats;

by fileid vehicleid;

if fileid=5;

if last.vehicleid then output;

run;

proc sort data=fileid5;

by vehicleid;

run;

/*New vehicles in each fileid*/

title1 "Unique vehicles in each fileid";

data want ;

  merge fileid5 (in=in5 keep=vehicleid)

        fileid6 (in=in6 keep=vehicleid)

        fileid7 (in=in7 keep=vehicleid)

        fileid8 (in=in8 keep=vehicleid)

        fileid9 (in=in9 keep=vehicleid)

        fileid10(in=in10 keep=vehicleid)

  fileid11(in=in11 keep=vehicleid)

  fileid12(in=in12 keep=vehicleid)

  fileid13(in=in13 keep=vehicleid)

  fileid14(in=in14 keep=vehicleid)

  fileid15(in=in15 keep=vehicleid)

  fileid16(in=in16 keep=vehicleid)

  fileid17(in=in17 keep=vehicleid)

  fileid18(in=in18 keep=vehicleid)

  fileid19(in=in19 keep=vehicleid)

  fileid20(in=in20 keep=vehicleid)

...................................

    end=eof

  ;

  by vehicleid;

  if first.vehicleid;

  new6 + (in6 and ^in5);

  new7 + (in7 and ^in6 and ^in5);

  new8 + (in8 and ^in7 and ^in6 and ^in5);

  new9 + (in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new10+ (in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new11+ (in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new12+ (in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new13+ (in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new14+ (in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new15+ (in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new16+ (in16 and ^in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new17+ (in17 and ^in16 and ^in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new18+ (in18 and ^in17 and ^in16 and ^in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new19+ (in19 and ^in18 and ^in17 and ^in16 and ^in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  new20+ (in20 and ^in19 and ^in18 and ^in17 and ^in16 and ^in15 and ^in14 and ^in13 and ^in12 and ^in11 and ^in10 and ^in9 and ^in8 and ^in7 and ^in6 and ^in5);

  .....................

if eof;

  put (new6-new108) (=/);

run;

For more information, this is what I am trying to achieve

https://communities.sas.com/message/185370#185370

Message was edited by: sas lover


Accepted Solutions
Solution
‎11-05-2013 04:57 PM
Super Contributor
Posts: 339

Re: Macro help for data step automation

you could use arrays with variable ranges to trim some of the new{i} variables definitions.

you could trim the macro calls using a read of the fileid variable in your stats dataset. It's difficult to fully solve that puzzle for you as I don't know what drives the subset of fileid which you want to analyze.

A gross example would be:

%macro fileid (min=, max=);

proc sort data=stats;

by fileid vehicleid;

data %do i=&min %to &max; fileid&num %end; ;

set stats;

by fileid vehicleid;

%do i=&min %to &max;

     if first.vehicleid and filedid=&i then output fileid&i;

%end;

run;

/* this creates each individual dataset in a single step (and what's even better, in a single pass through STATS dataset) */

%do i=&min %to &max;

proc sort data=fileid&i;

by vehicleid;

run;

%end;

/* Lets include the &min-1 fileid */

/* stats was already sorted so no need to resort */

data fileid%eval(&min-1);

set stats;

by fileid vehicleid;

if last.vehicleid and fileid=%eval(&min-1) then output;

run;

proc sort data=fileid%eval(&min-1);

by vehicleid;

run;

/* Now onto improving your big data step. - let's keep on using min and max params */

data want ;

  merge

     %do i=%eval(&min-1) %to &max;

          fileid&i (in=in&i keep=vehicleid)

     %end;

    end=eof

     ;

  by vehicleid;

  if first.vehicleid; /* NOTE: I BELIEVE THIS STATEMENT IS REDUNDANT AS ALL YOUR FILEID DATASET SHOULD HAVE ONLY EXACTLY ONE OCCURENCE OF EACH VEHICLE BY THEIR CONSTRUCT */

  %do i=&min %to &max;

     new&i+(in&i

          %do j=%eval(&min-1) %to %eval(&i-1);

               and ^in&j

          %end;

          );

   %end;

if eof then put (new&min-new&max)(=/); /* could also be rewritten with a %do %end block */

run;

%mend fileid;

%fileid(min=6, max=108);

Anyway, I've merely used the power of macros to write the code for you using a min and max value. If your conditions to include or not an ID are more complex and/or stored somehow in a dataset, it can be worked around but you'd need to be far more specific about your desired results.

Hopefully the above gives you some insight on how to use macro loops and %eval to write all the tedious code following the same logic for you.

Vincent

P.S. I didn't really look into improving code efficiency but I'm sure others will contribute in that way.

View solution in original post


All Replies
Solution
‎11-05-2013 04:57 PM
Super Contributor
Posts: 339

Re: Macro help for data step automation

you could use arrays with variable ranges to trim some of the new{i} variables definitions.

you could trim the macro calls using a read of the fileid variable in your stats dataset. It's difficult to fully solve that puzzle for you as I don't know what drives the subset of fileid which you want to analyze.

A gross example would be:

%macro fileid (min=, max=);

proc sort data=stats;

by fileid vehicleid;

data %do i=&min %to &max; fileid&num %end; ;

set stats;

by fileid vehicleid;

%do i=&min %to &max;

     if first.vehicleid and filedid=&i then output fileid&i;

%end;

run;

/* this creates each individual dataset in a single step (and what's even better, in a single pass through STATS dataset) */

%do i=&min %to &max;

proc sort data=fileid&i;

by vehicleid;

run;

%end;

/* Lets include the &min-1 fileid */

/* stats was already sorted so no need to resort */

data fileid%eval(&min-1);

set stats;

by fileid vehicleid;

if last.vehicleid and fileid=%eval(&min-1) then output;

run;

proc sort data=fileid%eval(&min-1);

by vehicleid;

run;

/* Now onto improving your big data step. - let's keep on using min and max params */

data want ;

  merge

     %do i=%eval(&min-1) %to &max;

          fileid&i (in=in&i keep=vehicleid)

     %end;

    end=eof

     ;

  by vehicleid;

  if first.vehicleid; /* NOTE: I BELIEVE THIS STATEMENT IS REDUNDANT AS ALL YOUR FILEID DATASET SHOULD HAVE ONLY EXACTLY ONE OCCURENCE OF EACH VEHICLE BY THEIR CONSTRUCT */

  %do i=&min %to &max;

     new&i+(in&i

          %do j=%eval(&min-1) %to %eval(&i-1);

               and ^in&j

          %end;

          );

   %end;

if eof then put (new&min-new&max)(=/); /* could also be rewritten with a %do %end block */

run;

%mend fileid;

%fileid(min=6, max=108);

Anyway, I've merely used the power of macros to write the code for you using a min and max value. If your conditions to include or not an ID are more complex and/or stored somehow in a dataset, it can be worked around but you'd need to be far more specific about your desired results.

Hopefully the above gives you some insight on how to use macro loops and %eval to write all the tedious code following the same logic for you.

Vincent

P.S. I didn't really look into improving code efficiency but I'm sure others will contribute in that way.

Super Contributor
Posts: 339

Re: Macro help for data step automation

Posted in reply to Vince28_Statcan

On a side note, it occurs to me that what you want to achieve can be done directly easily through a few simple steps as follow:

proc sort data=stats;

     by fileid vehicleid;

run;

data temp;

     set stats(where=(fileid between 5 and 108));

     by filedid vehicleid;

     if first.vehicleid then output;

run;

proc sort data=temp;

     by vehicleid fileid;

run;

data temp2;

     set temp;

     by vehicleid;

     if first.vehicleid then output;

run;

proc sql;

     select fileid, count(vehicleid);

     from temp2;

     group by fileid

     ;

quit; /* output to output window instead of log... */

The first data step outputs a single record per filedid, vehicleid. Re-sorting by vehicleid, the second datastep keeps the first fileid occurence for each vehicleid thus, simply using aggregate functions should achieve the same total as the huge macro above with clever use of sorting.

Vincent

Contributor
Posts: 24

Re: Macro help for data step automation

Posted in reply to Vince28_Statcan

Thanks Vince28. It works. I appreciate the help.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 252 views
  • 0 likes
  • 2 in conversation