Re: A macro needs to be run for 12 000 runners and produce one dataset...

ERJORD · Posted 09-26-2022 01:27 PM

I have written the macro below, but am new at this. I have run this for one runner but need to do this for 12 000 runners. So I probably need to do this in a do loop.

The 1st data set a contains the times each runner passes over 6 time mats during a race (time_lynn--time_fini. I selected one runner's times (runner 54915)

The 2nd data set ww contain the weather data measured on the day and this data is merged with the runner data to provide the weather values (WBGT) at each of the 6 mats. The 6 WBGT values form a curve for each runner and then the Area under the curve (AUC) is calculated for each runner into his dataset auc54915. My program produce the correct AUC value for an individual runner.

The AUC datasets needs to be combined for all 12 000 runners.

***********************************MACRO FROM HERE*********************************;
%macro wbgt(runner=54915);
****SELECT ONE FINISHER*********;
data one;set a(keep=race_number time_lynn--time_fini);
if race_number=&runner;
drop race_number;
run;

*****TRANSPOSE DATA FOR SELECTED RUNNER***********;
proc transpose data=one prefix=time out=onet;
run;
****RUNNER TIMES AT EACH OF THE 6 WEATHER STATIONS*********;
data onett;set onet(rename=time1=time);
if _NAME_='time_wins' then delete;
if _NAME_='time_lynn' then namen=1;
if _NAME_='time_cato' then namen=2;
if _NAME_='time_drum' then namen=3;
if _NAME_='time_pine' then namen=4;
if _NAME_='time_sher' then namen=5;
if _NAME_='time_fini' then namen=6;
drop _NAME_;
run;
proc sort data=onett;
by namen time;
run;

*************COMBINE THE INDIV RUNNER MAT TIMES WITH THE WEATHER@EACH MAT TIME;
proc sql;
create table wbgt_6indiv&runner as
select a.time, a.namen, b.namen, b.wbgt, b.time as wbgt_t format=time8.2
from onett a
left join
ww b
on a.namen=b.namen
where b.time>=a.time
group by a.time,a.namen
having abs(a.time-b.time)=min(abs(a.time-b.time))
order by a.namen, b.namen descending;
quit;
***CALCULATE THE AUC FOR THE SELECTED RUNNER;
data auc&runner;set wbgt_6indiv&runner;
runner=&runner;
retain sum 0;
sum_wbgt=((lag(wbgt)+wbgt)/2)*(wbgt_t-lag(wbgt_t)) ;
if _N_>1 then sum=sum_wbgt+sum;
if _N_=6 then output;
run;
%mend;
%wbgt;

PaigeMiller · Posted 09-26-2022 01:42 PM

I don't see a need for a macro. Why can't you just run the code (with some small modifications) of the data set with 12,000 runners? Then, all these macro issues go away.

For example, the PROC TRANSPOSE can be performed with a BY statement, all 12,000 runners' data will get transposed at once. Then data set ONETT can be run on all 12,000 runners' data at once and SQL can be run on all 12,000 runners' data at once.

--
Paige Miller

ERJORD · Posted 09-26-2022 01:49 PM

Paige I tried your solution but then I will have a 6 x 12 000 dataset with each column referencing one runner. I have to merge the individual runner record (6 x 1) with the weather data that best match his times at each mat time

PaigeMiller · Posted 09-26-2022 01:51 PM

Proc Transpose would result in a data set with 12,000 records if you use BY RUNNER;

However, if it is not helping, then probably transpose is not needed here, you can skip it and then somehow drop down to the later steps, but I continue to believe (in fact, I am convinced) that macros are not needed.

In fact, this is one of the major advantages of the SAS system, you can process data like this without writing loops. There are many features in SAS that enable processing of individual records across many steps without explicitly writing loops.

If you provide some sample data for like 3 runners, people here can try to develop code without macros to do what you want. Please provide the data as working SAS data step code, which you can type in yourself, or you can follow these instructions.

--
Paige Miller

ballardw · Posted 09-26-2022 01:55 PM

You should provide a small example of the data sets involved.

I suspect that no macro is needed. BY group processing would likely get what you need but can't tell for sure without example data.

Proc sort data=a;
   by race_number;
run;

proc transpose data=a (keep=race_number tim_lynn-tim_fini)
        out=onet;
    by race_number; 
run;

Then calculate your variables in Onett the same.

Change the sort to:

proc sort data=onett;
by race_number namen time;
run;

And finish with something like

proc sql;
   create table wbgt_6indiv as
   select a.race_number, a.time, a.namen, b.namen, b.wbgt, b.time as wbgt_t format=time8.2
   from onett a
   left join
   ww b
   on a.namen=b.namen
   where b.time>=a.time
   group by a.race_number, a.time,a.namen
   having abs(a.time-b.time)=min(abs(a.time-b.time))
   order by a.race_number, a.namen, b.namen descending;
quit;
data auc;
   set wbgt_6indiv;
   by race_number;
   runner=race_number;
   retain sum 0;
   lt = lag(wbgt);
   ltt = lag(wbgt_t);
   if first.race_number then do;
      sum=0;
      sum_wbgt=(wbgt)/2)*(wbgt_t)) ;
   end;
   else do;
      sum_wbgt=((lt)+wbgt)/2)*(wbgt_t-ltt)) ;
   end
   sum=sum_wbgt+sum;
   if last.race_number then output;
run;

ERJORD · Posted 09-26-2022 02:01 PM

Example of data set a
Race_number time_lynn time_cato time_drum time_pine time_sher time_fini
28951 6:28:53 7:22:32 8:13:22 9:40:28 10:24:47 11:00:37
17472 6:28:52 7:22:29 8:13:22 9:40:28 10:25:52 11:03:45
54169 6:29:31 7:22:32 8:13:23 9:42:11 10:30:13 11:06:24
31020 6:28:52 7:22:30 8:13:23 9:42:51 10:31:07 11:08:06
19952 6:28:53 7:22:33 8:13:32 9:47:14 10:35:03 11:11:35
30807 6:28:54 7:22:33 8:13:25 9:41:33 10:31:41 11:13:27
28773 6:28:52 7:22:32 8:13:24 9:41:07 10:29:52 11:14:56
44452 6:29:10 7:22:39 8:13:31 9:49:06 10:38:40 11:16:09
56834 6:28:53 7:22:28 8:15:04 9:51:15 10:40:29 11:17:03
29774 6:28:54 7:22:33 8:13:27 9:42:18 10:36:43 11:17:19

PaigeMiller · Posted 09-26-2022 02:14 PM

I'm not going to try to rewrite all of your code, but I still don't see anything that stops working if you do it all at once. You might need to add in something to your SQL so it operates by race_number rather than expecting a single race number, but this output data set ONET from my brief code should be usable in its entirety in each of your subsequent steps.

data have;
input Race_number (time_lynn time_cato time_drum time_pine time_sher time_fini) (time8.);
cards;
28951 6:28:53 7:22:32 8:13:22 9:40:28 10:24:47 11:00:37
17472 6:28:52 7:22:29 8:13:22 9:40:28 10:25:52 11:03:45
54169 6:29:31 7:22:32 8:13:23 9:42:11 10:30:13 11:06:24
31020 6:28:52 7:22:30 8:13:23 9:42:51 10:31:07 11:08:06
19952 6:28:53 7:22:33 8:13:32 9:47:14 10:35:03 11:11:35
30807 6:28:54 7:22:33 8:13:25 9:41:33 10:31:41 11:13:27
28773 6:28:52 7:22:32 8:13:24 9:41:07 10:29:52 11:14:56
44452 6:29:10 7:22:39 8:13:31 9:49:06 10:38:40 11:16:09
56834 6:28:53 7:22:28 8:15:04 9:51:15 10:40:29 11:17:03
29774 6:28:54 7:22:33 8:13:27 9:42:18 10:36:43 11:17:19
;

proc transpose data=have prefix=time out=onet; 
    by race_number notsorted;
run;

--
Paige Miller

ERJORD · Posted 09-26-2022 02:08 PM

section of the weather data that needs to be matched for stations/time-this section is only for the 1st weather station. I need to match for station and time to be able to get the correct WBGT value at each time mat
Lynnfield Park 6:21:00 11.65303143 1
Lynnfield Park 6:22:00 11.69803661 1
Lynnfield Park 6:23:00 11.71745095 1
Lynnfield Park 6:24:00 11.37899908 1
Lynnfield Park 6:25:00 11.4383722 1
Lynnfield Park 6:26:00 11.50659054 1
Lynnfield Park 6:27:00 11.29974837 1
Lynnfield Park 6:28:00 11.6058625 1
Lynnfield Park 6:29:00 11.56815185 1
Lynnfield Park 6:30:00 11.91617546 1
Lynnfield Park 6:31:00 11.77515562 1
Lynnfield Park 6:32:00 11.9490733 1
Lynnfield Park 6:33:00 11.77267187 1
Lynnfield Park 6:34:00 11.66682024 1
Lynnfield Park 6:35:00 11.4392493 1
Lynnfield Park 6:36:00 11.10735457 1
Lynnfield Park 6:37:00 10.70403504 1
Lynnfield Park 6:38:00 10.55250209 1
Lynnfield Park 6:39:00 10.37133156 1
Lynnfield Park 6:40:00 10.24038217 1
Lynnfield Park 6:41:00 10.22140107 1
Lynnfield Park 6:42:00 10.40222399 1
Lynnfield Park 6:43:00 10.30850302 1
Lynnfield Park 6:44:00 10.3926271 1
Lynnfield Park 6:45:00 11.00212959 1
Lynnfield Park 6:46:00 11.10916975 1
Lynnfield Park 6:47:00 11.17461482 1
Lynnfield Park 6:48:00 11.12582278 1
Lynnfield Park 6:49:00 11.10004124 1
Lynnfield Park 6:50:00 11.11450824 1

Reeza · Posted 09-26-2022 02:01 PM

I agree with others, using BY group processing is likely more efficient in the long run.

Here's a method that answers the question asked:

https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Basically for you - use CALL EXECUTE to call for all runners.

Then you can combine your results into one dataset using:

data auc_combined;
set auc:;
run;

This will combine all data sets that start with AUC.

If you do go down the macro approach, I would highly recommend adding a step to your code to not create intermediary data sets for each runner and to only keep the final results. That's a ton of wasted space on your machine. Basically use proc data sets to delete intermediary data sets.

A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Re: A macro needs to be run for 12 000 runners and produce one dataset with a record for each runner

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!