Can somebody explain to me the difference for these two data steps within a macro? One uses a macro do loop and the other uses a data step do loop. I don't understand how these two are acting differently. I got this from the internet so I'm trying to fully understand it so I can create my own triangles using real data.
Thanks,
Paul
%MACRO Create_Triangle();
DATA Loss_Triangle;
DO n = 1 to &Cols.;
Col_1 = 10000*(1+UNIFORM(_n_));
OUTPUT;
END;
RUN;
DATA LOSS_TRIANGLE;
SET LOSS_TRIANGLE;
%DO s=1 %TO &Cols. - 1;
%LET x = %EVAL(&Cols. - &s. + 1);
%LET t = %EVAL(&s. +1);
IF &x. > _n_ THEN
Col_&t. = Col_&s. * RAND("UNIFORM", (1/&t.)**0.5, (1 / &t.)**0.1);
ELSE col_&t. = .;
%END;
RUN;
*** This program uses a data step to create the first column of random data with &Cols observations, then uses a macro do-loop to add the remaining columns based on the values from each consecutive column. ***
/* This step creates the first column (Col_1) of your Loss_Triangle table, where &Cols is the number of observations. So macro &Cols is kind of like data-step _n_. */
DATA Loss_Triangle;
DO n = 1 to &Cols.;
Col_1 = 10000*(1+UNIFORM(_n_)); * assigning random values based on the _n_ observation counter;
OUTPUT;
END;
RUN;
/* This step creates the rest of the columns based on the values of the first column. */
DATA LOSS_TRIANGLE;
SET LOSS_TRIANGLE; * setting initial table with one column of random data;
%DO s=1 %TO &Cols. - 1; * s-counter goes through observations, it is a macro variable, even though no ampersand;
%LET x = %EVAL(&Cols. - &s. + 1); * x checks the distance between s and t, so that last calculation is at at second-to-the-last observation (that leaves room for t = EVAL(&s + 1));
%LET t = %EVAL(&s. +1); * t is one observation number ahead of the s-counter, first value of t is 2, it also numbers new columns;
IF &x. > _n_ THEN * from the first data part, data-step _n_ is like macro variable &Cols., so &x = _n_ when s=1, this says to skip Col1 that's already there;
Col_&t. = Col_&s. * RAND("UNIFORM", (1/&t.)**0.5, (1 / &t.)**0.1); * first column Col_2 calculated from Col_1, Col_3 on Col_2 and so on;
ELSE col_&t. = .; * this is the loss part of the loss triangle;
%END;
RUN;
@pchappus wrote:
Can somebody explain to me the difference for these two data steps within a macro? One uses a macro do loop and the other uses a data step do loop. I don't understand how these two are acting differently. I got this from the internet so I'm trying to fully understand it so I can create my own triangles using real data.
Thanks,
Paul
%MACRO Create_Triangle();
DATA Loss_Triangle;
DO n = 1 to &Cols.;
Col_1 = 10000*(1+UNIFORM(_n_));
OUTPUT;
END;
RUN;DATA LOSS_TRIANGLE;
SET LOSS_TRIANGLE;
%DO s=1 %TO &Cols. - 1;
%LET x = %EVAL(&Cols. - &s. + 1);
%LET t = %EVAL(&s. +1);
IF &x. > _n_ THEN
Col_&t. = Col_&s. * RAND("UNIFORM", (1/&t.)**0.5, (1 / &t.)**0.1);
ELSE col_&t. = .;
%END;
RUN;
When questioning what any macro does a good starting point is to use OPTION MPRINT; before the macro call so the log will display the generated code. You may also need MLOGIC to trace macro logic.
Something like:
options mprint;
%create_triangle
BTW that example macro has what is often a very bad for debugging code style: Use of
Data result;
set result;
<other code>
means that with the single resulting data set for the macro you cannot tell which data set may have an error.
I would actually suggest modifying the macro so the second data creates LOSS_TRIANGLE2 so you can actually see differences between the initial LOSS_TRIANGLE and the final result.
There are many differences between these two data steps, it's hard to see exactly what your question is, or why you think they should act the same.
Maybe you should state what you are actually trying to do and then someone here will have a solution, or can point you in the right direction.
I understand that the first do loop is creating the rows. The second %DO loop creates the columns. I don't understand why the first datastep doesn't use %DO. I'm still learning SAS and think I have something stuck in the back of my head from somebody telling me to use DO within data steps and %DO outside of data steps.
This is my first instance seeing a %DO within a data step so I'm trying to understand how a %DO acts differently than a DO.
It is NOT using a %DO inside a data step. It is using %DO inside a macro to generate lines of code that form PART of a data step.
MPRINT(CREATE_TRIANGLE): data loss_triangle_2; MPRINT(CREATE_TRIANGLE): set loss_triangle; MPRINT(CREATE_TRIANGLE): if 3 > _n_ then col_2 = col_1 * rand("uniform", (1/2)**0.5, (1 / 2)**0.1); MPRINT(CREATE_TRIANGLE): else col_2 = .; MPRINT(CREATE_TRIANGLE): if 2 > _n_ then col_3 = col_2 * rand("uniform", (1/3)**0.5, (1 / 3)**0.1); MPRINT(CREATE_TRIANGLE): else col_3 = .; MPRINT(CREATE_TRIANGLE): run;
You could generate those statements into the first data step instead. Just make sure to reference your iteration variable from your DO loop instead of the automatic data step loop iteration variable.
data loss_triangle;
do n = 1 to &cols.;
col_1 = 10000*(1+uniform(_n_));
%do s=1 %to &cols. - 1;
%let x = %eval(&cols. - &s. + 1);
%let t = %eval(&s. +1);
if &x. > N then
col_&t. = col_&s. * rand("uniform", (1/&t.)**0.5, (1 / &t.)**0.1);
else col_&t. = .;
%end;
output;
end;
run;
PROC COMPARE can compare the datasets (if you don't overwrite the first one with the second one).
(Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.LOSS_TRIANGLE 12OCT18:15:39:29 12OCT18:15:39:29 2 3 WORK.LOSS_TRIANGLE_2 12OCT18:15:39:29 12OCT18:15:39:29 4 3 Variables Summary Number of Variables in Common: 2. Number of Variables in WORK.LOSS_TRIANGLE_2 but not in WORK.LOSS_TRIANGLE: 2. The COMPARE Procedure Comparison of WORK.LOSS_TRIANGLE with WORK.LOSS_TRIANGLE_2 (Method=EXACT) Observation Summary Observation Base Compare First Obs 1 1 Last Obs 3 3 Number of Observations in Common: 3. Total Number of Observations Read from WORK.LOSS_TRIANGLE: 3. Total Number of Observations Read from WORK.LOSS_TRIANGLE_2: 3. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 3. NOTE: No unequal values were found. All values compared are exactly equal.
Looks like the main difference is the second step is adding additional variables.
Okay that makes sense. If I'm trying to loop through observations use DO, if I'm trying to create new columns, use %DO.
You don't really need to generate any code for that problem. Just use an ARRAY.
%let cols=3;
%let seed=1234;
data loss_triangle_4;
array col_ (&cols);
do n = 1 to dim(col_);
col_(1) = 10000*(1+uniform(&seed));
do j=2 to dim(col_)-n+1 ;
col_(j) = col_(j-1) * rand("uniform", (1/J)**0.5, (1 / j)**0.1);
end;
output;
call missing(of col_(*));
end;
run;
But you do need to figure out how you are generating your random numbers and keep track of your seeds.
@pchappus wrote:
Okay that makes sense. If I'm trying to loop through observations use DO, if I'm trying to create new columns, use %DO.
No, @pchappus that's not it. If you are trying to loop through COLUMNS in a data set, you can use a DO (along with an ARRAY as suggested by @Tom). You can also DO within an observation if you need to perform something iterative.
DO does not loop through observations.
You can create new columns in a data step without a %DO.
*** This program uses a data step to create the first column of random data with &Cols observations, then uses a macro do-loop to add the remaining columns based on the values from each consecutive column. ***
/* This step creates the first column (Col_1) of your Loss_Triangle table, where &Cols is the number of observations. So macro &Cols is kind of like data-step _n_. */
DATA Loss_Triangle;
DO n = 1 to &Cols.;
Col_1 = 10000*(1+UNIFORM(_n_)); * assigning random values based on the _n_ observation counter;
OUTPUT;
END;
RUN;
/* This step creates the rest of the columns based on the values of the first column. */
DATA LOSS_TRIANGLE;
SET LOSS_TRIANGLE; * setting initial table with one column of random data;
%DO s=1 %TO &Cols. - 1; * s-counter goes through observations, it is a macro variable, even though no ampersand;
%LET x = %EVAL(&Cols. - &s. + 1); * x checks the distance between s and t, so that last calculation is at at second-to-the-last observation (that leaves room for t = EVAL(&s + 1));
%LET t = %EVAL(&s. +1); * t is one observation number ahead of the s-counter, first value of t is 2, it also numbers new columns;
IF &x. > _n_ THEN * from the first data part, data-step _n_ is like macro variable &Cols., so &x = _n_ when s=1, this says to skip Col1 that's already there;
Col_&t. = Col_&s. * RAND("UNIFORM", (1/&t.)**0.5, (1 / &t.)**0.1); * first column Col_2 calculated from Col_1, Col_3 on Col_2 and so on;
ELSE col_&t. = .; * this is the loss part of the loss triangle;
%END;
RUN;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.