FreelanceReinh
Jade | Level 19
Member since
02-01-2015
- 4,405 Posts
- 2,030 Likes Given
- 1,193 Solutions
- 3,950 Likes Received
Activity Feed for FreelanceReinh
- Posted Re: Sample size for (average) bioequivalence on Statistical Procedures. 2 hours ago
- Posted Re: Sample size for (average) bioequivalence on Statistical Procedures. 2 hours ago
- Got a Like for Re: Sample size for (average) bioequivalence. yesterday
- Posted Re: Sample size for (average) bioequivalence on Statistical Procedures. yesterday
- Posted Re: How to get posterior parameters used in each imputation with monotone regression method? on Statistical Procedures. Friday
- Posted Re: How to get posterior parameters used in each imputation with monotone regression method? on Statistical Procedures. Friday
- Got a Like for Re: Delete/remove a custom format. Wednesday
- Liked Re: Varying x-axis tick marks in sgplot for Ksharp. a week ago
- Got a Like for Re: How to mask the % in a prxmatch statement. a week ago
- Posted Re: How to mask the % in a prxmatch statement on SAS Programming. a week ago
- Liked Re: Join vs Merge 1.2 TB with 110 GB Datasets for kenkaran. a week ago
- Got a Like for Re: Montecarlo Simulations. a week ago
- Posted Re: Montecarlo Simulations on SAS Software for Learning Community. a week ago
- Got a Like for Re: Montecarlo Simulations. a week ago
- Posted Re: Montecarlo Simulations on SAS Software for Learning Community. a week ago
- Posted Re: Montecarlo Simulations on SAS Software for Learning Community. a week ago
- Posted Re: How do I create a loop that keeps on subtracting a value? on SAS Programming. 2 weeks ago
- Got a Like for Re: Macro code to repeat a code chunk. 2 weeks ago
- Posted Re: Macro code to repeat a code chunk on SAS Programming. 2 weeks ago
- Posted Re: Montecarlo Simulations on SAS Software for Learning Community. 2 weeks ago
2 hours ago
[continuation of the previous post]
Now let's turn to your PROC POWER step where you obtained a sample size of 69 subjects per group. The corresponding power estimate is even more suitable for comparisons:
ods output output=ppow;
proc power;
twosamplemeans test=equiv_diff
dist=normal
lower = -0.223
upper = 0.223
meandiff = 0.05
stddev = 0.4
npergroup = 69
power = .;
run;
proc print data=ppow;
format power best16.;
run; /* 0.80179614325271 */
/* The same result (up to 14 decimals!) can be obtained by using a formula from section 3.2.3 of the book (p. 62): */
data _null_;
a=0.05; /* significance level alpha */
dU=0.223; /* bioequivalence limit (log(1.25)) */
d=0.05; /* mean difference delta */
s=0.4; /* common standard deviation in both groups */
n=69; /* sample size per group */
power=1-cdf('T',quantile('T',1-a,2*n-2),2*n-2,(dU-d)/(s*sqrt(2/n)))
-cdf('T',quantile('T',1-a,2*n-2),2*n-2,(dU+d)/(s*sqrt(2/n)));
put power= best16.;
run;
The power computed above is that of an equivalence test based on a much simpler model than model (10.2.1) in the book: Just two independent normal distributions (ideally) with equal variances (which are assumed to be unknown). No cross-over design, just overall mean plus fixed treatment effect plus normal random error ("two-sample parallel design").
/* Simulate data of 500,000 experiments in a two-sample parallel design */
%let n=69; /* sample size per group */
%let mu=1; /* overall mean, arbitrarily selected */
%let d=0.05; /* fixed treatment effect */
%let s=0.4; /* common standard deviation in both groups */
data sim2(drop=F:);
call streaminit('MT64',31415927);
retain sim k i y e;
array F[*] FT FR (%sysevalf(&d/2) -%sysevalf(&d/2));
do sim=1 to 500000;
do k=1 to 2;
do i=1 to &n;
e=rand('normal',0,&s);
y=&mu+F[k]+e;
output;
end;
end;
end;
run;
/* Perform equivalence test */
options nonotes;
ods select none;
ods noresults;
ods output EquivLimits=eqlim2(where=(method='Pooled'));
proc ttest data=sim2 tost(-0.223, 0.223);
by sim;
class k;
var y;
run;
options notes;
ods select all;
ods results;
proc freq data=eqlim2;
tables assessment / binomial;
run; /* proportion assessment="Equivalent", i.e., estimated power: 0.8011 (95%-CI: [0.8000, 0.8022]) */
Again, the result of the simulation matches the computed power 0.8017... obtained before.
2 hours ago
As promised, here is the SAS code I developed during my investigation.
/* Simulation of 500,000 clinical trials with 2x2 crossover design to establish average bioequivalence between
two treatments using log-transformed data (cf. the example on p. 262 f. of
Chow, S.C., Shao, J., and Wang, H. (2008): Sample Size Calculations in Clinical Research, 2nd ed.,
Chapman & Hall/CRC Press/Taylor & Francis Group, New York
and the derivation of the sample size estimate in this book) */
%let n=18; /* sample size per group */
%let mu=1; /* overall mean in the statistical model (10.2.1) used in the book, arbitrarily selected */
%let d=0.05; /* fixed treatment effect ("delta" in the book) */
%let rho=0.5; /* Between-subject std. deviations (sBT, sBR) and within-subject std. dev. (sWT, sWR), */
%let sBT=0.35; /* denoted "sigma_BT" etc. in the book, as well as the correlation rho of each subject's random */
%let sBR=0.25; /* effect between test (T) and reference (R) treatment have been arbitrarily selected, but */
%let sWT=0.2; /* aiming at the value 0.4 for the derived standard deviation parameter s11 (see below) used in */
%let sWR=0.15; /* the book example. */
%let P1=0.07; /* fixed effect of the first treatment period */
%let Q1=0.03; /* fixed effect of the first sequence of treatments (T, R) */
/* Alternative set of parameters:
%let n=18;
%let mu=1;
%let d=0.05;
%let rho=0.15;
%let sBT=0.10;
%let sBR=0.05;
%let sWT=0.37;
%let sWR=0.11;
%let P1=0.12;
%let Q1=0.13;
*/
%let s11=%sysfunc(sqrt(&sBT**2+&sBR**2-2*&rho*&sBT*&sBR+&sWT**2+&sWR**2)); /* in the book: "sigma_1,1", described */
%put &=s11; /* =0.4 */ /* as "standard deviation for intra-subject comparison" */
data sim(drop=FT--sWR);
call streaminit('MT64',27182818);
retain sim k i j l S e y;
array F[*] FT FR (%sysevalf(&d/2) -%sysevalf(&d/2));
array P[2] (&P1 -&P1);
array Q[2] (&Q1 -&Q1);
array S_[*] ST SR;
array sW[*] sWT sWR (&sWT &sWR);
do sim=1 to 500000; /* Reduce this number for shorter run times. */
do k=1 to 2; /* treatment sequence k=1: T, R; sequence k=2: R, T */
do i=1 to &n;
ST=rand('normal',0,&sBT); /* i-th subject's random effect; its bivariate normal distribution is simulated ... */
SR=rand('normal',%sysevalf(&rho*&sBR/&sBT)*ST,%sysevalf(&sBR*%sysfunc(sqrt(%sysevalf(1-&rho**2)))));
/* ... using the conditional distribution of SR given ST; alternatively, PROC SIMNORMAL could be used. */
do j=1 to 2; /* first and second treatment period */
l=2-(j=k); /* Values l=1, 2 correspond to treatments T and R, respectively. */
S=S_[l];
e=rand('normal',0,sW[l]); /* random errors */
y=&mu+F[l]+P[j]+Q[k]+S+e; /* see equation (10.2.1) in the book */
output;
end;
end;
end;
end;
run;
/* Compute mean values of the (log-transformed) measurements per period-sequence combination */
proc summary data=sim nway;
by sim k;
class j;
var y;
output out=stats(drop=_:) mean=;
run;
/* Estimate the treatment effect ("delta hat" in the book) */
data delta(keep=sim d_hat);
set stats;
by sim;
if first.sim then d_hat=0;
d_hat+((-1)**(j~=k)*y);
if last.sim;
d_hat=d_hat/2;
run;
/* Optional: Check the (normal) distribution of "delta hat" (formula (10.2.3) in the book)
proc univariate data=delta;
var d_hat;
histogram / normal;
run;
*/
/* Prepare estimation of "sigma_1,1 hat" squared */
proc transpose data=stats out=stats2(drop=_:) prefix=y;
by sim k;
id j;
var y;
run;
/* Estimate "sigma_1,1 hat" squared using formula (10.2.5) of the book */
data s112(keep=sim s112);
merge sim(keep=sim k i j y) stats2;
by sim k;
if first.sim then s112=0;
lag_y=lag(y);
if j=2 then s112+(lag_y-y-y1+y2)**2;
if last.sim;
s112=s112/(2*&n-2);
run;
/* Optional: "sigma_1,1 hat" squared and "delta hat" are independent; check uncorrelatedness and distribution
of "sigma_1,1 hat" (cf. p. 261 of the book)
data chk;
merge delta s112;
by sim;
run;
proc corr data=chk;
var d_hat s112;
run;
data s112n;
set s112;
s112n=s112*(2*&n-2)/&s11**2; * This quantity should have a chi-square distribution with 2*n-2 degrees of freedom.;
run;
proc univariate data=s112n;
var s112n;
histogram / gamma(alpha=%eval(&n-1) sigma=2); * chi-square(df) distribution equals gamma(df/2, 2) distribution;
run;
*/
/* Compute 90% confidence interval for delta and perform the equivalence test ("two one-sided tests procedure")
using delta_U=log(1.25)=0.223... and delta_L=log(0.8)=-delta_U (bioequivalence limits as per 2001 FDA guidance
"Statistical Approaches to Establishing Bioequivalence" [https://www.fda.gov/media/70958/download]) */
data confd;
merge delta s112;
by sim;
hw=quantile('T',0.95,2*&n-2)*sqrt(s112)/2*sqrt(2/&n); /* half-width of the CI according to book p. 261 */
c=(abs(&d-d_hat)<=hw); /* =1 in case of coverage of the true parameter, 0 otherwise */
rej=(d_hat-hw>log(0.8) & d_hat+hw<log(1.25)); /* =1 if null hypotheses H_01 and H_02 are rejected */
run; /* at the 5% significance level */
/* Check coverage of the 90% CI for delta and estimate the power of the equivalence test */
proc freq data=confd;
tables c rej / binomial(level='1');
run; /* proportion c=1: 0.8994 (95%-CI: [0.8985, 0.9002])
prop. rej=1: 0.8074 (95%-CI: [0.8063, 0.8085]) */
/* with &n=17: */
/* proportion c=1: 0.8995 (95%-CI: [0.8987, 0.9003])
prop. rej=1: 0.7837 (95%-CI: [0.7825, 0.7848]) */
The above estimates of the test's power indicate that n = 18 is the minimum sample size required for 80% power, given the assumed values of the parameters d and s1,1. Very similar results are obtained with the "alternative set of parameters" (see that comment in the code).
/* Suggested better approximation of the power than the book's 2Phi(...)-1 formula (p. 261) */
data _null_;
do n=17, 18;
pow= cdf('normal', log(1.25)-quantile('T',0.95,2*n-2)*sqrt(&s11**2/(2*n)),&d,sqrt(&s11**2/(2*n)))
-cdf('normal',-log(1.25)+quantile('T',0.95,2*n-2)*sqrt(&s11**2/(2*n)),&d,sqrt(&s11**2/(2*n)));
put n= "--> approx. power=" pow 6.4;
end;
run;
The above approximation yields 0.8095 for n = 18 and 0.7857 for n = 17, suggesting the same minimum sample size as the simulation.
As mentioned in my previous post, the equivalence test can also be implemented with PROC TTEST rather than in several steps as above:
/* Restructure simulated data for application of PROC TTEST */
proc transpose data=sim out=simt(drop=_:) prefix=y;
by sim k i;
id j;
var y;
run;
data simtt;
set simt;
trt1=k;
trt2=3-k;
run;
/* Perform the equivalence test with PROC TTEST */
options nonotes;
ods select none;
ods noresults;
ods output EquivLimits=eqlim(where=(period=' ' & method='Pooled'));
proc ttest data=simtt tost(%sysfunc(log(0.8)), %sysfunc(log(1.25)));
by sim;
var y1 y2 / crossover=(trt1 trt2);
run;
options notes;
ods select all;
ods results;
proc freq data=eqlim;
tables assessment / binomial;
run;
/* Compare previous results to those from PROC TTEST */
data cmptt;
merge confd(keep=sim rej)
eqlim(keep=sim assessment);
by sim;
run;
proc freq data=cmptt;
tables rej*assessment;
run; /* Results are identical: rej=1 <==> Assessment="Equivalent" */
My suggestion for an exact calculation of the power is to compute the integral of the joint probability density of "d hat" and "s1,1 hat" over the rejection region, i.e., the set of ("d hat", "s1,1 hat") pairs satisfying the inequality |"d hat"| < dU − ta,2n−2·"s1,1 hat"/√(2n), where the two null hypotheses H01 and H02 are rejected. As those two random variables are independent (see book, p. 261), their joint pdf is just the product fg of the pdfs of a normal distribution (f) and a scaled chi distribution (g). I found the pdf of a chi distribution in section 8.3 of Evans, M., Hastings, N. and Peacock, B. (2000), Statistical Distributions, 3rd ed., John Wiley & Sons, New York.
%let n=18;
%let d=0.05;
%let s11=0.4;
%let s112=%sysevalf(&s11**2);
%let alpha=0.05;
/* Select a subset of points where the joint pdf does not almost vanish */
data intset;
do i=-5000 to 6000;
x=i/10000;
do j=1000 to 8000;
y=j/10000;
fg= sqrt(&n/constant('PI'))/&s11*exp(-&n/&s112*(x-&d)**2)
*2*((&n-1)/&s112)**(&n-1)/fact(&n-2)*y**(2*&n-3)*exp(-(&n-1)/&s112*y**2);
if fg>1e-10 then output;
end;
end;
run; /* 52891353 obs. */
proc means data=intset min max;
var i j;
run; /* Min and max are well inside the ranges, so it's safe to limit the integration to this set. */
/* Compute the integral numerically */
proc means data=intset sum;
where abs(x)<log(1.25)-quantile('T',1-&alpha,2*&n-2)*y/sqrt(2*&n);
var fg;
run; /* --> Power = 80703401.67/1e8 = 0.80703... . */
This value matches the result of the simulation, 0.8074 (95%-CI: [0.8063, 0.8085]), quite well.
I will continue in a subsequent post to avoid that this one is getting too long.
yesterday
1 Like
Hi, Agnieszka (@A_Tomczyk), and welcome to the SAS Support Communities!
Thank you very much for posting this interesting question. I have investigated the issue using the second edition (2008) of your book (in which I found basically the same texts and formulas as shown in your photographs of the third edition, so I assume the preceding paragraphs of that section "10.2 Average Bioequivalence" are very similar as well).
Everything seemed relatively straightforward to me (note the typo −dL, which should read dL), until I stumbled across the sentence "Under the alternative hypothesis that |e| < d, ..." ending with an approximate power formula of the form 2Φ(...)−1. As it turned out, the authors switched the notation there: It would have been more consistent to write that alternative hypothesis as |d| < dU.
Also, the approximation 2Φ(...)−1 seems unnecessarily coarse to me: I say "unnecessarily" because their equations "do not have an explicit solution" (quote from p. 261) anyway, so they could also have suggested a better approximation such as
F(dU − ta,2n−2·s1,1/√(2n)) − F(−dU + ta,2n−2·s1,1/√(2n))
with the cumulative distribution function F of "d hat", i.e., the cdf of a normal distribution (see formula 10.2.3 in the book if the numbering is the same in your edition). I use this in my code (which I will post separately). It is still an approximation because s1,1 is used instead of its estimate "s1,1 hat", but unlike theirs it does not require |d| (i.e., their |e|) to be very small. Using their approximation I sometimes got nonsensical negative values for the estimated power.
For an exact computation you can evaluate an integral of the joint probability density of "d hat" and "s 1,1 hat", which is not very difficult to do in a DATA step (also shown in my code) thanks to the independence of these two random variables.
In the next step of their derivation, the authors of the book use the upper b/2 quantile of the t distribution with 2n−2 degrees of freedom, although the corresponding normal quantile zb/2 would better fit the previous "2Φ(...)−1" expression. But, as we know, those two quantiles don't differ much if n is sufficiently large. (And they do use zb/2 in their "further simplified" sample size formula.)
Surprisingly, they made two mistakes in their example calculation (which you show in one of your photographs): z0.05 = 1.64..., not 1.96, and z0.10 = 1.28..., not 0.84. Using the correct quantiles, the result for the (approximate) estimated minimum sample size is 23, not 21. Interestingly, using the "better approximation" I've suggested further above, I obtain n = 18 -- and this is even confirmed by both the exact calculation (using the integral mentioned above) and the simulation of 500,000 cross-over studies (each for n = 17 and n = 18) that I did in my code.
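For reference, here is a quick check of that example calculation in a DATA step, assuming the approximate sample size formula n ≥ (za + zb/2)²·s1,1²/(2·(dU − |d|)²) as printed in my (second) edition -- please double-check against yours:
data _null_;
s11=0.4;            /* sigma_1,1 from the book example */
dU=log(1.25);       /* bioequivalence limit */
d=0.05;             /* assumed mean difference */
a=0.05; b=0.2;      /* significance level and type II error probability */
n_book=(1.96+0.84)**2*s11**2/(2*(dU-abs(d))**2);                                     /* with the book's (incorrect) quantiles */
n_corrected=(quantile('normal',1-a)+quantile('normal',1-b/2))**2*s11**2/(2*(dU-abs(d))**2); /* z0.05=1.645, z0.10=1.282 */
put n_book= 6.2 n_corrected= 6.2 '--> rounded up: 21 vs. 23';
run;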
The relevant hypothesis test can be implemented with PROC TTEST using the TOST option of the PROC TTEST statement in conjunction with the CROSSOVER= option of the VAR statement (also shown in my code). However, it seems to me that the corresponding sample size calculation is beyond the capabilities of PROC POWER (at least in my SAS/STAT 14.3 version), which does support calculations corresponding to using either TOST or CROSSOVER= in PROC TTEST, but apparently not their combination. Your PROC POWER code uses the "TOST only" variant. In my code I show the corresponding PROC TTEST step as well as a simulation.
The GLMPOWER procedure doesn't help either (I think) because in the documentation it says: "Tests and contrasts that involve random effects are not supported." But the model used in Chow/Shao/Wang does involve a random effect.
So, unless someone else has other ideas, you can use
- one of the approximations from the Chow et al. book or
- the "better" approximation mentioned above or
- the exact integral calculation or
- a simulation
to obtain power and sample size for the model in question with SAS.
I am so sorry that I have to leave the office now, but I will post my code tomorrow (CEST time zone).
Friday
@PanShuyao wrote:
I think RANNOR and RANGAM use the older RNG.
This is true, but I'm not surprised that the results still don't match. The (pseudo-)random numbers depend on the order of operations, and I, at least, don't know how PROC MI works internally, so it is not obvious how to replicate its computations manually.
Please see the addendum I added to my first reply for a more promising approach.
Friday
Hello @PanShuyao,
I haven't checked more details, but my first guess is that the difference you observed is due to the fact that PROC MI and PROC IML use different random number generators (RNG): The documentation of the SYSRANEND Macro Variable mentions that PROC MI uses the "older RNG, which is used by the RANUNI function", whereas both CALL RANDSEED and CALL STREAMINIT use the Hybrid 1998/2002 32-bit Mersenne twister as the default (and they don't even provide an option to use the old RNG). So, using the same random seed (769097) or not, the random number streams will be totally different.
Addendum:
Of course, make sure that you are using the same input dataset. Currently, you have FISH1 and FISH2.
For a better comparison of the two approaches, you could run both on multiple copies of the input dataset (ideally avoiding loops) and then see whether the mean values of the statistics of interest converge. For the PROC MI part, the analysis of 1000 copies of dataset FISH1 could look like this:
proc surveyselect data=fish1 out=morefish reps=1000 rate=1;
run;
ods select none;
ods noresults;
proc mi data=morefish nimpute=5 seed=769097 out=outex3_m;
by replicate;
monotone reg(Length2= Length1 / details);
var Length1 Length2;
ods output MonoReg=MonoReg_m;
run;
ods select all;
ods results;
proc means data=monoreg_m mean std clm;
class effect;
var _1;
run;
a week ago
2 Likes
Hello @acordes,
Use single quotes to avoid triggering the macro processor:
prxmatch('m/%macro/i',code_line);
(The "/o" option is redundant because this regular expression is a constant.)
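A small self-contained example with a made-up dataset CODE holding one line of SAS code per observation (note that the single-quoted literals don't trigger the macro processor either):
data code;
length code_line $60;
code_line='%macro mymac(x);';                   output;
code_line='data test; set sashelp.class; run;'; output;
code_line='%MACRO other / minoperator;';        output;
run;

data macro_defs;
set code;
if prxmatch('m/%macro/i', code_line);  /* keeps the first and third observation */
run;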
a week ago
1 Like
You're welcome.
The (pseudo-)random values of random_PD and random_LGD created by the RAND function depend on the random seed used in the CALL STREAMINIT routine (the value 1 in our code), the random-number generator (MTHybrid is the default, but only since SAS 9.4M3), the order in which they are calculated and possibly even on the platform (hardware and operating system).
In our case the order of calculations might be the reason why we get different results. For example, if scenario "Adverse" came before "Baseline" in dataset PORTFOLIO_SIM, the random values for "Adverse" would be different from what they are with "Baseline" being the first scenario. The sort order of the borrower IDs is similarly important. Note that since PORTFOLIO_SIM was created by PROC SQL without an ORDER BY clause, the order of observations in PORTFOLIO_SIM can be different on different computers, even using the same program code!
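Here is a tiny, artificial demonstration of this order dependence (unrelated to your actual data): with the same seed, the random value assigned to "Adverse" depends on whether "Adverse" comes first or second.
data order1;
call streaminit(1);
do scenario='Adverse ','Baseline'; r=rand('uniform'); output; end;
run;

data order2;
call streaminit(1);
do scenario='Baseline','Adverse '; r=rand('uniform'); output; end;
run;

proc print data=order1; run;
proc print data=order2; run; /* "Adverse" gets a different value of r than in ORDER1 */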
I have just created one million pairs of random_PD and random_LGD values based on the Sim_PD and Sim_LGD values of the case in question and I found your values in observations 3901, 3902, ..., 3910 of this series. So, maybe 3900 other values had been created before due to sort order differences or other technical reasons (consider perhaps multi-threaded parallel computing on the server SAS OnDemand connects to).
As a matter of fact, you don't have to worry about those differences. Your values are valid and it is reassuring that I could reproduce them on my computer.
a week ago
1 Like
@skumar46 wrote:
(...) for the borrower id :B0000 under baselines scenerio, i am getting same result for sim_PD= 0.2324, sim_LGD = 0.58, exp loss: 3,608.73 for all sim_id from 1 to 10, why these are not changing.
This is because these three variables were created in your PROC SQL step: Look at the first occurrences of their names in your code. Sim_PD and Sim_LGD are derived from PD and LGD, respectively, in CASE-WHEN expressions. Expected_Loss is the result of a multiplication involving Sim_PD, Sim_LGD and two other factors.
In the final DATA step, Sim_PD and Sim_LGD are used as arguments of the RAND function, which does not change their values. Instead, the randomly changing values returned by this function are stored in variables named random_PD and random_LGD. These, in turn, are used in the calculation of variable sim_loss.
As a result, the only variables that have a chance to differ between observations in dataset MONTECARLO with the same Borrower_ID and Scenario are: random_PD, random_LGD, sim_loss and (trivially) sim_id. All other variables just contribute 10 copies (per Borrower_ID-Scenario combination) of their corresponding value in dataset PORTFOLIO_SIM.
@skumar46 wrote:
Now, i understand that like you added three last columnns i.e random_PD random_LGD & sim_loss but that not fetched by sas on demand when i run this in sam on demand.
These variables must be contained in dataset MONTECARLO if the DATA step creating it ran successfully. You may need to scroll your output to the rightmost columns to make them visible in the window. Or use the VAR statement of PROC PRINT to select and order the most important variables to be displayed in the output.
Example:
proc print data=montecarlo(obs=10);
var Borrower_ID Scenario sim_id random_PD random_LGD sim_loss;
run;
@skumar46 wrote:
Are you using also sas on demand?
No, I have never used SAS OnDemand because I have always used SAS software provided by my employers or clients and, since 2015, my own local SAS installation.
a week ago
Thanks for providing more complete information.
On page 3 of your Word document (which I read using the document preview feature of the forum as no MS Word is installed on my SAS workstation) you show a DATA step that you say is "not working with the purpose". However, if I run this and the two preceding steps using a simplified version of your LOANS dataset, it appears to work perfectly: It creates 10 observations for each Borrower_ID-Scenario combination.
Here are the first three observations for borrower ID B00000 in the "Adverse" scenario (i.e., observations 11 through 13 of the output dataset MONTECARLO), limited to the most important variables:
Borrower_ random_ random_
ID Scenario PD LGD Sim_PD Sim_LGD sim_id PD LGD sim_loss
B00000 Adverse 0.2324 0.58 0.2905 0.638 1 0.29511 0.70829 5596.03
B00000 Adverse 0.2324 0.58 0.2905 0.638 2 0.28944 0.59881 4640.23
B00000 Adverse 0.2324 0.58 0.2905 0.638 3 0.29920 0.64346 5154.29
Variables PD and LGD are copied from the LOANS dataset, Sim_PD and Sim_LGD were created in your PROC SQL step. In the DO loop of the final DATA step (do sim_id = 1 to 10;) the randomly varying values of random_PD and random_LGD were generated and used in the calculation of sim_loss. (My suggested renaming of variables would have applied to a PORTFOLIO_SIM dataset using the variable names from the LOANS dataset, which turned out to be different datasets, so no renaming is necessary.)
So, while the first six of the variables shown above necessarily have the same values in these three observations, the newly computed last three variables exhibit the random variation to be expected from a Monte Carlo simulation. Why do you think that these results are not suitable for your purposes?
@skumar46 wrote:
If you are from Sas team, can we connect through zoom/ teams?
No, I am not affiliated with SAS Institute Inc. Like most of the other forum members I am a volunteer, just another SAS user like you (with 27 years of SAS experience, though), helping others in parallel to my actual work.
2 weeks ago
Just a minor side note: When you do arithmetic with money amounts that (typically) have two decimal places, rounding errors can easily occur that you might not expect. I would use the ROUND function to rectify those errors.
Example:
data test;
amount1=2.20;
amount2=1.20;
unearned_bal=1;
crude = amount1 - amount2 - unearned_bal;
if crude ne 0 then put 'Surprised?' +1 crude;
amount2 = round(amount1 - amount2 - unearned_bal, 1e-7);
if amount2=0 then put 'OK!';
run;
Log output:
Surprised? 2.220446E-16
OK!
If you know that none of your amounts and balances has more than two decimals (i.e., involves fractions of a cent), using 0.01 as the rounding unit is even safer than the suggested 1e-7.
2 weeks ago
1 Like
Hello @pink_poodle,
If your SAS installation (like my ageing SAS 9.4M5 from 2016) does not yet support the convenient variable list syntax using array references suggested by ballardw, you may want to resort to your idea of a macro loop, which could look like this:
%macro cdef(m);
%local i;
%do i=1 %to %eval(&m-1);
c&i=log((sum(of col1-col&i)+const)/(sum(of col%eval(&i+1)-col&m)+const));
%end;
%mend cdef;
data want;
set tran;
const=0;
%cdef(15)
run;
Note that your description
@pink_poodle wrote:
I would like the lines below for c1= to c4= to go on for 14 columns. That means col5 is col14 and I should have c1= to c14=.
is not consistent with your code, where the maximum col index equals the maximum c index plus 1, so not both maximum indices can be 14. Hence, you'll need to replace %cdef(15) with %cdef(14) in my suggested code if col15 does not exist in your real TRAN dataset.
2 weeks ago
It looks like you work with Excel data on a regular basis, so you think of an .xlsx file as a dataset. This is good because I can assume that you know how to read your attached Excel table into a SAS dataset named portfolio_sim. (I don't have Excel installed on my SAS workstation, nor is the SAS/ACCESS Interface to PC Files part of my SAS license, so I can't work with .xlsx data directly.)
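In case it helps, a typical way to do this is PROC IMPORT (it requires the SAS/ACCESS Interface to PC Files, which I can't test here; the file path and name are of course just placeholders to adapt):
proc import datafile="/your/path/your_workbook.xlsx"  /* <-- adapt path and file name */
            out=portfolio_sim
            dbms=xlsx
            replace;
getnames=yes;   /* read the variable names from the first row */
run;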
Then you can slightly adapt the beginning of the first DATA step (called "Step 4" in a comment) from your initial post as follows
data montecarlo(rename=(random_PD=PD random_LGD=LGD Sim_PD=original_PD Sim_LGD=original_LGD));
call streaminit(1);
set portfolio_sim(rename=(PD=Sim_PD LGD=Sim_LGD));
...
and run the DATA step on the newly created dataset portfolio_sim. The resulting output dataset montecarlo should contain 5000 observations: 10 per borrower ID, with randomly varying PD and LGD values, as desired. (The "constraining to valid ranges" will likely be applied to only very few values and should not be a problem for learning purposes.)
2 weeks ago
2 Likes
Hello @Mombi,
I think you can add the desired table by using the COLAXISTABLE statement:
proc sgpanel data = my_data_SGPANEL;
panelby sex;
vbarparm category = food_group response = Mean / group = select_var groupdisplay = cluster DataLabel LimitLower = LowerCLMean LimitUpper = UpperCLMean LimitAttrs = (Color = black);
format sex sexfmt. food_group fdgpfmt. select_var ynfmt. Mean F4.1 LowerCLMean F4.1 UpperCLMean F4.1;
ColAxis display = (NoLabel);
RowAxis label = "Percents";
colaxistable LowerCLMean Mean UpperCLMean / class=select_var classdisplay=cluster;
run;
At least this works on my SAS 9.4M5 (where I do not get notes in the log about incompatibility of DATALABEL and error bars).
@Mombi wrote:
NOTE: La barre d'erreur ne prend en charge l'option DATALABEL=. Le libellé ne sera pas ajouté.
(...) not sure where French is coming from here since all the other messages in my LOG are in English
You can obtain those special log messages in English by (temporarily) changing the value of the LOCALE system option:
options locale=EN_US;
2 weeks ago
In your initial post you wrote
(...) I have attached dataset
But there is no dataset attached. So let me create test data using some of the values shown in the partial output that you did attach.
data portfolio_sim;
input Borrower_ID $ Sim_PD Sim_LGD EAD Current_Balance;
cards;
B0001 0.2905 0.638 2 500
B0002 2.905 6.38 2 500
B0003 0.2905 0.638 2 500
;
If you run either of your two DATA steps on the above test data, you will see
for Borrower_ID B0001 and B0003: trivially constant values of Sim_PD, Sim_LGD, EAD and Current_Balance (as explained in my previous post), but randomly varying values of random_PD, random_LGD and sim_loss which differ between B0001 and B0003 (although the values of Sim_PD, Sim_LGD, EAD and Current_Balance are the same for both borrower IDs)
for Borrower_ID B0002: constant values also of random_PD, random_LGD and sim_loss because all the initial random values of random_PD and random_LGD were greater than 1 and therefore replaced with 1 by the assignment statements with the comment "Constrain to valid ranges" (as pointed out in @PaigeMiller's first reply).
@skumar46 wrote:
i am getting the same result.
Again, if you observed variable values that are "the same" contrary to expectations, then please show us.
2 weeks ago
@skumar46 wrote:
I wanted to run ... 10 Iterations but dont know that why its giving me the same out for all 10 itertaions (output attached).
Your output shows 10 out of 15,000 observations and only 4 out of 33 variables, whose (possibly formatted) values are constant across those 10 observations. None of these variables (Scenario, Sim_PD, Sim_LGD and Expected_Loss) is created in either of the two DATA steps that you posted. Hence, these variables must be contained in the input dataset PORTFOLIO_SIM. Their values are read by the SET statement, left unchanged by the other DATA step statements and written to the output dataset MONTECARLO by the OUTPUT statement. As this occurs in a DO loop with 10 iterations (do sim_id = 1 to 10;), those four variable values are necessarily the same in all 10 observations created from one observation of PORTFOLIO_SIM. They are just copies of the original values from the input dataset.
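If it helps, here is a minimal sketch of this mechanism, using SASHELP.CLASS rather than your data: variables read by the SET statement keep their values in every iteration of the DO loop, whereas newly computed variables change.
data demo;
call streaminit(1);
set sashelp.class(obs=2);      /* Name, Sex, Age, ... are read once per input observation */
do sim_id=1 to 10;
random_value=rand('uniform');  /* changes in every iteration */
output;                        /* writes 10 observations per input observation; */
end;                           /* Name, Sex, Age, ... are identical copies in each of them */
run;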
If you observed variable values that are equal contrary to expectations (e.g., values created by the RAND function calls in your code), then please show us.