Hi Aj,
Sorry for not responding earlier (was busy with other work). I have a draft version of a function f1 which I thought would be an implementation of G. T. Fosgate's algorithm. However, when I compared his Table I with my results, I got a lot of discrepancies for the "Exact" columns (whereas the "Approx" columns matched perfectly, which was of course not that surprising).
Then I created a slightly modified version f2 of my function (introduced a "tolerance" parameter), which takes the "sawtooth" effect into account (cf. @ballardw's post). As a result, the number of discrepancies decreased (and could be further decreased by modifying the tolerance parameter), but was still substantial, unfortunately.
I have scrutinized two of the discrepant cases and used computer algebra software to evaluate equation (1) of the article for one case, in order to make sure that no numerical accuracy issues were involved. Result: The probability value calculated by SAS was accurate to 13 significant decimal places! So, I'm fairly confident that the accuracy is sufficient (at least for this case). In fact, the calculation showed that for n=67, p0=0.55, d=0.1 (i.e., the confidence interval [0.45, 0.65]) and alpha=0.1, the LHS of equation (1), using x=37 (because 67*0.55=36.85, which rounds to 37), results in 0.0976628..., which is apparently < alpha. According to p. 2860 of the article, the "appropriate sample size has been reached when the sum of the tail probabilities [calculated with eqn. (1)] is less than ... alpha ..." Hence, the result should be 67 (since all smaller sample sizes led to values > 0.1). Interestingly, the "sawtooth" effect does not interfere with this result (for this specific combination of parameters), i.e., when the sample size is increased further, the calculated probability remains < 0.1.
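For anyone without SAS at hand, this single evaluation of the LHS of equation (1) (half the point masses at x plus both tail probabilities) can be cross-checked with a few lines of stdlib Python; the helper names are mine:

```python
from math import comb

def binom_pmf(k, p, n):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, p, n) for i in range(k + 1))

def eq1(n, pL, pU, x):
    """LHS of eqn. (1): half the point masses at x plus both tails."""
    return ((binom_pmf(x, pL, n) + binom_pmf(x, pU, n)) / 2
            + 1 - binom_cdf(x, pL, n) + binom_cdf(x - 1, pU, n))

print(eq1(67, 0.45, 0.65, 37))  # ~0.0976628, i.e. < alpha = 0.1
```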
So the question is why G. T. Fosgate's algorithm obtained 68, not 67 (see Table I). It is true that the exact tail probabilities (as calculated with the formulas on p. 2858) sum to more than 0.1, but they do not drop below this value before n=76, if I'm not mistaken.
There is also another minor issue: On p. 2860 it says that the "value of x ... is always 1 at the first iteration ..." I'm not so sure about this: wouldn't x be 99 for p0=0.99 and n=100, or did I misunderstand the author?
Anyway, here is my draft code:
/* Define sample size functions f1 and the modified version f2 */
proc fcmp outlib=work.funcs.test;

/* f1: smallest n for which the eqn. (1) tail sum drops below alpha */
function f1(pL, pU, conf); /* pL, pU: lower/upper confidence limit, conf: confidence level */
  alpha=round(1-conf, 1e-10);
  p0=(pL+pU)/2;
  /* n0: period of the "sawtooth" (smallest n with integer n*p0); not used in f1 */
  do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
  end;
  do n=1 to 10000 until(p<alpha);
    x=round(n*p0); /* expected number of successes */
    /* eqn. (1): half the point masses at x plus both tail probabilities */
    p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n),1e-10);
  end;
  return(n);
endsub;

/* f2: like f1, but resets whenever the "sawtooth" pushes p back above alpha+tol */
function f2(pL, pU, conf, tol); /* tol: tolerance parameter */
  alpha=round(1-conf, 1e-10);
  p0=(pL+pU)/2;
  do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
  end;
  do n=n0 to 10000;
    x=round(n*p0);
    p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n),1e-10);
    if p<alpha & n1=. then n1=n;    /* remember first n meeting the criterion   */
    else if p>=alpha+tol then n1=.; /* reset if a later n exceeds alpha+tol     */
  end;
  return(n1);
endsub;

quit;

options cmplib=work.funcs;
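As a sanity check on f1's logic outside SAS (this is a rough port of my draft, not the article's reference implementation), here is the same search in stdlib Python. Note that SAS's round() rounds halves away from zero, which I mimic with floor(n*p0 + 0.5):

```python
from math import comb, floor

def binom_pmf(k, p, n):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, p, n) for i in range(k + 1))

def f1(pL, pU, conf):
    """Smallest n for which the eqn. (1) tail sum drops below alpha."""
    alpha = round(1 - conf, 10)      # mirrors round(1-conf, 1e-10) in SAS
    p0 = (pL + pU) / 2
    for n in range(1, 10001):
        x = floor(n * p0 + 0.5)      # SAS round(): halves away from zero
        p = ((binom_pmf(x, pL, n) + binom_pmf(x, pU, n)) / 2
             + 1 - binom_cdf(x, pL, n) + binom_cdf(x - 1, pU, n))
        if p < alpha:
            return n

print(f1(0.45, 0.65, 0.9))  # 67 (Table I says 68)
```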
options cmplib=work.funcs;
/* Try to replicate the values corresponding to Table I of the article */
data test;
  do i=50 to 90 by 5;
    p0=i/100;
    do conf=0.9, 0.95, 0.99;
      do d=0.1, 0.05;
        e1=f1(p0-d, p0+d, conf);          /* draft implementation        */
        e2=f2(p0-d, p0+d, conf, 0);       /* sawtooth-aware, tol=0       */
        e2t=f2(p0-d, p0+d, conf, 0.0005); /* sawtooth-aware, tol=0.0005  */
        a=ceil(p0*(1-p0)*(probit(1-(1-conf)/2)/d)**2); /* normal approximation */
        output;
      end;
    end;
  end;
  drop i;
run;
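The "Approx" column is just the usual normal-approximation sample size, n = ceil(p0*(1-p0)*(z/d)**2) with z the (1-alpha/2) normal quantile. A quick stdlib-Python check against the first two Table I rows (conf=0.9, d=0.1), with a helper name of my own choosing:

```python
from math import ceil
from statistics import NormalDist

def approx_n(p0, conf, d):
    """Normal-approximation sample size: ceil(p0*(1-p0)*(z/d)**2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # probit(1 - alpha/2)
    return ceil(p0 * (1 - p0) * (z / d) ** 2)

print(approx_n(0.50, 0.9, 0.1), approx_n(0.55, 0.9, 0.1))  # 68 67
```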
proc sort data=test;
by conf descending d p0;
run;
/* Read Table I of the article, "Exact" and "Approx" columns stacked side by side */
data orig;
  input e a;                      /* e: "Exact" column, a: "Approx" column       */
  p0=(50+5*mod(_n_-1,9))/100;     /* p0 cycles through 0.50, 0.55, ..., 0.90     */
  if mod(_n_-1,18)<9 then d=0.1;  /* first nine rows of each 18-row block: d=0.1 */
  else d=0.05;
  if _n_<=18 then conf=0.9;
  else if _n_<=36 then conf=0.95;
  else conf=0.99;
cards;
68 68
68 67
65 65
... /* please copy from the PDF yourself */
353 339
260 239
;
/* Compare the results: e1, e2 and e2t are each compared to column e */
proc compare data=test c=orig;
  id p0 conf d notsorted;
  var e:;
  with e e e;
run;
/* Investigate a particular discrepancy */
data ttt;
  pL=0.45; pU=0.65; conf=0.9;
  alpha=round(1-conf, 1e-10);
  put alpha= best20.;
  put alpha= hex16.;
  p0=round((pL+pU)/2, 1e-10);
  put p0= best20.;
  put p0= hex16.;
  do n=1 to 120;
    x=round(n*p0);
    p=(pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n);
    output;
  end;
run;

proc print width=min;
  format p best16.;
run;
/* Calculate some exact tail probabilities (formulas on p. 2858) */
data _null_;
  x1=1-cdf('binom',36,0.45,67); /* upper tail at pL=0.45 */
  x2=cdf('binom',37,0.65,67);   /* lower tail at pU=0.65 */
  x=x1+x2;
  put (x:)(=);
run; /* 0.122179 */

data _null_;
  x1=1-cdf('binom',36,0.45,68);
  x2=cdf('binom',37,0.65,68);
  x=x1+x2;
  put (x:)(=);
run; /* 0.121597 */

data _null_;
  x1=1-cdf('binom',40,0.45,75);
  x2=cdf('binom',41,0.65,75);
  x=x1+x2;
  put (x:)(=);
run; /* 0.100311 */

data _null_;
  x1=1-cdf('binom',41,0.45,76);
  x2=cdf('binom',42,0.65,76);
  x=x1+x2;
  put (x:)(=);
run; /* 0.096772 */
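The same exact tail sums, P(X >= x | pL) + P(X <= x | pU) with x = round(n*p0), can be reproduced in stdlib Python (helper names and default arguments are mine, fixed to this example's pL=0.45, pU=0.65):

```python
from math import comb, floor

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_tails(n, pL=0.45, pU=0.65, p0=0.55):
    """P(X >= x | pL) + P(X <= x | pU), x = round(n*p0) (halves away from zero)."""
    x = floor(n * p0 + 0.5)
    return 1 - binom_cdf(x - 1, pL, n) + binom_cdf(x, pU, n)

for n in (67, 68, 75, 76):
    # should reproduce 0.122179, 0.121597, 0.100311, 0.096772 from above
    print(n, round(exact_tails(n), 6))
```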
I hope this helps. Please don't hesitate to ask if you have further questions.
In any case, as @ballardw has suggested, you can calculate tail probabilities fairly easily, both exactly and approximately using equation (1); see my code above. So, for a given combination of parameters you will be able to obtain the optimal sample size with little effort and without complicated general algorithms.