Hi,
I'm doing an efficiency exercise for data manipulation. Let's say I'm given the task of doing a descriptive summary that involves only taking the mean and standard deviation. And there is a rule I have to follow. The rule is that I'm supposed to extend out 1 decimal place more than the original data's highest decimal length value for the mean, and 2 decimal places more for the standard deviation. So if the maximum decimal length extends out 3 decimals places, it should look something like this:
meanSD: x.xxxx (x.xxxxx)
I've heard about the putn function, but haven't been able to find many examples online on how to use it.
This is what I've come up with below. The reason I would like a dynamic format is so that in case the data changes and the amount of decimals of the original data changes, and I have to re-run my program, my logic will account for it.
This is what I was able to come up with so far.
data have;
infile datalines dsd dlm=",";
input val;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
;
run;
data have1; set have;
valchar=put(val,best.);
decimal=scan(valchar, 2, ".");
run;
proc sql;
select max(length(decimal)) into: maxdec trimmed
from have1;
quit;
%put The maximum number of decimals for this data is &maxdec.;
proc means data=have mean std noprint;
var val;
output out=a (drop=_:) mean=mean std=std;
run;
* mean and std rule;
%let mean=1;
%let std=2;
data want; set a;
meanSD=strip(put(round(mean, 10**(%eval(-&maxdec +&mean))), 32.%eval(&maxdec+&mean))) || " (" ||
strip(put(round(std, 10**(%eval(-&maxdec +&std))), 32.%eval(&maxdec+&std))) || ")";
run;
Thanks
I would go for something like this:
%let var1=val;
data have;
input &var1.chr $12.;
&var1. = input(&var1.chr, best.);
&var1.dec = max(0, length(&var1.chr) - indexc(&var1.chr, ".", " "));
drop &var1.chr;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
25
;
/* Set a limit to the max nb of decimals,
to safeguard against calculated values such as 1.3333333333 */
%let maxdec=4;
proc sql noprint;
select
1 + min(&maxdec., max(&var1.dec)) as d,
1 + calculated d
into : fmtavg&var1. trimmed, : fmtstd&var1. trimmed
from have;
quit;
proc sql;
select
mean(&var1.) format=12.&&fmtavg&var1. label="Mean",
std(&var1.) format=12.&&fmtstd&var1. label="Std. Dev."
from have;
quit;
You are going to have a hard time with your rules about decimal places. If you read the data as numeric then the value is numeric and there isn't really any natural end of decimal places stored. Additionally if you have lots of decimal places in your data you may run into issues with storage precision in 8 bytes.
So the only way I see to accurately follow your rules would be to also read your source data as a character value to count decimal places.
Note, if your external data source has already treated values a numeric then you may already be hosed as they other system may well have already rounded values, added/removed positions treated as significant.
If you want to see an example of what I mean instead of 67.657 the value 67.6570. Which would imply that last 0 is significant for some reason or the source wouldn't have used that last 0. But your approach would not count that, or any other trailing 0.
Instead of exponentiation and rounding let the format do that for you. Look into the CAT, CATS, CATX and CATQ functions instead of || operator. You can generally get away from having to STRIP, LEFT or TRIM as many values.
data want; set a; meanSD= catx(' ',put(mean,16.%eval(&maxdec.+1)),cats('(',put(std,16.%eval(&maxdec.+2)),')')); run;
You cannot count on being able to determine the number of decimal places from looking at the actual data. What if everything happened to be an integer?
Store the number of decimal places with the other metadata about the variable.
Then you can use that value with either the PUTN() function or perhaps just a ROUND() function.
@Hello_there wrote:
Thanks for your response, Tom.
The PUTN() function is a little bit of a mystery to me. I read the documentation on it and it still isn't clear to me how that can help me and how that's different than the PUT() function. Would you be able to offer an example and explain the difference?
The PUT() function requires the format to be known at compile time. That is why it works for either numeric or character values. The PUTN() and PUTC() can wait until execution time to determine the format to apply, but that means you have to tell the compiler which type of values to expect by using PUTN() or PUTC().
So what this means is that the for the SECOND argument the PUT() function was an actual format specification. Such as COMMA12. and the PUTN() function wants a character string that has the format specification, such as "COMM12.".
So if you have a variable named MAXDEC you can use it to generate the string you pass to PUTN().
Example:
data test;
do x=0.1,1,10 ;
do maxdec=0,1,2 ;
string=putn(x,cats('5.',maxdec));
output;
end;
end;
run;
OBS x maxdec string 1 0.1 0 0 2 0.1 1 0.1 3 0.1 2 0.10 4 1.0 0 1 5 1.0 1 1.0 6 1.0 2 1.00 7 10.0 0 10 8 10.0 1 10.0 9 10.0 2 10.00
I would go for something like this:
%let var1=val;
data have;
input &var1.chr $12.;
&var1. = input(&var1.chr, best.);
&var1.dec = max(0, length(&var1.chr) - indexc(&var1.chr, ".", " "));
drop &var1.chr;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
25
;
/* Set a limit to the max nb of decimals,
to safeguard against calculated values such as 1.3333333333 */
%let maxdec=4;
proc sql noprint;
select
1 + min(&maxdec., max(&var1.dec)) as d,
1 + calculated d
into : fmtavg&var1. trimmed, : fmtstd&var1. trimmed
from have;
quit;
proc sql;
select
mean(&var1.) format=12.&&fmtavg&var1. label="Mean",
std(&var1.) format=12.&&fmtstd&var1. label="Std. Dev."
from have;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.