Solved: Is there a way to create a dynamic format more efficiently?

Hello_there · Posted 01-30-2023 12:45 PM

Hi,

I'm doing an efficiency exercise for data manipulation. Let's say I'm given the task of doing a descriptive summary that involves only taking the mean and standard deviation. And there is a rule I have to follow. The rule is that I'm supposed to extend out 1 decimal place more than the original data's highest decimal length value for the mean, and 2 decimal places more for the standard deviation. So if the maximum decimal length extends out 3 decimals places, it should look something like this:

meanSD: x.xxxx (x.xxxxx)

I've heard about the putn function, but haven't been able to find many examples online on how to use it.

This is what I've come up with below. The reason I would like a dynamic format is so that in case the data changes and the amount of decimals of the original data changes, and I have to re-run my program, my logic will account for it.

This is what I was able to come up with so far.

data have; 
infile datalines dsd dlm=",";
	input val;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
;
run;

data have1; set have;
valchar=put(val,best.);
decimal=scan(valchar, 2, "."); 
run;

proc sql;
	select max(length(decimal)) into: maxdec trimmed
	from have1;
quit;
%put The maximum number of decimals for this data is &maxdec.;

proc means data=have mean std noprint;
var val;
output out=a (drop=_:) mean=mean std=std;
run;

* mean and std rule;
%let mean=1;
%let std=2;

data want; set a;
	meanSD=strip(put(round(mean, 10**(%eval(-&maxdec +&mean))), 32.%eval(&maxdec+&mean))) || " (" ||
	strip(put(round(std, 10**(%eval(-&maxdec +&std))), 32.%eval(&maxdec+&std))) || ")";
run;

Thanks

PGStats · Posted 01-30-2023 03:28 PM

I would go for something like this:

%let var1=val;

data have; 
input &var1.chr $12.;
&var1. = input(&var1.chr, best.);
&var1.dec = max(0, length(&var1.chr) - indexc(&var1.chr, ".", " "));
drop &var1.chr;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
25
;

/* Set a limit to the max nb of decimals, 
   to safeguard against calculated values such as 1.3333333333 */
%let maxdec=4;

proc sql noprint;
select 
    1 + min(&maxdec., max(&var1.dec)) as d,
    1 + calculated d
    into : fmtavg&var1. trimmed, : fmtstd&var1. trimmed
from have;
quit;

proc sql;
select 
    mean(&var1.) format=12.&&fmtavg&var1. label="Mean",
    std(&var1.) format=12.&&fmtstd&var1. label="Std. Dev."
from have;
quit;

PG

View solution in original post

ballardw · Posted 01-30-2023 01:28 PM

You are going to have a hard time with your rules about decimal places. If you read the data as numeric then the value is numeric and there isn't really any natural end of decimal places stored. Additionally if you have lots of decimal places in your data you may run into issues with storage precision in 8 bytes.

So the only way I see to accurately follow your rules would be to also read your source data as a character value to count decimal places.

Note, if your external data source has already treated values a numeric then you may already be hosed as they other system may well have already rounded values, added/removed positions treated as significant.

If you want to see an example of what I mean instead of 67.657 the value 67.6570. Which would imply that last 0 is significant for some reason or the source wouldn't have used that last 0. But your approach would not count that, or any other trailing 0.

Instead of exponentiation and rounding let the format do that for you. Look into the CAT, CATS, CATX and CATQ functions instead of || operator. You can generally get away from having to STRIP, LEFT or TRIM as many values.

data want; set a;
	meanSD= catx(' ',put(mean,16.%eval(&maxdec.+1)),cats('(',put(std,16.%eval(&maxdec.+2)),')'));
run;

Hello_there · Posted 01-30-2023 03:07 PM

Maybe it was my imagination, but I thought I needed the Round() function because PUT() function just cuts the numbers off regardless if the next digit is supposed to round the last presentable digit up. But I will experiment with it and see if that's the case.

And thanks for the CAT tip, I will do that more often than using the pipes.

Tom · Posted 01-30-2023 02:55 PM

You cannot count on being able to determine the number of decimal places from looking at the actual data. What if everything happened to be an integer?

Store the number of decimal places with the other metadata about the variable.

Then you can use that value with either the PUTN() function or perhaps just a ROUND() function.

Hello_there · Posted 01-30-2023 03:04 PM

Thanks for your response, Tom.

The PUTN() function is a little bit of a mystery to me. I read the documentation on it and it still isn't clear to me how that can help me and how that's different than the PUT() function. Would you be able to offer an example and explain the difference?

Tom · Posted 01-30-2023 04:42 PM

@Hello_there wrote:
Thanks for your response, Tom.

The PUTN() function is a little bit of a mystery to me. I read the documentation on it and it still isn't clear to me how that can help me and how that's different than the PUT() function. Would you be able to offer an example and explain the difference?

The PUT() function requires the format to be known at compile time. That is why it works for either numeric or character values. The PUTN() and PUTC() can wait until execution time to determine the format to apply, but that means you have to tell the compiler which type of values to expect by using PUTN() or PUTC().

So what this means is that the for the SECOND argument the PUT() function was an actual format specification. Such as COMMA12. and the PUTN() function wants a character string that has the format specification, such as "COMM12.".

So if you have a variable named MAXDEC you can use it to generate the string you pass to PUTN().

Example:

data test;
  do x=0.1,1,10 ;
    do maxdec=0,1,2 ;
      string=putn(x,cats('5.',maxdec));
      output;
    end;
  end;
run;

OBS       x    maxdec    string

 1      0.1       0          0
 2      0.1       1        0.1
 3      0.1       2       0.10
 4      1.0       0          1
 5      1.0       1        1.0
 6      1.0       2       1.00
 7     10.0       0         10
 8     10.0       1       10.0
 9     10.0       2      10.00

Hello_there · Posted 01-30-2023 05:24 PM

Tom, thanks for the explanation and the example! This helps my understanding.

PGStats · Posted 01-30-2023 03:28 PM

I would go for something like this:

%let var1=val;

data have; 
input &var1.chr $12.;
&var1. = input(&var1.chr, best.);
&var1.dec = max(0, length(&var1.chr) - indexc(&var1.chr, ".", " "));
drop &var1.chr;
datalines;
54.5
67.657
578.2
284.5
546.32
4687.2
6584.5
567.3
498.1
89.31
25
;

/* Set a limit to the max nb of decimals, 
   to safeguard against calculated values such as 1.3333333333 */
%let maxdec=4;

proc sql noprint;
select 
    1 + min(&maxdec., max(&var1.dec)) as d,
    1 + calculated d
    into : fmtavg&var1. trimmed, : fmtstd&var1. trimmed
from have;
quit;

proc sql;
select 
    mean(&var1.) format=12.&&fmtavg&var1. label="Mean",
    std(&var1.) format=12.&&fmtstd&var1. label="Std. Dev."
from have;
quit;

PG

Hello_there · Posted 01-30-2023 05:25 PM

Thanks for taking time to write this! This gives me an alternative view and teaches me some things that are new to me.

Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Re: Is there a way to create a dynamic format more efficiently?

Classroom Training Available!