Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
Only thing that jumps to mind:
I guess my question is why?
@_phintaC_ wrote:
Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
Character variables basically have one format $. So no such thing a "integer" "float" or similar.
The bit that you have some values that appear to want to display in a more numeric method means that perhaps you have the wrong variable type. You appear to want a numeric value for most of the values that has the option of displaying text "Unknown". That would be a custom format where instead of a value of "Unknown" you have a missing value, possibly a special missing (.U perhaps) and a BEST for the remaining.
A custom informat can read text into numeric values such as:
proc format; invalue result (upcase) 'UNKNOWN'=.U ; value result .U='Unknown' other=[best8.] ; run; data example; input result :result.; datalines; UNKNOWN 10.411 1 10.1 10.334 11 ; proc print data=example; format result result.; run;
The special missing values .A to .Z and ._ can represent different reasons that data is missing and a format can provide a description as desired. I have included an example of an Informat that reads your values into numeric and a matching format. The Other=[best8.] means any value other than the .U will display in the best manner to fit within 8 characters.
When no value is explicitly provided in the Invalue statement it is assumed to be the default for the variable type, so treats the numeric values as if read with a 12. informat.
@_phintaC_ wrote:
Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
That is essentially the exercise I went through in creating this macro:
https://github.com/sasutils/macros/blob/master/csv2ds.sas
So using the idea of converting your tall file into a wide file.
data have;
infile cards dlm='|';
id = ceil(_n_/3);
input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
;
proc transpose data=have out=wide(drop=_name_);
by id;
id name;
var value;
run;
filename csv temp;
data _null_ ;
file csv dsd ;
set have;
where id=1;
put name @;
run;
data _null_;
set wide (drop=id);
file csv dsd mod ;
put (_all_) (+0);
run;
%csv2ds(csv,out=test,replace=1);
proc print data=_types_;
run;
m m i i a d a n n x n n i a e n v l f f l l o u n t y m d 8 y a x e o o l e e n m c t e y m d 6 d r n t t n r r a n n m e o e d t t m d m 0 t O n a y y g m m b g g m m i r m g a i i m d m 1 d b u m p p t a a e t t i a s i m e t m m d y y d t s m e e e h t t l h h n x s c a r e e e d y y z m 1 1 TEST1 character char $7 4 7 10.1 UNKNOWN 2 1 1 0 0 0 0 0 0 0 0 0 2 2 TEST2 numeric num 8 6 6 10.334 10.411 2 2 2 0 0 0 0 0 0 0 0 2 3 3 TEST3 integer num 8 1 2 1 11 2 2 2 2 0 0 0 0 0 0 0 0
Code generated to read the CSV file.
data test;
infile CSV dlm=',' dsd truncover firstobs=2 ;
length TEST1 $7 TEST2 8 TEST3 8 ;
input TEST1 -- TEST3 ;
run;
Given data like this
data have;
infile cards dlm='|';
input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
;run;
You can check the values you want like this:
First scan for length and decimals:
data test;
set have;
prxid=prxparse('/^\d+\.?(\d*)$/');
if prxmatch(prxid,trim(value)) then do;
length=length(trim(value));
decimals=lengthn(prxposn(prxid,1,value));
end;
keep name length decimals;
run;
The PRX expression looks for a string consisting of one or more digits, a possible period, and then zero or more digits after that (and nothing else). I used the TRIM function to guard against leading blanks.
Now you just have to find the maximum values:
proc summary data=test nway;
class name;
var length decimals;
output out=want (drop=_:) max=;
run;
Edit note: I changed the second LENGTH function to LENGTHN, so that zero decimals will not be reported as one.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.