Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
Only thing that jumps to mind:
I guess my question is why?
@_phintaC_ wrote:
Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
Character variables basically have one format $. So no such thing a "integer" "float" or similar.
The bit that you have some values that appear to want to display in a more numeric method means that perhaps you have the wrong variable type. You appear to want a numeric value for most of the values that has the option of displaying text "Unknown". That would be a custom format where instead of a value of "Unknown" you have a missing value, possibly a special missing (.U perhaps) and a BEST for the remaining.
A custom informat can read text into numeric values such as:
proc format; invalue result (upcase) 'UNKNOWN'=.U ; value result .U='Unknown' other=[best8.] ; run; data example; input result :result.; datalines; UNKNOWN 10.411 1 10.1 10.334 11 ; proc print data=example; format result result.; run;
The special missing values .A to .Z and ._ can represent different reasons that data is missing and a format can provide a description as desired. I have included an example of an Informat that reads your values into numeric and a matching format. The Other=[best8.] means any value other than the .U will display in the best manner to fit within 8 characters.
When no value is explicitly provided in the Invalue statement it is assumed to be the default for the variable type, so treats the numeric values as if read with a 12. informat.
@_phintaC_ wrote:
Hi,
I have searched far and wide and haven't been able to find an answer to the following:
I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:
Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g.
TEST | RESULT
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g.
best format for
TEST1 = character with length 7
TEST2 = float with length 6.3
TEST3 = integer
?
That is essentially the exercise I went through in creating this macro:
https://github.com/sasutils/macros/blob/master/csv2ds.sas
So using the idea of converting your tall file into a wide file.
data have;
infile cards dlm='|';
id = ceil(_n_/3);
input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
;
proc transpose data=have out=wide(drop=_name_);
by id;
id name;
var value;
run;
filename csv temp;
data _null_ ;
file csv dsd ;
set have;
where id=1;
put name @;
run;
data _null_;
set wide (drop=id);
file csv dsd mod ;
put (_all_) (+0);
run;
%csv2ds(csv,out=test,replace=1);
proc print data=_types_;
run;
m m i i a d a n n x n n i a e n v l f f l l o u n t y m d 8 y a x e o o l e e n m c t e y m d 6 d r n t t n r r a n n m e o e d t t m d m 0 t O n a y y g m m b g g m m i r m g a i i m d m 1 d b u m p p t a a e t t i a s i m e t m m d y y d t s m e e e h t t l h h n x s c a r e e e d y y z m 1 1 TEST1 character char $7 4 7 10.1 UNKNOWN 2 1 1 0 0 0 0 0 0 0 0 0 2 2 TEST2 numeric num 8 6 6 10.334 10.411 2 2 2 0 0 0 0 0 0 0 0 2 3 3 TEST3 integer num 8 1 2 1 11 2 2 2 2 0 0 0 0 0 0 0 0
Code generated to read the CSV file.
data test;
infile CSV dlm=',' dsd truncover firstobs=2 ;
length TEST1 $7 TEST2 8 TEST3 8 ;
input TEST1 -- TEST3 ;
run;
Given data like this
data have;
infile cards dlm='|';
input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1
TEST2 | 10.334
TEST3 | 11
;run;
You can check the values you want like this:
First scan for length and decimals:
data test;
set have;
prxid=prxparse('/^\d+\.?(\d*)$/');
if prxmatch(prxid,trim(value)) then do;
length=length(trim(value));
decimals=lengthn(prxposn(prxid,1,value));
end;
keep name length decimals;
run;
The PRX expression looks for a string consisting of one or more digits, a possible period, and then zero or more digits after that (and nothing else). I used the TRIM function to guard against leading blanks.
Now you just have to find the maximum values:
proc summary data=test nway;
class name;
var length decimals;
output out=want (drop=_:) max=;
run;
Edit note: I changed the second LENGTH function to LENGTHN, so that zero decimals will not be reported as one.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.