I want to use proc format to transform my numerical variable into a range variable. This works fine.
But if I want to do some further analysis with this new range variable I have the problem that proc format has created duplicates but I don't see why.
So let's say I have the following:
data have;
input ID KBZ;
datalines;
1 0.25
2 0.33
3 1.2
4 5.8
5 3.2
6 3.8
7 3.44
8 4.7
;
run;
proc format;
value KBZrange (FUZZ=0)
low-1 = "<=1"
1<-2 = "1-2"
2<-3 = "2-3"
3<-4 = "3-4"
4<-5 = "4-5"
5<-high = ">=5";
quit;
data have;
set have;
format KBZ_range KBZrange.;
KBZ_range=KBZ;
run;
proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;
Why are there now this extra rows although they look the same to me?
No. PROC SQL considers the actual values of KBZ_range. Not the formated values of KBZ_range. Right now, KBZ_range is exactly equal to KBZ. They just look different because you gave it a different format.
If you do like below however, then you give KBZ_range the values of the ranges you specify in PROC SORT (the actual strings), and you get 5 obs
data have;
set have;
KBZ_range=put(KBZ, KBZrange.);
run;
proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;
Result:
KBZ_Range 1-2 3-4 4-5 <=1 >=5
Makes sense?
I don't see any extra rows? Your initial data has 8 rows and so does why_dup?
Yes it has 8 rows, but shouldn't it have only 5 as I wrote select distinct?
No. PROC SQL considers the actual values of KBZ_range. Not the formated values of KBZ_range. Right now, KBZ_range is exactly equal to KBZ. They just look different because you gave it a different format.
If you do like below however, then you give KBZ_range the values of the ranges you specify in PROC SORT (the actual strings), and you get 5 obs
data have;
set have;
KBZ_range=put(KBZ, KBZrange.);
run;
proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;
Result:
KBZ_Range 1-2 3-4 4-5 <=1 >=5
Makes sense?
Yes, makes sense. 🙂
Now I also understand why KBZ_range still had the type numeric and this blue circle next to it. I thought this was because sas didn't know how to display my custom format.
Thanks!
Spot on! SAS knows exactly how to display your custom format 🙂
Anytime, glad you found your answer.
@Emjay wrote:
Yes, makes sense. 🙂
Now I also understand why KBZ_range still had the type numeric and this blue circle next to it. I thought this was because sas didn't know how to display my custom format.
Thanks!
Consider:
proc summary data=have ; class kbz; format kbz kbzrange.; output out=work.summary ; run;
You will find the groups created by the format are honored by other procedures, though you might be surprised sometimes what the actual value (unformatted) kept with Proc Summary output.
You are right! And that's exactly why I was so confused. Cause I used my original approach with a proc freq a while ago and there it also worked fine... But now I know. 🙂
@Emjay wrote:
You are right! And that's exactly why I was so confused. Cause I used my original approach with a proc freq a while ago and there it also worked fine... But now I know. 🙂
I use formats to validate some long lists of values and sometime use things like
if put(variable, validformatname.) = 'Valid' then <what I wanted to do>; Else do <whatever when the value isn't on the valid list>;
You should be able to use:
proc sql; create table why_dup as select distinct put(KBZ,KBZrange. ) as KBZ_range from have; quit;
I've seen this format approach in if-statements but I never understood the benefit.
What would be the reason to rather write:
proc format;
value KBZ2range (FUZZ=0)
low-1 = "valid"
1<-2 = "nv"
2<-3 = "valid"
3<-4 = "nv"
4<-5 = "valid"
5<-high = "nv";
quit;
data foo;
set have;
if put(KBZ,KBZ2range.)='valid' then check='yes';
else check='no';
run;
Instead of:
data foo2;
set have;
if (KBZ<=1 or 2<KBZ<=3 or 4<KBZ<=5) then check='yes';
else check='no';
run;
A lot of extra typing...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.