BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emjay
Obsidian | Level 7

I want to use proc format to transform my numerical variable into a range variable. This works fine.

 

But if I want to do some further analysis with this new range variable I have the problem that proc format has created duplicates but I don't see why.

 

So let's say I have the following:

 

data have; 
input ID KBZ; 
datalines;
1 0.25
2 0.33
3 1.2
4 5.8
5 3.2
6 3.8
7 3.44
8 4.7
;
run;

proc format;
value KBZrange (FUZZ=0)
low-1 = "<=1" 
1<-2 = "1-2" 
2<-3 = "2-3"
3<-4 = "3-4"
4<-5 = "4-5"
5<-high = ">=5";
quit;

data have;
set have;
format KBZ_range KBZrange.;
KBZ_range=KBZ;
run;

proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;

Why are there now this extra rows although they look the same to me?

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

No. PROC SQL considers the actual values of KBZ_range. Not the formated values of KBZ_range. Right now, KBZ_range is exactly equal to KBZ. They just look different because you gave it a different format. 

 

If you do like below however, then you give KBZ_range the values of the ranges you specify in PROC SORT (the actual strings), and you get 5 obs

 

data have;
set have;
KBZ_range=put(KBZ, KBZrange.);
run;

proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;

Result:

 

KBZ_Range
1-2
3-4
4-5
<=1
>=5

Makes sense?

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

I don't see any extra rows? Your initial data has 8 rows and so does why_dup?

Emjay
Obsidian | Level 7

Yes it has 8 rows, but shouldn't it have only 5 as I wrote select distinct?

PeterClemmensen
Tourmaline | Level 20

No. PROC SQL considers the actual values of KBZ_range. Not the formated values of KBZ_range. Right now, KBZ_range is exactly equal to KBZ. They just look different because you gave it a different format. 

 

If you do like below however, then you give KBZ_range the values of the ranges you specify in PROC SORT (the actual strings), and you get 5 obs

 

data have;
set have;
KBZ_range=put(KBZ, KBZrange.);
run;

proc sql;
create table why_dup as
select distinct KBZ_range from have;
quit;

Result:

 

KBZ_Range
1-2
3-4
4-5
<=1
>=5

Makes sense?

Emjay
Obsidian | Level 7

Yes, makes sense. 🙂

 

Now I also understand why KBZ_range still had the type numeric and this blue circle next to it. I thought this was because sas didn't know how to display my custom format.

 

Thanks!

PeterClemmensen
Tourmaline | Level 20

Spot on! SAS knows exactly how to display your custom format 🙂

 

Anytime, glad you found your answer.

ballardw
Super User

@Emjay wrote:

Yes, makes sense. 🙂

 

Now I also understand why KBZ_range still had the type numeric and this blue circle next to it. I thought this was because sas didn't know how to display my custom format.

 

Thanks!


Consider:

 

proc summary data=have ;
   class kbz;
   format kbz kbzrange.;
   output out=work.summary  ;
run;

You will find the groups created by the format are honored by other procedures, though you might be surprised sometimes what the actual value (unformatted) kept with Proc Summary output.

 

 

Emjay
Obsidian | Level 7

You are right! And that's exactly why I was so confused. Cause I used my original approach with a proc freq a while ago and there it also worked fine... But now I know. 🙂 

ballardw
Super User

@Emjay wrote:

You are right! And that's exactly why I was so confused. Cause I used my original approach with a proc freq a while ago and there it also worked fine... But now I know. 🙂 


I use formats to validate some long lists of values and sometime use things like

 

if put(variable, validformatname.) = 'Valid' then <what I wanted to do>;
Else do <whatever when the value isn't on the valid list>;

You should be able to use:

proc sql;
create table why_dup as
select distinct put(KBZ,KBZrange. ) as KBZ_range from have;
quit;

 

 

Emjay
Obsidian | Level 7

I've seen this format approach in if-statements but I never understood the benefit.

 

What would be the reason to rather write:

 

proc format;
value KBZ2range (FUZZ=0)
low-1 = "valid" 
1<-2 = "nv" 
2<-3 = "valid"
3<-4 = "nv"
4<-5 = "valid"
5<-high = "nv";
quit;

data foo;
set have;
if put(KBZ,KBZ2range.)='valid' then check='yes'; 
else check='no';
run;

Instead of:

 

data foo2;
set have;
if (KBZ<=1 or 2<KBZ<=3 or 4<KBZ<=5) then check='yes'; 
else check='no';
run;

A lot of extra typing...

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2750 views
  • 0 likes
  • 3 in conversation