Hi friends,
Got a dataset Mdata.D2007_01
Want to return from the same the following fields:
Customer Product Amount
However, I need to add an additional field of Category in the end based on the figures in the last field of Amount as:
IIF(Amount < 1000000,'Category 01',
IIF(Amount < 30000000,'Category 02',
IIF(Amount < 60000000,'Category 03',
IIF(Amount < 90000000,'Category 04',
IIF(Amount < 120000000,'Category 05',
IIF(Amount < 150000000,'Category 06',
IIF(Amount < 180000000,'Category 07',
IIF(Amount < 210000000,'Category 08',
IIF(Amount < 240000000,'Category 09',
IIF(Amount < 270000000,'Category 10',
IIF(Amount < 300000000,'Category 11',
IIF(Amount < 330000000,'Category 12',
IIF(Amount < 360000000,'Category 13','Category 14')))))))))))))
and have the result be saved to a temporary dataset.
What could be the most appropriate code for the same?
Use either a format or Select statement or If/Then statements.
Are you working in SQL or a data step?
I think formats are the most efficient, especially if they're being reused in more than one place.
http://www2.sas.com/proceedings/sugi30/001-30.pdf
proc format;
value amount_category
low - <1000000 = "Category 01"
1000000 - < 30000000 = "Category 02"
....
330000000 -< 360000000 = "Category 13"
other = "Category 14";
run;
data want;;
set have;
new_variable=put(category, amount_category.);
run;
OR
proc sql;
create table want as
select *, put(category, amount_category.) as new_variable
from have;
quit;
Use either a format or Select statement or If/Then statements.
Are you working in SQL or a data step?
I think formats are the most efficient, especially if they're being reused in more than one place.
http://www2.sas.com/proceedings/sugi30/001-30.pdf
proc format;
value amount_category
low - <1000000 = "Category 01"
1000000 - < 30000000 = "Category 02"
....
330000000 -< 360000000 = "Category 13"
other = "Category 14";
run;
data want;;
set have;
new_variable=put(category, amount_category.);
run;
OR
proc sql;
create table want as
select *, put(category, amount_category.) as new_variable
from have;
quit;
as IIF is from thinking modifying data in the MS-access limitations...
Now thinking in SAS oppurtunities...
- If you have the format defined. You could possibly have no need to change data.
Many procedures (eg report/tabluate) are accepting the format being defined there and it works like the data was changed (but not really)
This approach can save a lot of time as it is bypassing the need for data conversions
Excellent reference to the Matchless Working Paper.
Sure do appreciate!
Thanx again!
By the way, is there any way to use the Put() function limits, values legend, determination as in PROC FORMAT in PROC SQL as well. I mean consider the following example where the Temp3 dataset is not created, and please see if you could help in finishing up the same more efficiently:
proc format;
value Category low -< 1 = '1'
1 -< 30 = '2'
30 -< 60 = '3'
60 -< 90 = '4'
90 -< 120 = '5'
120 -< 150 = '6'
150 -< 180 = '7'
180 -< 210 = '8'
210 -< 240 = '9'
240 -< 270 = '10'
270 -< 300 = '11'
300 -< 330 = '12'
330 -< 360 = '13'
other = '14'
;
value $ codefmt
'1'='Bucket 1'
'2'='Bucket 2'
'3'='Bucket 3'
'4'='Bucket 4'
'5'='Bucket 5'
'6'='Bucket 6'
'7'='Bucket 7'
'8'='Bucket 8'
'9'='Bucket 9'
'10'='Bucket 10'
'11'='Bucket 11'
'12'='Bucket 12'
'13'='Bucket 13'
'14'='Bucket 14'
;
run;
DATA Work.Temp2;
set FAQ.Data_2007_01;
tempcode=put(put(DPD, Category.), codefmt.);
run;
PROC SQL;
CREATE TABLE Work.Temp3 as SELECT POS, put(put(DPD, Category.), codefmt.) as Bucket from FAQ.Data_2007_01;
quit;
Have a different format that assigns the ranges directly
value buckets
low -<1 = 'Bucket 1'
1 -< 30 = 'Bucket 2'
etc.
and I'm not sure you need to create a new variable. One of the big advantages of formats is you can perform analysis or output with the format applied to the original variable and don't need new variables all of the time. Makes it easy to investigate groups by making a new format instead of new variables.
With the new Buckets format
proc freq data=faq.data_2007_01;
tables DPD;
format DPD buckets.;
run;
Suppose you later decide you want to see what happens with the data in 3 groups.
Proc format;
value threebuckets
low -< 150 = 'Big bucket 1'
150 -< 300 ='Big bucket 2'
300 - high = 'Big bucket 3'
;
run;
proc freq data=faq.data_2007_01;
tables DPD;
format DPD threebuckets.;
run;
BTW your code examples should be using $codefmt. in the put statements. Also, your current Category format is going to assign missing to 14. If you don't have any missing then no problem but something to consider when using 'other'. You may actually want 360 - high = '14'.
Very nice, the idea of better performance limiting computer resource usage by using Formats is being got
Having them defined could be a very handy one wiht using the full (SAS) dataset.
The next level of this is Multi label formats. Using different classification / format recodings on the same data.
Base SAS(R) 9.3 Procedures Guide, Second Edition ( Example 17: Using Multilabel Formats )
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.