BookmarkSubscribeRSS Feed
GVeers
Calcite | Level 5
Hey guys,

I figured out the following way to get my data to sum and print exactly the way I want it. However, it is quite inelegant coding and I could use some input on better ways.

Basically I'm using PROC SQL with a CASE statement to summarize my data in a particular way, and then using PROC FORMAT to make sure the data groupings are ordered properly. As I say, the following code works, but it seems pretty clunky.

proc format;
value $Cost '0' = 'a'
'1 - 4,500' = 'b'
'4,501 - 6,000' = 'c'
run;

proc sql;
SELECT CASE
WHEN missing(ORIG_COST) THEN '.'
WHEN ORIG_COST = 0 THEN '0'
WHEN ORIG_COST BETWEEN 1 AND 4500 THEN '1 - 4,500'
WHEN ORIG_COST BETWEEN 4501 AND 6000 THEN '4,501 - 6,000'
END AS Cost,
Count(*)
FROM CL.Cleaned_cauto_2007
GROUP BY Cost ORDER BY put(Cost, $Cost.);
quit;
5 REPLIES 5
GVeers
Calcite | Level 5
Oh, and I should also note this is just a sample. The reason I need the PROC FORMAT is because there are a lot more ranges in my real data and I end up getting ugly results like '10,001 - 25,000' showing up before '6,001 - 10,000' (for example).
data_null__
Jade | Level 19
You can achieve that result of a VALUE statement option and two OPTIONS in PROC SUMMARY.

[pre]
proc format;
value Cost(notsorted)
0 = '0'
1-4500 = '1-45,00'
4500-6000 = '4,500-6,000'
6000-10000 = '6,000-10,000'
10000-11000 = '10,000-11,000';
;
run;

proc plan seed=1169604082;
factors r=100 ordered cost = 1 of 11000 / noprint;
output out=cost;
Run;
proc summary data=cost nway;
class cost / order=data preloadfmt;
format cost cost.;
output out=test;
run;
proc print;
run;[/pre]
GVeers
Calcite | Level 5
Thanks, dall_null. I appreciate the response, particularly since I did not know about the power of PROC SUMMARY.

Still, I think this doesn't quite help me. My data set has a number of variables that I need to seperate into ranges, output with a specific character field, and print in numeric order. I think the way you describe will require sorting the data for each variable I need to output, which seems about as efficient as my approach.

I guess what would be ideal is if there was some procedure that could assign a label to a variable (like PROC FORMAT) but also assign a value for sorting purposes. If anyone knows of such a thing, I'd love to hear it, but I can stick to what I have as well.

Thanks!
data_null__
Jade | Level 19
How about an option to create an index variable associated with the levels of the CLASS variable(s).

[pre]
dm 'clear log; clear output;';
proc format;
value Cost(notsorted)
0 = '0'
1-4500 = '1-45,00'
4500-6000 = '4,500-6,000'
6000-10000 = '6,000-10,000'
10000-11000 = '10,000-11,000';
;
value xCost(notsorted)
0 = '0'
3500-7000 = '3,500-7,000' /*for example*/
1-3500 = '1-3,400'
7000-10000 = '6,000-10,000'
10000-11000 = '10,000-11,000';
;

run;

proc plan seed=1169604082;
factors r=100 ordered cost = 1 of 11000 / noprint;
output out=cost;
Run;
data cost;
set cost;
xcost = cost;
run;
proc summary data=cost;
class cost xcost/ order=data preloadfmt;
ways 1;
format cost cost. xcost xcost.;
output out=test / levels;
run;
proc print;
run;
proc print;
format _all_;
run;
[/pre]
GVeers
Calcite | Level 5
Thanks again, data_null. That could do it.
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1859 views
  • 0 likes
  • 2 in conversation