BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phueltank
Fluorite | Level 6

Hello and thanks for taking the time to help. I'm fairly new to using SAS, so don't have the best technical background. I'm running SAS EG with base 9.4, and am having to write some programs from scratch to get where I need to go. Here's my business scenario:

I'm analyzing performance of ~6,000 individuals across a range of 18 measures. The measures reflect the most recent 13 weeks of performance, and are created weekly.

 

My data is in the following format

 

Name        Type        Length        Format 
Report_Week Date             8        MMDDYY10. 
Alias       Character        4 
Role        Character       32 
Measure_1   Numeric          8 
...

Measure_18  Numeric          8

 

 

I'm using Proc UNIVARIATE to calculate the 10th and 90th percentile, then splitting the interior range into 8 even buckets for a total of 10 buckets for each measure||Role||Report_Week. I then use Proc SQL to create the necessary formatting fields and Proc SORT to ensure they are in the correct order. The final set of buckets looks like this...(not including all 1440 rows)

 

fmtname	        Start	        End	        Label	                SEXCY	EEXCY	HLO
$D2TCA020419A	0.00001	        0.9879518072	< 98.80%			N		Y		L
$D2TCA020419A	0.9879518072	0.9894578313	98.80% to 98.95%	N		Y	
$D2TCA020419A	0.9894578313	0.9909638554	98.95% to 99.10%	N		Y	
$D2TCA020419A	0.9909638554	0.9924698795	99.10% to 99.25%	N		Y	
$D2TCA020419A	0.9924698795	0.9939759036	99.25% to 99.40%	N		Y	
$D2TCA020419A	0.9939759036	0.9954819277	99.40% to 99.55%	N		Y	
$D2TCA020419A	0.9954819277	0.9969879518	99.55% to 99.70%	N		Y	
$D2TCA020419A	0.9969879518	0.9984939759	99.70% to 99.85%	N		Y	
$D2TCA020419A	0.9984939759	0.9999	        99.85% to 99.99%	N		Y	
$D2TCA020419A	0.9999			1000000			> 99.99%			Y		Y		H
$D2TCA021119A	0.00001			0.9879759519	< 98.80%			N		Y		L
$D2TCA021119A	0.9879759519	0.9894789579	98.80% to 98.95%	N		Y	
$D2TCA021119A	0.9894789579	0.9909819639	98.95% to 99.10%	N		Y	
$D2TCA021119A	0.9909819639	0.9924849699	99.10% to 99.25%	N		Y	

I want to apply the labels to the dataset containing the actual results, but I get an error when trying to write in the format buckets:

 

229        proc format cntlin=localsbx.bucket_format;
ERROR: Start is greater than end: -.
230        run;

I can't seem to figure out what I'm doing wrong. I've exported the bucket_format table to excel and tested the Start and End fields, verifying there are no cases where Start is greater than end.

 

I've attached the full SAS code used to generate the buckets. Any assistance you can provide would be greatly appreciated, and if I'm going about this all wrong, I'm eager to learn how to do this "the right way".

 

Thanks again,

 

Frank 

1 ACCEPTED SOLUTION

Accepted Solutions
Phueltank
Fluorite | Level 6
on the suggestion of a coworker, I removed the "$" from the format name and everything worked as intended.

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Use PROC RANK with option GROUPS=10, which produces 10 buckets based on percentiles.

 

Then all of this programming effort is done for you inside of PROC RANK.

 

If you really need labels (like "99.85% to 99.99%") on the groups, this is computed by finding the min and max of each group created, and then turning this min and max into a format. Even so, it would be easier to produce a label like "90-100 percentile" rather than the actual values of "99.85% to 99.99%".

--
Paige Miller
Phueltank
Fluorite | Level 6
Thanks Paige, this was my original approach, but the user wants to see how the buckets change over time. additionally, they wanted consistent sized buckets from a measure perspective, as opposed to a associate perspective.
PaigeMiller
Diamond | Level 26

@Phueltank wrote:
Thanks Paige, this was my original approach, but the user wants to see how the buckets change over time

This has nothing to do with using (or not using) PROC RANK. You can store the results from PROC RANK (or PROC UNIVARIATE) over time and compare them.

 

additionally, they wanted consistent sized buckets from a measure perspective, as opposed to a associate perspective.

 

I don't know exactly what this means. Can you give an example of the "consistent sized buckets" you want?

--
Paige Miller
Phueltank
Fluorite | Level 6

in a percentile ranking, there are an equal number of people in each of the 10 buckets and the bucket ends change in order to obtain this.

the buckets I'm creating need to each have the same range. there's not much of a difference when looking at the percentage of tasks that were completed (this is the data I used for illustration), but it makes a difference when looking at the number of tasks completed.

 

week      P10      P90        bucket range

2/4/19     40.25   244.50    25.53

2/11/19   42.75   251.20    26.06

2/18/19   41.67   247.50    25.72

2/25/19   40.17   220.67    25.23

 

ballardw
Super User

If you want to correct your format data set then start with this code to find which format and which values are involved.

data problem;
   set localsbx.bucket_format;
   where start> end;
run;

I am going to guess that you may have values that are small enough you are running into precision and data storage issues.

 

You may want to go into the step that creates the cntlin data set and do a check for the start>end condition and modify start.

 

You don't say what you are measuring but are 10 decimal points of significance actually needed?? Perhaps rounding to 0.00001 would be sufficient.

Phueltank
Fluorite | Level 6

I ran the suggested code to find the error, but it returned an empty dataset. I'll try modifying precision next to see if that's the issue. thanks!

Phueltank
Fluorite | Level 6
on the suggestion of a coworker, I removed the "$" from the format name and everything worked as intended.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2098 views
  • 2 likes
  • 3 in conversation