Hello and thanks for taking the time to help. I'm fairly new to using SAS, so don't have the best technical background. I'm running SAS EG with base 9.4, and am having to write some programs from scratch to get where I need to go. Here's my business scenario:
I'm analyzing performance of ~6,000 individuals across a range of 18 measures. The measures reflect the most recent 13 weeks of performance, and are created weekly.
My data is in the following format
Name Type Length Format Report_Week Date 8 MMDDYY10. Alias Character 4 Role Character 32 Measure_1 Numeric 8 ... Measure_18 Numeric 8
I'm using Proc UNIVARIATE to calculate the 10th and 90th percentile, then splitting the interior range into 8 even buckets for a total of 10 buckets for each measure||Role||Report_Week. I then use Proc SQL to create the necessary formatting fields and Proc SORT to ensure they are in the correct order. The final set of buckets looks like this...(not including all 1440 rows)
fmtname Start End Label SEXCY EEXCY HLO $D2TCA020419A 0.00001 0.9879518072 < 98.80% N Y L $D2TCA020419A 0.9879518072 0.9894578313 98.80% to 98.95% N Y $D2TCA020419A 0.9894578313 0.9909638554 98.95% to 99.10% N Y $D2TCA020419A 0.9909638554 0.9924698795 99.10% to 99.25% N Y $D2TCA020419A 0.9924698795 0.9939759036 99.25% to 99.40% N Y $D2TCA020419A 0.9939759036 0.9954819277 99.40% to 99.55% N Y $D2TCA020419A 0.9954819277 0.9969879518 99.55% to 99.70% N Y $D2TCA020419A 0.9969879518 0.9984939759 99.70% to 99.85% N Y $D2TCA020419A 0.9984939759 0.9999 99.85% to 99.99% N Y $D2TCA020419A 0.9999 1000000 > 99.99% Y Y H $D2TCA021119A 0.00001 0.9879759519 < 98.80% N Y L $D2TCA021119A 0.9879759519 0.9894789579 98.80% to 98.95% N Y $D2TCA021119A 0.9894789579 0.9909819639 98.95% to 99.10% N Y $D2TCA021119A 0.9909819639 0.9924849699 99.10% to 99.25% N Y
I want to apply the labels to the dataset containing the actual results, but I get an error when trying to write in the format buckets:
229 proc format cntlin=localsbx.bucket_format; ERROR: Start is greater than end: -. 230 run;
I can't seem to figure out what I'm doing wrong. I've exported the bucket_format table to excel and tested the Start and End fields, verifying there are no cases where Start is greater than end.
I've attached the full SAS code used to generate the buckets. Any assistance you can provide would be greatly appreciated, and if I'm going about this all wrong, I'm eager to learn how to do this "the right way".
Use PROC RANK with option GROUPS=10, which produces 10 buckets based on percentiles.
Then all of this programming effort is done for you inside of PROC RANK.
If you really need labels (like "99.85% to 99.99%") on the groups, this is computed by finding the min and max of each group created, and then turning this min and max into a format. Even so, it would be easier to produce a label like "90-100 percentile" rather than the actual values of "99.85% to 99.99%".
Thanks Paige, this was my original approach, but the user wants to see how the buckets change over time.
This has nothing to do with using (or not using) PROC RANK. You can store the results from PROC RANK (or PROC UNIVARIATE) over time and compare them.
additionally, they wanted consistent sized buckets from a measure perspective, as opposed to a associate perspective.
I don't know exactly what this means. Can you give an example of the "consistent sized buckets" you want?
in a percentile ranking, there are an equal number of people in each of the 10 buckets and the bucket ends change in order to obtain this.
the buckets I'm creating need to each have the same range. there's not much of a difference when looking at the percentage of tasks that were completed (this is the data I used for illustration), but it makes a difference when looking at the number of tasks completed.
week P10 P90 bucket range
2/4/19 40.25 244.50 25.53
2/11/19 42.75 251.20 26.06
2/18/19 41.67 247.50 25.72
2/25/19 40.17 220.67 25.23
If you want to correct your format data set then start with this code to find which format and which values are involved.
data problem; set localsbx.bucket_format; where start> end; run;
I am going to guess that you may have values that are small enough you are running into precision and data storage issues.
You may want to go into the step that creates the cntlin data set and do a check for the start>end condition and modify start.
You don't say what you are measuring but are 10 decimal points of significance actually needed?? Perhaps rounding to 0.00001 would be sufficient.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.