BookmarkSubscribeRSS Feed
AustinHarris
Calcite | Level 5

As you can see in the code below, I am creating score bands for data. I am then creating separate tables to dissect the data by their relative score band. These tables are then combined and used to create a table of percent totals. The issue I am having is that the separate score bands are not cumulative and thus not providing cumulative percent totals. 

 

For example, the '0.95' score band is only pulling in the rows of data that have a score of 0.95 it is not pulling in the 0.98 nor 0.99 scoreband data. I am not sure where the error lies in the code or what could be done differently to arrive at the result I am wanting. Please help.

 

data b_&i.;
set a1_&i.;
if score >=0.99 then scoreband = '0.99 ';
else if score >=0.98 then scoreband = '0.98 ';
else if score >=0.95 then scoreband = '0.95 ';
else scoreband = '0.94-';
run;

proc sql;
create table s_&i. as select
Scoring_Sample_Date, description, count(distinct(field)) as Greater_Than_or_Equal_To_99
from b_&i.
where scoreband = '0.99'
group by Scoring_Sample_Date, Description;
Run;

proc sql;
create table s1_&i. as select
Scoring_Sample_Date, description, count(distinct(field)) as Greater_Than_or_Equal_To_98
from b_&i.
where scoreband = '0.98'
group by Scoring_Sample_Date, Description;
run;

proc sql;
create table s2_&i. as select
Scoring_Sample_Date, description, count(distinct(field)) as Greater_Than_or_Equal_To_95
from b_&i.
where scoreband = '0.95'
group by Scoring_Sample_Date, Description;

proc sql;
create table s3_&i. as select
Scoring_Sample_Date, description, count(distinct(field)) as total
from b_&i.
group by Scoring_Sample_Date, Description;


data c_&i.;
merge s_&i. s1_&i. s2_&i. s3_&i.;
by Scoring_Sample_Date description;
run;

 

data d_&i.;
set c_&i.;
Perc_99 = Greater_Than_or_Equal_To_99/total; format Perc_99 percent7.1;
Perc_98 = Greater_Than_or_Equal_To_98/total;format Perc_98 percent7.1;
Perc_95 = Greater_Than_or_Equal_To_95/total;format Perc_95 percent7.1;
run;

proc sql;
create table f_&i. as select
Scoring_Sample_Date, description, perc_99, perc_98, perc_95
from d_&i.;

 

 

Thank you, 

 

Austin Harris 

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Creating score bands, and then using SQL on the different score bands, seems very inefficient in terms of processing, and an awful lot of typing as well.

 

A more efficient way to do this, in my opinion, is to have one data set, use custom formats in place of a scoreband variable, and then PROC FREQ. I give an example here, but since we don't have your data, I can't be sure this will work — so please consider this as nothing more than an example right now.

 

proc format;
    value scoref 0.99-1='>=0.99' 0.98-<0.99='0.98-0.99'
           0.95-<0.98='0.95-0.98' low-<0.95='< 0.95';
run;

proc freq data=whatever;
    tables score;
    format score scoref.;
run;

 

--
Paige Miller
AustinHarris
Calcite | Level 5

I appreciate your help. I am a complete beginner to SAS with nearly no experience. This is my first piece of code, and I am learning as a go. I am certain I am doing things in the most inefficient manner possible... But I am doing them in ways that i know how and are somewhat working. My goal is of course to learn and take in information from individuals like yourself to become more efficient and knowledgeable in the SAS coding language/world. 

 

I am going to attempt to implement this example and will let you know any results. 

 

Thank you, 

PaigeMiller
Diamond | Level 26

@AustinHarris wrote:

I appreciate your help. I am a complete beginner to SAS with nearly no experience. This is my first piece of code, and I am learning as a go. I am certain I am doing things in the most inefficient manner possible... But I am doing them in ways that i know how and are somewhat working. My goal is of course to learn and take in information from individuals like yourself to become more efficient and knowledgeable in the SAS coding language/world. 

 

I am going to attempt to implement this example and will let you know any results. 

 

Thank you, 


Ok, some advice. SAS is an extremely powerful and comprehensive program for analyzing data. Common analyses, such as means, standard deviations, percents, cumulative percents, and a gazillion others, plus many analyses that are not so "common", are already programmed for you. When you start a project, you can ask ... is this already in SAS, or do I have to program it myself? Most of the time, the common (and many not so "common") analyses are already programmed and so you don't have to write your own code to do the calculations. Now if you have existing data, and you need to do something specific and not so common, such as convert a variable to some other scale, such as 

 

new_x=100*x+43;

 

 

then you have to program that yourself.

 

So always first ask if the analysis has already been programmed in SAS rather than just jumping in and programming it yourself. In the long run, using the power of SAS is a much better approach.

--
Paige Miller
ballardw
Super User

@AustinHarris wrote:

I appreciate your help. I am a complete beginner to SAS with nearly no experience. This is my first piece of code, and I am learning as a go. I am certain I am doing things in the most inefficient manner possible... But I am doing them in ways that i know how and are somewhat working. My goal is of course to learn and take in information from individuals like yourself to become more efficient and knowledgeable in the SAS coding language/world. 

 

I am going to attempt to implement this example and will let you know any results. 

 

Thank you, 


With your comment about "complete beginner" and "first piece of code" then my recommendation is NOT to attempt any macro language (the bits you have with &i are an example). A very minor macro logic issue can be very hard to trace down for a beginner because there can be a lot of timing and dependency issues that are not obvious to new users (and many fairly experienced).

 

I don't know if you have looked at your SAS online help at all but you may find bit titled "Learning to Use SAS" in the contents (depending on SAS environment).

ballardw
Super User

Some example data and expected results would be very helpful.

 

The "error" is that your Scoreband variable is working the way you coded it, values of .98 are assigned to the 0.98 band, not the 0.95.

 

There are some ways to have values evaluated in more than one range of values. One way involves multilabel formats. But only a few procedures can use the multilabel and so we need to see what you expect to calculate for given data.

 

Below is an example creating a custom multilabel format, a data set with a variable that has values that appear similar to what you may be using and a Proc Tabulate report using that multilabel format.

proc format;
value scoreband (multilabel)
0.99 - high = '0.99'
0.98 - high = '0.98'
0.95 - high = '0.95'
other       = '0.94-'
;

data example;
   do i=1 to 100;
      x= 0.9 + rand('integer',1,9)*0.01;
      output;
   end;
run;

proc tabulate data=example;
   class x/mlf;
   format x scoreband.;
   table x,
         n pctn
   ;
run;

If you add up the N values in the report it should total more than 100. The data set creates 100 random values so I won't say what your exact total should be but I suspect around 120 is likely. The Pctn , percentage of N, is one percentage possible give that data. Picking a set size of 100 in my example means that the percent will be the same as the N.

 

Proc Tabulate, Report, Summary and Means are the only procedures that currently support the multilable formats. So if you have a more complex definition of percentage you may need to use Summary/Means to get the group counts and apply additional code for the percentage or rate calculation.

 

If my example looks promising you could apply to your data with:

proc tabulate data=a1_&i;
   class score/mlf;
   format score scoreband.;
   table score,
         n pctn
   ;
run;

But not having your data, your expectation of result or even the value of the macro variable &i there isn't a way to test your data.

 

One thing about Tabulate is that if a Class value is missing in the data then by default the record would be dropped from the report. If you have missing scores that need to be considered as "0.94-" for this then you would add the option MISSING to the Class statement with score.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 705 views
  • 2 likes
  • 3 in conversation