BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am trying the case when statement to categorize the numbers to Low, medium and high but when my code runs, it shows all the results as low. Can you please suggest how to correct this?

Here is the sample dataset:

Data Test;
input debt_code Pence_in_pound credit_score;
Datalines;
1001452 3.01% 640
129979530 0.25% 603
291138113 76.00% 598
294432885 3.95% 552
350203659 8.74% 591
;

run;

Proc sql;
create table Willingness_to_pay as
select a.*,
case
when Pence_in_Pound between 0 and 5 then 'Low'
when Pence_in_Pound between 5.001 and 15 then 'Medium'
when Pence_in_Pound > 15.001 then 'High'
end as Pence_pound_metrics
from Book_on as a;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Most likely it's a numeric variable with a percentage format? If that's the case the values are actually 0.00 and 0.05 not 0 and 5.

A value of 0.05 with a percent format will show as 5%. So either scale your numbers by multiplying by 100 or change your ranges.

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

The data set you provided has Pence_in_Pound always missing, so you will have to modify it to give correct data. Thank you.

 

P.S. When you provide data as a SAS data step, please test your code from now on before you post it.

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10
The data in percentage is the Pence_in_pound (3.01%,0.25% etc).
PaigeMiller
Diamond | Level 26

@Sandeep77 wrote:
The data in percentage is the Pence_in_pound (3.01%,0.25% etc).

It's a character string? Or not? It seems as if you want to perform numerical calculations on this. Can you fix the code you provided?

--
Paige Miller
Reeza
Super User
Most likely it's a numeric variable with a percentage format? If that's the case the values are actually 0.00 and 0.05 not 0 and 5.

A value of 0.05 with a percent format will show as 5%. So either scale your numbers by multiplying by 100 or change your ranges.
ballardw
Super User

@Sandeep77 wrote:
The data in percentage is the Pence_in_pound (3.01%,0.25% etc).

The syntax for the data step is incorrect and does not read that variable correctly as shown:

265  Data Test;
266  input debt_code Pence_in_pound credit_score;
267  Datalines;

NOTE: Invalid data for Pence_in_pound in line 268 9-13.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
268        1001452 3.01% 640
debt_code=1001452 Pence_in_pound=. credit_score=640 _ERROR_=1 _N_=1
NOTE: Invalid data for Pence_in_pound in line 269 11-15.
269        129979530 0.25% 603
debt_code=129979530 Pence_in_pound=. credit_score=603 _ERROR_=1 _N_=2
NOTE: Invalid data for Pence_in_pound in line 270 11-16.
270        291138113 76.00% 598
debt_code=291138113 Pence_in_pound=. credit_score=598 _ERROR_=1 _N_=3
NOTE: Invalid data for Pence_in_pound in line 271 11-15.
271        294432885 3.95% 552
debt_code=294432885 Pence_in_pound=. credit_score=552 _ERROR_=1 _N_=4
NOTE: Invalid data for Pence_in_pound in line 272 11-15.
272        350203659 8.74% 591
debt_code=350203659 Pence_in_pound=. credit_score=591 _ERROR_=1 _N_=5
NOTE: The data set WORK.TEST has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


273  ;
274

If values that should be numeric have any special characters other than -, for negative values, or decimal you need to provide a proper informat that will read the value as intended.

Such as

Data Test;
input debt_code Pence_in_pound :percent. credit_score;
Datalines;
1001452 3.01% 640
129979530 0.25% 603
291138113 76.00% 598
294432885 3.95% 552
350203659 8.74% 591
;

run;

At which point a short examination of the example data shows that NONE of your values are actually greater than 1. The largest percentage shown, 76.00% is a numeric value of 0.76. In this code where you are using numeric ranges:

Proc sql;
create table Willingness_to_pay as
select a.*,
case
when Pence_in_Pound between 0 and 5 then 'Low'
when Pence_in_Pound between 5.001 and 15 then 'Medium'
when Pence_in_Pound > 15.001 then 'High'
end as Pence_pound_metrics
from Book_on as a;
quit;

will therefore have ALL values of Pence_pound_metrics as 'Low'.

 

Really no reason to even run that as a Format would do that if the proper ranges are provided in the definition and then just print the data set with the format applied.

 

proc format;
value metric
0 - 0.05     = 'Low'
0.05< - 0.15 = 'Medium'
0.15< -high  = 'High'
;
run;

proc print data=test noobs;
  format pence_in_pound metric.;
run;

Also note that your between code would have missed, if the values were not actual percentages, 5.0004 for example if it occurred in the data.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 725 views
  • 1 like
  • 4 in conversation