BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
LucileD
Calcite | Level 5

Hi everyone 🙂

 

I need your help to solve the following issue:

 

I need to classify the _ascvd_S variable into 4 categories (described below). But the following code does not work for some values and I can't figure why:

 

1 - Creation of the dataset:

%let NUM=0;
data _ASCVD_IMP_SDS_&NUM.;
	format SUBJID $8. VISIT $16.;
	input SUBJID $ VISIT $ vis Current_smoker $ _ascvd_S $;
	datalines;
        0805-004 Baseline 1 Yes 0.073
        0805-004 Week_4 2 Yes 0.068
        0101-003 Baseline 1 Yes 0.076
        0101-003 Week_14 3 Yes 0.071
        0003-007 Baseline 1 No 0.023
        0003-007 Week_4 3 No 0.018
        0003-007 End_of_treatment 4 No 0.018
        0003-007 Week_4 3 No 0.018
        0207-006 Baseline 1 No 0.322
        0207-006 Week_4 2 No 0.199
        1503-003 Baseline 1 No 0.214
        1503-003 Week_14 3 No 0.199
        0003-003 Baseline 1 Yes 0.257
        0003-003 Week_28 5 Yes 0.199
	;
run;

2 - derivation of the ascvd_100 variable:

%let NUM=%eval(&NUM.+1);
data _ASCVD_IMP_SDS_&NUM.;
	set _ASCVD_IMP_SDS_%eval(&NUM.-1);
	attrib ascvd_S label="Source ASCVD score (SDS)";
	ascvd_S=_ascvd_S*1;
	attrib ascvd_100 label="ASCVD score (pct) (SDS)";
	ascvd_100=_ascvd_S*100;
	drop _ascvd_S;
run;

 

3 - derivation of the classification variable (ascvd_100_cl):

%let NUM=%eval(&NUM.+1);
data _ASCVD_IMP_SDS_&NUM.;
	set _ASCVD_IMP_SDS_%eval(&NUM.-1);
	* ASCVD class;
	attrib ascvd_S_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_S) eq 0 then do;
		if ascvd_S<0.05 then ascvd_S_cl="   Low-risk (<0.05)";
		else if 0.05<=ascvd_S<=0.074 then ascvd_S_cl="  Borderline risk (0.05 to 0.074)";
		else if 0.074<=ascvd_S<=0.199 then ascvd_S_cl=" Intermediate risk (0.075 to 0.199)";
		else if 0.2<=ascvd_S then ascvd_S_cl="High risk (>=0.2)";
	end;
	attrib ascvd_100_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_100) eq 0 then do;
		if ascvd_100<5 then ascvd_100_cl="   Low-risk (<5pct)";
		else if 5<=ascvd_100<=7.4 then ascvd_100_cl="  Borderline risk (5 to 7.4pct)";
		else if 7.5<=ascvd_100<=19.9 then ascvd_100_cl=" Intermediate risk (7.5 to 19.9pct)";
		else if 20<=ascvd_100 then ascvd_100_cl="High risk (>=20pct)";
	end;
run;

 

As you can see, after running those 3 steps, the ascvd_100_cl parameter is not filled for ascvd_100 values = 19.9 whereas parameter ascvd_S_cl (based on ascvd_S) is.

Certainely I'm missing something here, but if anyone can help it would be very much appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @LucileD and welcome to the SAS Support Communities!

 

You have encountered a numeric representation issue. Read the log below (created with Windows SAS 9.4M5) to see what happened.

136   data _null_;
137   x=0.199;
138   y=100*x;
139   z=19.9;
140   if y > z then put 'Surprised?';
141   put (y z)(=hex16./);
142   run;

Surprised?
y=4033E66666666667
z=4033E66666666666
NOTE: DATA statement used (Total process time):

Whenever non-integer numbers (such as 0.199 or 19.9 in this example) are involved in calculations like addition, multiplication, etc. there is a high (!) risk that rounding errors lead to tiny, yet potentially momentous deviations from the exact results obtained by hand calculation. These rounding errors occur during the computer's calculations in the binary system, which is used internally to represent the decimal numeric values. The main problem is that most of these binary numbers have infinitely many digits and therefore must be rounded.

 

Your numbers, denoted x and z above, are no exceptions:

x=    0.00110010111100011010100111111011111001110110110010001011010000111001010110000001...
z=10011.11100110011001100110011001100110011001100110011001100110011001100110011001100110...

 

To avoid this type of issue, apply the ROUND function with a suitable (small) rounding unit to the result of the calculation:

ascvd_S=input(_ascvd_S, 32.);
ascvd_100=round(100*ascvd_S, 1e-9);

(Note that the character-to-numeric conversion using the INPUT function as shown above is good practice because it avoids the unwanted "NOTE: Character values have been converted to numeric values ..." in the log.)

 

Without appropriate rounding the result of 100*0.199 is slightly larger than 19.9 (see the last hex digit 7 in y above) and therefore falls into the gap between 19.9 and 20 that you left in your IF-THEN/ELSE conditions, as others have pointed out already. I would close those gaps (and also avoid overlaps) by omitting the duplicate borders in the inequality conditions:

     if ascvd_S <  0.05  then ascvd_S_cl=...
else if ascvd_S <= 0.074 then ascvd_S_cl=...
else if ascvd_S <= 0.199 then ascvd_S_cl=...
else                          ascvd_S_cl=...

Or maybe create a user-defined format that assigns the labels ("Low-risk ...", etc.) to the numbers.

 

Edit: Depending on the actual specifications, "< 0.075" and "< 0.2" might be more appropriate than "<= 0.074" and "<= 0.199," respectively.

View solution in original post

6 REPLIES 6
Renoux
Obsidian | Level 7

Hi.

 

In step 3 your classes aren't disjoined. There are gaps and overlaping values. This may be the cause of your problem.


3 - derivation of the classification variable (ascvd_100_cl):
%let NUM=%eval(&NUM.+1);
data _ASCVD_IMP_SDS_&NUM.;
	set _ASCVD_IMP_SDS_%eval(&NUM.-1);
	* ASCVD class;
	attrib ascvd_S_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_S) eq 0 then do;
		if ascvd_S<0.05 then ascvd_S_cl="   Low-risk (<0.05)";
		else if 0.05<=ascvd_S<0.075 then ascvd_S_cl="  Borderline risk (0.05 to 0.074)";
		else if 0.075<=ascvd_S<0.2 then ascvd_S_cl=" Intermediate risk (0.075 to 0.199)";
		else if 0.2<=ascvd_S then ascvd_S_cl="High risk (>=0.2)";
	end;
	attrib ascvd_100_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_100) eq 0 then do;
		if ascvd_100<5 then ascvd_100_cl="   Low-risk (<5pct)";
		else if 5<=ascvd_100<7.5 then ascvd_100_cl="  Borderline risk (5 to 7.4pct)";
		else if 7.5<=ascvd_100<20 then ascvd_100_cl=" Intermediate risk (7.5 to 19.9pct)";
		else if 20<=ascvd_100 then ascvd_100_cl="High risk (>=20pct)";
	end;
run;

Regards.

 

Axel Renoux

kleelasiva9
Obsidian | Level 7

Apart from overlapping ranges, I too couldn't find any problem with the code. Adding below code in the 3rd step fixes (temporary

%let NUM=%eval(&NUM.+1);
data _ASCVD_IMP_SDS_&NUM.;
	set _ASCVD_IMP_SDS_%eval(&NUM.-1);
	* ASCVD class;
	attrib ascvd_S_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_S) eq 0 then do;
		if ascvd_S<0.05 then ascvd_S_cl="   Low-risk (<0.05)";
		else if 0.05<=ascvd_S<=0.074 then ascvd_S_cl="  Borderline risk (0.05 to 0.074)";
		else if 0.075<=ascvd_S<=0.199 then ascvd_S_cl=" Intermediate risk (0.075 to 0.199)";
		else if 0.2<=ascvd_S then ascvd_S_cl="High risk (>=0.2)";
	end;
	attrib ascvd_100_cl format=$40. label="10-year risk for ASCVD (SDS)";
	if missing(ascvd_100) eq 0 then do;
		if ascvd_100<5 then ascvd_100_cl="   Low-risk (<5pct)";
		else if 5<=ascvd_100<=7.4 then ascvd_100_cl="  Borderline risk (5 to 7.4pct)";
		else if 7.5 le ascvd_100 le 19 then ascvd_100_cl ="Intermediate risk (7.5 to 19.9pct)";

		else if 19 le ascvd_100 lt 20 then ascvd_100_cl ="Intermediate risk (7.5 to 19.9pct)";

		else if 20<=ascvd_100 then ascvd_100_cl="High risk (>=20pct)";
		
	end;
run;

).

Renoux
Obsidian | Level 7
Also, shouldn't your step 2 be only on &NUM.=0?

data _ASCVD_IMP_SDS_0;
set _ASCVD_IMP_SDS_0;
FreelanceReinh
Jade | Level 19

Hi @LucileD and welcome to the SAS Support Communities!

 

You have encountered a numeric representation issue. Read the log below (created with Windows SAS 9.4M5) to see what happened.

136   data _null_;
137   x=0.199;
138   y=100*x;
139   z=19.9;
140   if y > z then put 'Surprised?';
141   put (y z)(=hex16./);
142   run;

Surprised?
y=4033E66666666667
z=4033E66666666666
NOTE: DATA statement used (Total process time):

Whenever non-integer numbers (such as 0.199 or 19.9 in this example) are involved in calculations like addition, multiplication, etc. there is a high (!) risk that rounding errors lead to tiny, yet potentially momentous deviations from the exact results obtained by hand calculation. These rounding errors occur during the computer's calculations in the binary system, which is used internally to represent the decimal numeric values. The main problem is that most of these binary numbers have infinitely many digits and therefore must be rounded.

 

Your numbers, denoted x and z above, are no exceptions:

x=    0.00110010111100011010100111111011111001110110110010001011010000111001010110000001...
z=10011.11100110011001100110011001100110011001100110011001100110011001100110011001100110...

 

To avoid this type of issue, apply the ROUND function with a suitable (small) rounding unit to the result of the calculation:

ascvd_S=input(_ascvd_S, 32.);
ascvd_100=round(100*ascvd_S, 1e-9);

(Note that the character-to-numeric conversion using the INPUT function as shown above is good practice because it avoids the unwanted "NOTE: Character values have been converted to numeric values ..." in the log.)

 

Without appropriate rounding the result of 100*0.199 is slightly larger than 19.9 (see the last hex digit 7 in y above) and therefore falls into the gap between 19.9 and 20 that you left in your IF-THEN/ELSE conditions, as others have pointed out already. I would close those gaps (and also avoid overlaps) by omitting the duplicate borders in the inequality conditions:

     if ascvd_S <  0.05  then ascvd_S_cl=...
else if ascvd_S <= 0.074 then ascvd_S_cl=...
else if ascvd_S <= 0.199 then ascvd_S_cl=...
else                          ascvd_S_cl=...

Or maybe create a user-defined format that assigns the labels ("Low-risk ...", etc.) to the numbers.

 

Edit: Depending on the actual specifications, "< 0.075" and "< 0.2" might be more appropriate than "<= 0.074" and "<= 0.199," respectively.

ballardw
Super User

When an assignment is using only ONE variable then a common way in SAS to handle this type of use is a CUSTOM Format. Formats can be more flexible than creating variables. Consider the following code example:

data example;
	format SUBJID $8. VISIT $16.;
	input SUBJID $ VISIT $ vis Current_smoker $ ascvd_S ;
	datalines;
0805-004 Baseline 1 Yes 0.073
0805-004 Week_4 2 Yes 0.068
0101-003 Baseline 1 Yes 0.076
0101-003 Week_14 3 Yes 0.071
0003-007 Baseline 1 No 0.023
0003-007 Week_4 3 No 0.018
0003-007 End_of_treatment 4 No 0.018
0003-007 Week_4 3 No 0.018
0207-006 Baseline 1 No 0.322
0207-006 Week_4 2 No 0.199
1503-003 Baseline 1 No 0.214
1503-003 Week_14 3 No 0.199
0003-003 Baseline 1 Yes 0.257
0003-003 Week_28 5 Yes 0.199
	;
run;

proc format;
value ascvd_s
.            =" "
low  -< 0.05 ="   Low-risk"
0.05 -<0.074 ="  Borderline risk"
0.074-<0.199 =" Intermediate risk"
0.199-High   ="High risk"
;
run;
/*display existing values with the format*/
Proc print data=example noobs;
   format ascvd_S  ascvd_S.;
run;
/* use the formatted values in analysis*/
proc freq data=example;
   tables subjid*ascvd_s;
   format ascvd_S  ascvd_S.;
run;

/* graph using formatted value*/
proc sgplot data=example;
   vbar ascvd_s / group=subjid;
    format ascvd_S  ascvd_S.;
run;

If someone decides that the boundary for "low risk" should actually be at 0.055 instead of 0.05 the ONLY change needed to make the display, frequency table or graph comply would be to change the 0.05 in the Format code to 0.055 in two places. Or could change multiple boundaries. But I would only have to change the format code. Or create a different format and use it as desired to create the groups used in the Proc Freq or Sgplot code. Most formats are acceptable for report, analysis or graphing procedures and the exceptions usually have to do with custom date formats.

 

I have a library of formats for age that assign 5 or 10 year intervals, specific age breakouts like <18, 18-44,45+ or <65 and 65+. I use the needed format with a single Age variable for reports and graphs.

 

 

BTW your example data has a character variable _ascvd_s not the numeric variable ascvd_s as used in your assignment logic. My example data uses the numeric values as range comparisons work poorly with character variables.

 

LucileD
Calcite | Level 5

Dear all,

 

Thanks for all your answers!!!

 

To FreelanceReinhard: thanks for the explanation, it is exactly what I was looking for!

 

To Renoux and kleelasiva9: As you can see, all values are written with only 2 decimals so the classes are ok (and thanks to this I was able to spot the issue with 19.9!! 🙂 )

To Renoux: Yes, I could have put all the steps in one dataset, but I really don't like to program "data toto; set toto;" 🙂

To ballardw: Yes, I could have created a format, but I prefer to have both variables (source + derived (formatted or not)), it helps me check my programming :). Moreover, as there is an issue with 19.9 not beeing included in the "<=19.9" condition, the format you proposed would have classified those values as "High Risk" 🙂

 

Again, thank you all and have a nice day!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1803 views
  • 2 likes
  • 5 in conversation