Hi there, so I'm trying to use this little doo-wop to flag variables that have values within certain range(s) (>= |.35|):
data flags (drop= i);
  set data;
  array factor {10} factor1-factor10;
  array place {10} place1-place10;
  do i = 1 to 10;
    place{i} = 0;
    /* >= |.35| threshold is somewhat arbitrary, may be changed */
    if factor{i} < .35 and factor{i} > -.35 then place{i} = 1;
  end;
  noload_ct = sum(of place1-place10);
  noload = 0;
  if noload_ct = 10 then noload = 1;
run;
Now, I've done this before with other data, and it appeared to work as expected. But in this particular instance, upon inspection of the values I'm trying to filter, it is flagging cases that are ≈ .346, which are obviously not >= |.35|. Such "close" cases perhaps simply did not exist in prior use of this approach; hence I did not notice.
I suppose I can chalk this up to some sort of rounding that is going on behind the scenes. For my present purposes, I imagine this is fine, but my concern is still that it would be technically inaccurate to say that I used a threshold of >= .35, when the output retains values that are technically, say, > .345 (or whatever it is that's happening behind the scenes). That is, if someone replicated the analysis with the same data, they might observe this inconsistency, and say, awesome_opossum, "you're wrong, and a liar; how dare you.", which is principally silly enough that it is something I wish to avoid.
Does anyone have an explanation for this, that I can at least include in a footnote (e.g. is it actually > .345 -- values rounded to the hundredth?), or any recommendations?
For clarification: I'm making two lists: one is a list of all the items (rows) that are >= |.35| on at least one variable; the other is a list of items (rows) that are < |.35| on all the variables. Items (rows) with a highest value of |.346| on any variable end up on the list that is supposed to be >= |.35|.
Check what you think you have written. 0.346 is indeed < 0.35 and 0.346 > -0.35 as shown in your code.
Or provide some example data in the form of a data step and what you expect for results.
Is there some reason that you did not use the ABS function?
Abs(factor[i]) ge 0.35
is the equivalent test for ">= |.35|".
Or if you want the negation
abs(factor[i]) lt 0.35
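For what it's worth, here is a minimal sketch of the original step rewritten around ABS, assuming the same data set and variable names as in the question. One thing to watch: ABS of a missing value is missing, and SAS treats a missing value as smaller than any number, so a bare `abs(factor{i}) < 0.35` would be true for a missing factor, whereas the original two-sided condition is false for it. The `missing()` guard below is my addition to keep the original behavior; drop it if missing factors should count as "below threshold".

```sas
data flags (drop=i);
  set data;
  array factor {10} factor1-factor10;
  array place {10} place1-place10;
  do i = 1 to 10;
    /* 1 when the value lies strictly inside (-.35, .35), else 0.        */
    /* Assumption: a missing factor gets 0, as in the original condition. */
    place{i} = ((not missing(factor{i})) and (abs(factor{i}) < 0.35));
  end;
  noload_ct = sum(of place1-place10);
  /* A comparison evaluates to 1 or 0, so no IF/THEN is needed */
  noload = (noload_ct = 10);
run;
```

Because the threshold now appears only once, changing it later means editing a single number.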
Your condition is wrong.
Should be:
factor{i} > .35 or factor{i} < -.35
Bart
I narrated it as the opposite; the inaccurate flagging remains the issue.
Just to double-check, your condition is:
flag elements which are either greater than 0.35 or less than -0.35
right?
Bart
I'm making two lists: one is a list of all the items (rows) that are >= .35 on at least one variable; the other is a list of items (rows) that are < .35 on all the variables. Items (rows) with a highest value of .346 on any variable end up on the list that is supposed to be >= .35.
Something else must be going on.
Example:
data test;
  do factor=.,0,.346,.36,1 ;
    *factor = round(factor,0.01);
    lt0_35 = factor < 0.35;
    ge0_35 = factor >= 0.35;
    output;
  end;
run;
proc print;
run;
Obs    factor    lt0_35    ge0_35
  1     .           1         0
  2    0.000        1         0
  3    0.346        1         0
  4    0.360        0         1
  5    1.000        0         1
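One way to narrow down what that "something else" might be is to recompute the largest absolute value per row and list only the rows where it disagrees with the flag. This is just a sketch, assuming the FLAGS data set from the question still carries factor1-factor10, noload and noload_ct; the names `check`, `max_abs`, `n_missing` and `expected` are made up for illustration. A missing factor is one candidate cause, since the original condition leaves place{i} at 0 for a missing value, which keeps noload_ct below 10 even when every non-missing value is small.

```sas
data check (drop=i);
  set flags;
  array factor {10} factor1-factor10;
  max_abs = 0;
  do i = 1 to 10;
    /* A missing value never exceeds max_abs, so it is skipped here */
    if abs(factor{i}) > max_abs then max_abs = abs(factor{i});
  end;
  n_missing = nmiss(of factor1-factor10);
  /* What noload should be if only the non-missing values mattered */
  expected = (max_abs < 0.35);
  if noload ne expected then do;
    /* BEST32. prints the stored value rather than a rounded display */
    put 'Mismatch: ' _n_= max_abs= best32. n_missing= noload= noload_ct= expected=;
    output;
  end;
run;
```

If CHECK comes back empty, the flagging agrees with the data and the discrepancy is more likely in how the lists are built or displayed afterwards.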
data data;
array factor {10} factor1-factor10 (.344 .345 .346 .347 .348 .349 .350 .351 .352 .353);
run;
data flags (drop= i);
set data;
array factor {10} factor1-factor10;
array ge {10} (10*0); /* >= .35 */
array ls {10} (10*0); /* < .35 */
do i = 1 to 10;
ge{i} = (factor{i} >= .35 );
ls{i} = (factor{i} < .35 );
end;
put (_ALL_) (=/);
run;
Log:
255
256  data data;
257   array factor {10} factor1-factor10 (.344 .345 .346 .347 .348 .349 .350 .351 .352
257! .353);
258  run;

NOTE: The data set WORK.DATA has 1 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

259
260
261  data flags (drop= i);
262   set data;
263   array factor {10} factor1-factor10;
264   array ge {10} (10*0); /* >= .35 */
265   array ls {10} (10*0); /* < .35 */
266
267
268   do i = 1 to 10;
269     ge{i} = (factor{i} >= .35 );
270     ls{i} = (factor{i} < .35 );
271   end;
272   put (_ALL_) (=/);
273  run;

factor1=0.344
factor2=0.345
factor3=0.346
factor4=0.347
factor5=0.348
factor6=0.349
factor7=0.35
factor8=0.351
factor9=0.352
factor10=0.353
ge1=0
ge2=0
ge3=0
ge4=0
ge5=0
ge6=0
ge7=1
ge8=1
ge9=1
ge10=1
ls1=1
ls2=1
ls3=1
ls4=1
ls5=1
ls6=1
ls7=0
ls8=0
ls9=0
ls10=0
i=11
NOTE: There were 1 observations read from the data set WORK.DATA.
NOTE: The data set WORK.FLAGS has 1 observations and 30 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds