Solved: Re: Array to flag values in certain range(s) only flags within roundin...

awesome_opossum · Posted 09-26-2022 11:29 AM

Hi there, so I'm trying to use this little doo-wop to flag variables that have values within certain range(s) (>= |.35|):

data flags (drop=i);
  set data;
  array factor {10} factor1-factor10;
  array place {10} place1-place10;
  do i = 1 to 10;
    place{i} = 0;
    /* >= |.35| threshold is somewhat arbitrary, may be changed */
    if factor{i} < .35 and factor{i} > -.35 then place{i} = 1;
  end;
  noload_ct = sum(of place1-place10);
  noload = 0;
  if noload_ct = 10 then noload = 1;
run;

Now, I've done this before with other data, and it appeared to work as expected. But in this particular instance, upon inspection of the values I'm trying to filter, it is flagging cases that are ≈ .346, which obviously are not >= |.35|. Such "close" cases perhaps simply did not exist in prior use of this approach; hence I did not notice.

I suppose I can chalk this up to some sort of rounding that is going on behind the scenes. For my present purposes, I imagine this is fine, but my concern is still that it would be technically inaccurate to say that I used a threshold of >= .35, when the output retains values that are technically, say, > .345 (or whatever it is that's happening behind the scenes). That is, if someone replicated the analysis with the same data, they might observe this inconsistency, and say, awesome_opossum, "you're wrong, and a liar; how dare you.", which is principally silly enough that it is something I wish to avoid.

Does anyone have an explanation for this, that I can at least include in a footnote (e.g. is it actually > .345 -- values rounded to the hundredth?), or any recommendations?

For clarification: I'm making two lists: one is a list of all the items (rows) that are >= |.35| on at least one variable; the other is a list of items (rows) that are < |.35| on all the variables. Items (rows) with a highest value of |.346| on any variable end up on the list that is supposed to be >= |.35|.

ballardw · Posted 09-26-2022 11:56 AM

Check what you think you have written. 0.346 is indeed < 0.35 and 0.346 > -0.35 as shown in your code.

Or provide some example data in the form of a data step and what you expect for results.

Is there some reason that you did not use the ABS function?

Abs(factor[i]) ge 0.35

is the equivalent test for ">= |.35|".

Or if you want the negation

abs(factor[i]) lt 0.35
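
For what it's worth, here is a minimal sketch of the original data step rewritten around ABS. It assumes the same input data set and the factor1-factor10 / place1-place10 names from the question, and it has not been run against the actual data. One behavioral difference to be aware of: SAS treats a missing value as smaller than any number, and ABS of a missing value is missing, so a missing factor counts as below the cutoff here, whereas the two-sided comparison in the original code leaves it unflagged.

```sas
/* Sketch only: data set and variable names are taken from the question. */
data flags (drop=i);
  set data;
  array factor {10} factor1-factor10;
  array place  {10} place1-place10;
  do i = 1 to 10;
    /* 1 when the value is below the cutoff in absolute value, else 0 */
    place{i} = (abs(factor{i}) lt 0.35);
  end;
  noload_ct = sum(of place1-place10);
  /* 1 only when all ten factors fall below the cutoff */
  noload = (noload_ct = 10);
run;
```

Rows with noload = 1 would then be the "< |.35| on all the variables" list, and the remaining rows the ">= |.35| on at least one variable" list.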

yabwon · Posted 09-26-2022 11:36 AM

Your condition is wrong.

Should be:

 factor{i} > .35 or factor{i} < -.35

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

awesome_opossum · Posted 09-26-2022 11:37 AM

I narrated it as the opposite; the inaccurate flagging remains the issue.

yabwon · Posted 09-26-2022 11:47 AM

Just to double confirm, your condition is:

flag elements which are either greater than 0.35 or less than -0.35

right?

Bart

awesome_opossum · Posted 09-26-2022 11:49 AM

I'm making two lists: one is a list of all the items (rows) that are >= .35 on at least one variable; the other is a list of items (rows) that are < .35 on all the variables. Items (rows) with a highest value of .346 on any variable end up on the list that is supposed to be >= .35.

Tom · Posted 09-26-2022 11:57 AM

Something else must be going on.

Example:

data test;
  do factor=.,0,.346,.36,1 ;
    *factor = round(factor,0.01);
    lt0_35 = factor < 0.35;
    ge0_35 = factor >= 0.35;
    output;
  end;
run;
proc print;
run;

Obs    factor    lt0_35    ge0_35

 1       .          1         0
 2      0.000       1         0
 3      0.346       1         0
 4      0.360       0         1
 5      1.000       0         1
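
Note the first row of that output: a missing factor compares as less than 0.35, because SAS sorts missing values below every number. If missing values should not land in the "below threshold" group, one option is to guard the comparison with the MISSING function. A sketch, reusing the same made-up values as above:

```sas
data test;
  do factor = ., 0, .346, .36, 1;
    /* missing() guards the comparison so that . is not counted as < 0.35 */
    lt0_35 = (not missing(factor)) and (factor < 0.35);
    ge0_35 = (factor >= 0.35);
    output;
  end;
run;

proc print data=test;
run;
```

With this guard a missing factor gets 0 in both columns, so it can be handled separately instead of being silently counted as small.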

yabwon · Posted 09-26-2022 11:59 AM

data data; 
  array factor {10} factor1-factor10 (.344 .345 .346 .347 .348 .349 .350 .351 .352 .353); 
run;


data flags (drop= i); 
  set data; 
  array factor {10} factor1-factor10; 
  array ge {10} (10*0); /* >= .35 */ 
  array ls {10} (10*0); /*  < .35 */

  do i = 1 to 10;
    ge{i} = (factor{i} >= .35);
    ls{i} = (factor{i}  < .35);
  end;
  put (_ALL_) (=/);
run;

Log:

255
256  data data;
257    array factor {10} factor1-factor10 (.344 .345 .346 .347 .348 .349 .350 .351 .352
257! .353);
258  run;

NOTE: The data set WORK.DATA has 1 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


259
260
261  data flags (drop= i);
262    set data;
263    array factor {10} factor1-factor10;
264    array ge {10} (10*0); /* >= .35 */
265    array ls {10} (10*0); /*  < .35 */
266
267
268      do i = 1 to 10;
269        ge{i} = (factor{i}  >= .35 );
270        ls{i} = (factor{i}   < .35 );
271      end;
272    put (_ALL_) (=/);
273  run;


factor1=0.344
factor2=0.345
factor3=0.346
factor4=0.347
factor5=0.348
factor6=0.349
factor7=0.35
factor8=0.351
factor9=0.352
factor10=0.353
ge1=0
ge2=0
ge3=0
ge4=0
ge5=0
ge6=0
ge7=1
ge8=1
ge9=1
ge10=1
ls1=1
ls2=1
ls3=1
ls4=1
ls5=1
ls6=1
ls7=0
ls8=0
ls9=0
ls10=0
i=11
NOTE: There were 1 observations read from the data set WORK.DATA.
NOTE: The data set WORK.FLAGS has 1 observations and 30 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
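
On the rounding question from the original post: comparisons in a DATA step use the stored value, not the formatted one, so a format with few decimals can make a value look like it sits on the other side of the cutoff. This may or may not be what happened in the original data. A quick way to check is to print the same value with a short format and with BEST32. The value 0.3464 below is made up for illustration:

```sas
data _null_;
  x = 0.3464;            /* hypothetical value just under the cutoff */
  below = (x < 0.35);    /* the comparison uses the stored value */
  /* same variable written twice: once rounded for display, once in full */
  put x= 4.2 x= best32. below=;
run;
```

The first PUT item should show the value rounded to two decimals, the second the value as stored, and below reports which side of 0.35 the stored value is actually on.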

awesome_opossum · Posted 09-26-2022 12:20 PM

The abs() function actually did solve the problem! Still strange my method didn't work; but elegant solution; I appreciate it!

Array to flag values in certain range(s) only flags within rounding error?
