BookmarkSubscribeRSS Feed
Shradha1
Obsidian | Level 7
I have a list of transaction amounts. From there I want to select only those observations where the transaction amount is in round figure, like 250, 500, 1000, 2000, etc.
If I take only cases where mod(x,10)=0 then it also keeps transactions like 1410, 1280, but I do not want these, I want round figures like 250, 300, 500,etc.
How do I proceed? I am using SAS 9.4.
8 REPLIES 8
mkeintz
PROC Star

Editted note:  Apologies:  I have misunderstood your problem.  You can ignore everything below.  I leave it for the curious.

 

 

 

 

You apparently want all values that are multiple of 250 times 2 to an integer power.  I.e.

  • x=250 =   250* (2**0)
  • x=500 =   250* (2**1)
  • x=1000 = 250* (2**2)
  • etc.

So, if the log2 of (x/250) is a non-negative integer, you would satisfy this requirement:

 

data have;
  do x=1 to 16000; output; end;
run;

data want;
  set have;
  where log2(x/250) = floor(log2(x/250));
  where also log2(x/250)>=0;
  put x=;
run;

Of course, at some point there may be a numeric precision issue, but it works correctly at least up to 250*(2**20).  When I replaced 16000 above by 250*(2**20), I got the log below, taking about 40 seconds to run: 

 

6890   data have;
6891     do x=1 to 250*(2**20); output; end;
6892   run;

NOTE: The data set WORK.HAVE has 262144000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           5.01 seconds
      cpu time            5.00 seconds


6893
6894   data want;
6895     set have;
6896     where log2(x/250) = floor(log2(x/250));
6897     where also log2(x/250)>=0;
NOTE: WHERE clause has been augmented.
6898     put x=;
6899   run;

x=250
x=500
x=1000
x=2000
x=4000
x=8000
x=16000
x=32000
x=64000
x=128000
x=256000
x=512000
x=1024000
x=2048000
x=4096000
x=8192000
x=16384000
x=32768000
x=65536000
x=131072000
x=262144000
NOTE: There were 21 observations read from the data set WORK.HAVE.
      WHERE (LOG2((x/250))=FLOOR(LOG2((x/250)))) and (LOG2((x/250))>=0);
NOTE: The data set WORK.WANT has 21 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           35.17 seconds
      cpu time            35.15 seconds

And why do I know this is correct?  Because there are 21 resulting observations, representing powers of 2 from 0 through 20.

 

 

Here's a faster way: make a list of desired value in an array.  Then you don't have to bother with generating logs.  The data step below took 10 seconds (vs 35 seconds above):

 

data want;
  array testvalues {0:20} _temporary_;
  if _n_=1 then do p=0 to 20;
    testvalues{p}=250*(2**p);
  end;

  set have;
  if  whichn(x,of testvalues{*});
  put x=;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

@mkeintz wrote:

You apparently want all values that are multiple of 250 times 2 to an integer power.  I.e.

  • x=250 =   250* (2**0)
  • x=500 =   250* (2**1)
  • x=1000 = 250* (2**2)
  • etc.

Except that the OP @Shradha1 stated in his/her text (which doesn't agree with the title, and I always find that confusing) that he/she wants to select 300 as well.

 

Which brings me back to my comment. The OP needs to define "round figures".

--
Paige Miller
mkeintz
PROC Star
Yes, just realized that. I put a disclaimer at the top of my note. THX.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PhilC
Rhodochrosite | Level 12

I agree the definition of "round" is not standard.  What I think is being described is a rounding, in terms of scientific notation, where the significand or mantissa is rounded to the nearest 0.5.  If that is so, I find the mantissa, round, and convert back to normal, non-scientific-notation form.

 

 

%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data want ;
  do i = 1 to 10;
    p=%RandBetween(2,nobs);
    set SASHELP.GEOEXM(Keep=last) point=p nobs=nobs;
    last_power = 10**floor(LOG10(last));
      drop last_power ;
    last_new = round(last,0.5*last_power);
    output;
  end;
  stop;
proc sort;
  by last;
data want;
  set want;
  i=_N_;
run;

 

i	Last	last_new
1	6187	6000
2	9933	10000
3	15660	15000
4	36662	35000
5	45234	45000
6	47104	45000
7	64690	65000
8	72860	75000
9	76560	75000
10	124603	100000

 

PaigeMiller
Diamond | Level 26

What I think is being described is a rounding, in terms of scientific notation, where the significand or mantissa is rounded to the nearest 0.5.

How can you tell that from the contradictory problem explanation?

--
Paige Miller
PhilC
Rhodochrosite | Level 12

A problem explanation than can be contradictory implies a problem that was written in a mathematical logic.  Our friend gave us an example of what they are looking for.  I formed an opinion of what they wanted, and offered an my best educated guess.  I though it was a better answer than Mr. @mkeintz , before he retracted it.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6597 views
  • 1 like
  • 5 in conversation