Solved: Re: Allocating numbers from "-n" to "+n" while ranking a variable by g...

Saba1 · Posted 09-20-2017 08:26 PM

Hi

I have a dataset with millions of observations for several IDs. I have given a small example below to narate the issue.

Say, I have following dataset where each ID has observations (sorted) ranging from negative to positive values. I need to allocate ranks (smallest to largest) to these observations "By ID", the ranks must be ranging from -n to +n as shown below.

data have; 
  input id year ret;
  datalines;
45  2005  -0.30
45  2006   0.28
45  2007  -0.08
48  2011  -0.01
48  2012   0.78
48  2013   0.54
49  2007   0.15
49  2008  -0.06
49  2009   0.3
49  2010  -0.06
;
run;
proc sort data=have;
by id ret;
run;

data want; 
  input id year ret counter;
  datalines;
45  2005  -0.30    -2
45  2007  -0.08    -1
45  2006   0.28     1
48  2011  -0.01    -1
48  2013   0.54     1
48  2012   0.78     2
49  2008  -0.06    -2
49  2010  -0.06    -1
49  2007   0.15     1
49  2009   0.3      2
;
run;

Please help me fin finding codes for:

1- Finding number of negative and positive values in each ID.

2- Allocating rank counter from smallest to largest.

Thanks.

Astounding · Posted 09-20-2017 09:17 PM

While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you. So I'll present an easier approach:

proc sort data=have out=negatives;

by id descending ret;

where ret < 0;

run;

proc sort data=have out=positives;

by id ret;

where ret > 0;

run;

data negatives;

set negatives;

by id;

if first.id then counter=-1;

else counter + (-1);

run;

data positives;

set positives;

by id;

if first.id then counter=1;

else counter + 1;

run;

data want;

set negatives positives;

by id;

run;

The order won't be quite right, but you can fix that:

proc sort data=want;

by id counter;

run;

View solution in original post

mkeintz · Posted 09-20-2017 08:35 PM

This looks like a homework assignment.

Why don't you provide a sample dataset and code that you have tried to use, so we can help you figure things out.

But as an operational problem, it looks like you could

sort the data smallest to largest value of some variable for each id
For each id

count the number of negatives (N_Neg).
Using that count, assign the "negative ranks" sequentially from -N_neg to -1
For positives assign ranks sequentially starting at 1

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Saba1 · Posted 09-20-2017 08:37 PM

@mkeintz can you provide the code please?

Astounding · Posted 09-20-2017 09:17 PM

While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you. So I'll present an easier approach:

proc sort data=have out=negatives;

by id descending ret;

where ret < 0;

run;

proc sort data=have out=positives;

by id ret;

where ret > 0;

run;

data negatives;

set negatives;

by id;

if first.id then counter=-1;

else counter + (-1);

run;

data positives;

set positives;

by id;

if first.id then counter=1;

else counter + 1;

run;

data want;

set negatives positives;

by id;

run;

The order won't be quite right, but you can fix that:

proc sort data=want;

by id counter;

run;

Saba1 · Posted 09-20-2017 09:38 PM

@Astounding Thank you so much for your detailed reply. This is quite helpful.

ShiroAmada · Posted 09-20-2017 09:45 PM

Try this.....

data NEG WANT;
  set HAVE;
if ret<0 then output NEG;
  else output WANT;
run;

proc sort data=neg;
  by ID descending RET;
run;

data NEG;
  set NEG;
by id;
if first.id then ctr=.;
ctr+1;
run;

proc sql;
  update NEG
  SET CTR=CTR*-1;
quit;


proc sort data=want;
  by id ret;
run;

data want;
  set want;
by id;
if first.id then ctr=.;
ctr+1;
run;

proc append base=want data=neg force;
 run;

proc sort data=want;
  by id ctr;
run;

Hope this helps.

Saba1 · Posted 09-20-2017 11:19 PM

@ShiroAmada Thanks a lot. Really useful.

s_lassen · Posted 10-04-2017 06:58 AM

@Saba1

As you state that your data has millions of observations, it may be more efficient if you can do the calculation without sorting (assuming that you initial data is sorted by ID and YEAR):

data want;
  array values(100) 8 _temporary_;
  npos=0;
  nneg=0;
  n0=0;
  do _N_=1 by 1 until(last.id);
    set have(keep=id ret);
    by id;
    values(_N_)=ret;
    if ret<0 then
      nneg+1; /* number of negative values */
    else if ret>0 then
      npos+1; /* number of positive values */
    else
      n0+1; /* number of zeroes */
    end;
  call sortn(of values(*));
  do until(last.id);
    set have;
    by id;
    _N_=whichn(ret,of values(*));
    if ret=0 then
      rank=0;
    else if ret<0 then
      rank=_N_-100+npos+n0-1;
    else
      rank=_N_-100+npos;
    values(_N_)=.; /* set missing, so that next record with same value gets next rank */
    output;
    end;
run;

If two years have the same ID and RET value, the program assigns the lowest rank to the earliest record. The array dimension (100) should not be smaller than the number of records for a single ID, as you appeared to have one record per ID and YEAR, I assumed that 100 was a safe value. I added the possibility of zeros, always giving a rank of 0. You may want to take missing values into account also (I would suggest giving them a missing rank, currently they are getting NPOS+N0-100).

Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Re: Allocating numbers from "-n" to "+n" while ranking a variable by group

Register Today!

SAS Training: Just a Click Away