BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saba1
Quartz | Level 8

Hi

 

I have a dataset with millions of observations for several IDs. I have given a small example below to narate the issue.

 

Say, I have following dataset where each ID has observations (sorted) ranging from negative to positive values. I need to allocate ranks (smallest to largest) to these observations "By ID", the ranks must be ranging from -n to +n as shown below.

 

data have; 
  input id year ret;
  datalines;
45  2005  -0.30
45  2006   0.28
45  2007  -0.08
48  2011  -0.01
48  2012   0.78
48  2013   0.54
49  2007   0.15
49  2008  -0.06
49  2009   0.3
49  2010  -0.06
;
run;
proc sort data=have;
by id ret;
run;
data want; 
  input id year ret counter;
  datalines;
45  2005  -0.30    -2
45  2007  -0.08    -1
45  2006   0.28     1
48  2011  -0.01    -1
48  2013   0.54     1
48  2012   0.78     2
49  2008  -0.06    -2
49  2010  -0.06    -1
49  2007   0.15     1
49 2009 0.3 2
; run;

Please help me fin finding codes for:

1- Finding number of negative and positive values in each ID.

2- Allocating rank counter from smallest to largest.

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you.  So I'll present an easier approach:

 

proc sort data=have out=negatives;

by id descending ret;

where ret < 0;

run;

 

proc sort data=have out=positives;

by id ret;

where ret > 0;

run;

 

data negatives;

set negatives;

by id;

if first.id then counter=-1;

else counter + (-1);

run;

 

data positives;

set positives;

by id;

if first.id then counter=1;

else counter + 1;

run;

 

data want;

set negatives positives;

by id;

run;

 

The order won't be quite right, but you can fix that:

 

proc sort data=want;

by id counter;

run;

View solution in original post

7 REPLIES 7
mkeintz
PROC Star

This looks like a homework assignment.

 

Why don't you provide a sample dataset and code that you have tried to use, so we can help you figure things out.

 

But as an operational problem, it looks like you could

  1. sort the data smallest to largest value of some variable for each id
  2. For each id
    1. count the number of negatives (N_Neg).
    2. Using that count, assign the "negative ranks" sequentially from -N_neg to -1
    3. For positives assign ranks sequentially starting at 1

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Saba1
Quartz | Level 8

@mkeintz can you provide the code please?

Astounding
PROC Star

While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you.  So I'll present an easier approach:

 

proc sort data=have out=negatives;

by id descending ret;

where ret < 0;

run;

 

proc sort data=have out=positives;

by id ret;

where ret > 0;

run;

 

data negatives;

set negatives;

by id;

if first.id then counter=-1;

else counter + (-1);

run;

 

data positives;

set positives;

by id;

if first.id then counter=1;

else counter + 1;

run;

 

data want;

set negatives positives;

by id;

run;

 

The order won't be quite right, but you can fix that:

 

proc sort data=want;

by id counter;

run;

Saba1
Quartz | Level 8

@Astounding Thank you so much for your detailed reply. This is quite helpful.

ShiroAmada
Lapis Lazuli | Level 10

Try this.....

 

data NEG WANT;
  set HAVE;
if ret<0 then output NEG;
  else output WANT;
run;

proc sort data=neg;
  by ID descending RET;
run;

data NEG;
  set NEG;
by id;
if first.id then ctr=.;
ctr+1;
run;

proc sql;
  update NEG
  SET CTR=CTR*-1;
quit;


proc sort data=want;
  by id ret;
run;

data want;
  set want;
by id;
if first.id then ctr=.;
ctr+1;
run;

proc append base=want data=neg force;
 run;

proc sort data=want;
  by id ctr;
run;

Hope this helps.

s_lassen
Meteorite | Level 14

@Saba1

As you state that your data has millions of observations, it may be more efficient if you can do the calculation without sorting (assuming that you initial data is sorted by ID and YEAR):

data want;
  array values(100) 8 _temporary_;
  npos=0;
  nneg=0;
  n0=0;
  do _N_=1 by 1 until(last.id);
    set have(keep=id ret);
    by id;
    values(_N_)=ret;
    if ret<0 then
      nneg+1; /* number of negative values */
    else if ret>0 then
      npos+1; /* number of positive values */
    else
      n0+1; /* number of zeroes */
    end;
  call sortn(of values(*));
  do until(last.id);
    set have;
    by id;
    _N_=whichn(ret,of values(*));
    if ret=0 then
      rank=0;
    else if ret<0 then
      rank=_N_-100+npos+n0-1;
    else
      rank=_N_-100+npos;
    values(_N_)=.; /* set missing, so that next record with same value gets next rank */
    output;
    end;
run;

If two years have the same ID and RET value, the program assigns the lowest rank to the earliest record. The array dimension (100) should not be smaller than the number of records for a single ID, as you appeared to have one record per ID and YEAR, I assumed that 100 was a safe value. I added the possibility of zeros, always giving a rank of 0. You may want to take missing values into account also (I would suggest giving them a missing rank, currently they are getting NPOS+N0-100).

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1094 views
  • 0 likes
  • 5 in conversation