Hi
I have a dataset with millions of observations for several IDs. I have given a small example below to narate the issue.
Say, I have following dataset where each ID has observations (sorted) ranging from negative to positive values. I need to allocate ranks (smallest to largest) to these observations "By ID", the ranks must be ranging from -n to +n as shown below.
data have;
input id year ret;
datalines;
45 2005 -0.30
45 2006 0.28
45 2007 -0.08
48 2011 -0.01
48 2012 0.78
48 2013 0.54
49 2007 0.15
49 2008 -0.06
49 2009 0.3
49 2010 -0.06
;
run;
proc sort data=have;
by id ret;
run;
data want;
input id year ret counter;
datalines;
45 2005 -0.30 -2
45 2007 -0.08 -1
45 2006 0.28 1
48 2011 -0.01 -1
48 2013 0.54 1
48 2012 0.78 2
49 2008 -0.06 -2
49 2010 -0.06 -1
49 2007 0.15 1
49 2009 0.3 2
;
run;
Please help me fin finding codes for:
1- Finding number of negative and positive values in each ID.
2- Allocating rank counter from smallest to largest.
Thanks.
While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you. So I'll present an easier approach:
proc sort data=have out=negatives;
by id descending ret;
where ret < 0;
run;
proc sort data=have out=positives;
by id ret;
where ret > 0;
run;
data negatives;
set negatives;
by id;
if first.id then counter=-1;
else counter + (-1);
run;
data positives;
set positives;
by id;
if first.id then counter=1;
else counter + 1;
run;
data want;
set negatives positives;
by id;
run;
The order won't be quite right, but you can fix that:
proc sort data=want;
by id counter;
run;
This looks like a homework assignment.
Why don't you provide a sample dataset and code that you have tried to use, so we can help you figure things out.
But as an operational problem, it looks like you could
@mkeintz can you provide the code please?
While I would normally program this just the way @mkeintz suggested, it sounds like that's difficult for you. So I'll present an easier approach:
proc sort data=have out=negatives;
by id descending ret;
where ret < 0;
run;
proc sort data=have out=positives;
by id ret;
where ret > 0;
run;
data negatives;
set negatives;
by id;
if first.id then counter=-1;
else counter + (-1);
run;
data positives;
set positives;
by id;
if first.id then counter=1;
else counter + 1;
run;
data want;
set negatives positives;
by id;
run;
The order won't be quite right, but you can fix that:
proc sort data=want;
by id counter;
run;
@Astounding Thank you so much for your detailed reply. This is quite helpful.
Try this.....
data NEG WANT;
set HAVE;
if ret<0 then output NEG;
else output WANT;
run;
proc sort data=neg;
by ID descending RET;
run;
data NEG;
set NEG;
by id;
if first.id then ctr=.;
ctr+1;
run;
proc sql;
update NEG
SET CTR=CTR*-1;
quit;
proc sort data=want;
by id ret;
run;
data want;
set want;
by id;
if first.id then ctr=.;
ctr+1;
run;
proc append base=want data=neg force;
run;
proc sort data=want;
by id ctr;
run;
Hope this helps.
@ShiroAmada Thanks a lot. Really useful.
As you state that your data has millions of observations, it may be more efficient if you can do the calculation without sorting (assuming that you initial data is sorted by ID and YEAR):
data want; array values(100) 8 _temporary_; npos=0; nneg=0; n0=0; do _N_=1 by 1 until(last.id); set have(keep=id ret); by id; values(_N_)=ret; if ret<0 then nneg+1; /* number of negative values */ else if ret>0 then npos+1; /* number of positive values */ else n0+1; /* number of zeroes */ end; call sortn(of values(*)); do until(last.id); set have; by id; _N_=whichn(ret,of values(*)); if ret=0 then rank=0; else if ret<0 then rank=_N_-100+npos+n0-1; else rank=_N_-100+npos; values(_N_)=.; /* set missing, so that next record with same value gets next rank */ output; end; run;
If two years have the same ID and RET value, the program assigns the lowest rank to the earliest record. The array dimension (100) should not be smaller than the number of records for a single ID, as you appeared to have one record per ID and YEAR, I assumed that 100 was a safe value. I added the possibility of zeros, always giving a rank of 0. You may want to take missing values into account also (I would suggest giving them a missing rank, currently they are getting NPOS+N0-100).
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.