Hi, SAS Community,
I should again reach out for help from you.
My dataset looks like this:
data have;
infile cards truncover;
input CUSIP $ YEAR Couplets;
cards;
00036110 1990 0
00036110 1991 0
00036110 1992 1
00036110 2000 0
00036110 2001 1
00036110 2002 1
00036110 2010 0
00036110 2011 1
00036110 2012 0
00036110 2013 1
00037520 2009 0
00037520 2010 0
00037520 2011 1
00037520 2012 1
00037520 2014 0
00037520 2015 1
00086T10 2005 0
00086T10 2006 0
00086T10 2007 1
00095710 2014 0
00095710 2015 0
00095710 2016 1
;
I proc sorted my data by cusip and year.
What I want to do is, sieve only the consecutive (by year) 0-1 couplets and weed out others from my dataset. In other words:
CUSIP | YEAR | Couplets |
00036110 | 1991 | 0 |
00036110 | 1992 | 1 |
00036110 | 2000 | 0 |
00036110 | 2001 | 1 |
00036110 | 2010 | 0 |
00036110 | 2011 | 1 |
00036110 | 2012 | 0 |
00036110 | 2013 | 1 |
00037520 | 2010 | 0 |
00037520 | 2011 | 1 |
00037520 | 2014 | 0 |
00037520 | 2015 | 1 |
….......... |
Same CUSIP-consecutive year-consecutive 0-1.
This task looks simple, but for me as an eternal novice in SAS, it is quite a puzzle.
I tried lag, first.variable, last.variable, descending proc sort, but I cannot solve this.
Sincerely, I appreciate your help!
Sincerely,
KS -,
It is hard to look ahead. Looking back is pretty easy using the LAG() function.
So here is a way to do what you want that does not need to look ahead. (at least uses BY group processing ability to look ahead).
Instead it looks for a one that follows a zero and then uses POINT= option on a separate SET statement to re-read the pair of observations and output them.
data want ;
set have;
by CUSIP Couplets notsorted;
lag_c = lag(Couplets);
if first.Couplets and Couplets=1 and lag_c=0 and not first.CUSIP then do p=_n_-1,_n_;
set have point=p;
output;
end;
drop lag_c;
run;
You might also want to check that you don't have gaps in the YEAR values.
It is hard to look ahead. Looking back is pretty easy using the LAG() function.
So here is a way to do what you want that does not need to look ahead. (at least uses BY group processing ability to look ahead).
Instead it looks for a one that follows a zero and then uses POINT= option on a separate SET statement to re-read the pair of observations and output them.
data want ;
set have;
by CUSIP Couplets notsorted;
lag_c = lag(Couplets);
if first.Couplets and Couplets=1 and lag_c=0 and not first.CUSIP then do p=_n_-1,_n_;
set have point=p;
output;
end;
drop lag_c;
run;
You might also want to check that you don't have gaps in the YEAR values.
Wow, the code is exquisite!
Thank you Tom, it works perfectly.
Trying to understand your code, I remembered I learned from SAS community something like this:
if <condition> then do p=max(_n_-1,1) to min(_n_+1,nobs);
I undertand this is a step forward application from your code.
Have a nice weekend!
Sincerely,
KS -,
@KS99 wrote:
Wow, the code is exquisite!
Thank you Tom, it works perfectly.
Trying to understand your code, I remembered I learned from SAS community something like this:
if <condition> then do p=max(_n_-1,1) to min(_n_+1,nobs);
I undertand this is a step forward application from your code.
Have a nice weekend!
Sincerely,
KS -,
In this situation _N_ can never be less 2 nor greater than NOBS so the MAX() and MIN() are not need. And since we only want two values of P there is no need to use an iterative range ( lowerbound TO upperbound BY increment) since we can just use two distinct values. Remember you can have as many value (or ranges) as you want in the DO loop, just separate them by commas. Example: do time=1,2,3,5,10 to 50 by 10, 100 to 1000 by 100;
You can avoid explicit look ahead and the point= option. If the value of couplet is always a 0 or 1, and you want a pair of consecutive records in which the zero is followed by the one:
data want ;
set have;
by CUSIP Couplets notsorted;
if last.couplets=1 and couplets=0 and last.cusip=0
or first.couplets=1 and couplets=1 and first.cusip=0;
run;
Thank you mkeintz,
I will copy your code, and use it in a similar context.
Sincerely,
KS -
data have;
infile cards truncover;
input CUSIP $ YEAR Couplets;
cards;
00036110 1990 0
00036110 1991 0
00036110 1992 1
00036110 2000 0
00036110 2001 1
00036110 2002 1
00036110 2010 0
00036110 2011 1
00036110 2012 0
00036110 2013 1
00037520 2009 0
00037520 2010 0
00037520 2011 1
00037520 2012 1
00037520 2014 0
00037520 2015 1
00086T10 2005 0
00086T10 2006 0
00086T10 2007 1
00095710 2014 0
00095710 2015 0
00095710 2016 1
;
data want;
merge have have(firstobs=2 keep= CUSIP YEAR Couplets rename=(
CUSIP=_CUSIP YEAR=_YEAR Couplets=_Couplets ));
if (CUSIP=_CUSIP and YEAR=_YEAR-1 and Couplets=0 and _Couplets=1) or
(CUSIP=lag(CUSIP) and YEAR=lag(YEAR)+1 and Couplets=1 and lag(Couplets)=0) ;
drop _:;
run;
Thank you Ksharp,
long time no hear.
I will copy your code. As a novice like me in SAS, "data B merge A A" is a surprise. I think I can use them in a very efficient way.
Sincerely,
KS -,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.