Hello,
I have a very large dataset with several variables, including stock ticker and security ID. I am trying to include only observations where the stock ticker was constant over the time period I have in my dataset. Here are samples of 2 stocks:
ABSIE | 5001 |
ABSIE | 5001 |
ABSIE | 5001 |
ABSIE | 5001 |
ABSIE | 5001 |
ZZZZ | 5001 |
ZZZZ | 5001 |
ZZZZ | 5001 |
ZZZZ | 5001 |
ZZZZ | 5001 |
ZZZZ | 5001 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
AAPL | 101594 |
I would want to keep all observations with security ID 101594 and delete all observations with security ID 5001. So, if there is a "ZZZZ" present in any single observation with a security ID, I need to delete all observations with the security ID. In other words, every observation with the same security ID should have the same ticker.
Thanks so much, I really appreciate it!
How large is large?
Here is one approach but I make no claim for efficiency:
proc sql; create table want as select have.* from ( select security_id,count(ticker) as tcount from (select distinct ticker, security_id from have ) group by security_id having calculated tcount = 1 ) as a left join have on a.security_id=have.security_id ; quit;
the innermost part gets a set of single ticker and security_id, then count to see if there are more than one ticker associated with the security_id. Then use the security_id with a count of 1 to match to your whole data set to select the desired tickets (and any other data in the set).
untested:
proc sql;
create table want as
select *
from have
group by security_id
having max(ticker)=min(ticker);
quit;
This might work:
proc sql;
create table want as
select *
from have
group by security_id
having count(distinct ticker) = 1;
select * from want;
quit;
If you're interested in DATA step solutions, you could use a double-DoW loop. It requires two passes of the data, and the data must be sorted. You process the data in groups, by Security_ID, first to identify whether Ticker changes within the group, and then a second time to output the records for Security_ID groups where Ticker is constant.
data want;
do until(last.Security_ID);
set have;
by Security_ID Ticker;
*If ticker changes within security_ID, flag the security_ID;
if first.Ticker=1 and first.Security_ID=0 then _deleteme=1;
end;
do until(last.Security_ID);
set have;
by Security_ID;
if NOT (_deleteme) then output;
end;
drop _: ;
run;
w
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.