hi i have a data set with the following variables stockname price date year
i have data for 7 years for 100+ stocks
i want to subset my data set and keep only those stocks which have been traded in all years,
e.g.
input stockname price date year;
abc 26 20071205 2007
abc 35 20071206 2007
abc 75 20080123 2008
abc 45 20080612 2008
abc 55 20090612 2009
xyz 26 20071205 2007
xyz 35 20071206 2007
xyz 75 20080123 2008
xyz 45 20080612 2008
;
Now in this example stock abc is traded for all three years but xyz is not , i want to subset data to exclude all stocks that are not traded foro the whole time period
If you know that the number of years is constant (7), I would use first. and last. logic.
Proc sort data=dataset;
by stockname;
run;
Data limit;
set dataset;
by stockname;
if first.stockname then counter = 0;
counter + 1;
if last.stockname;
if counter lt 7 then delete;
run;
This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.
Does this work?
Lea
If you know that the number of years is constant (7), I would use first. and last. logic.
Proc sort data=dataset;
by stockname;
run;
Data limit;
set dataset;
by stockname;
if first.stockname then counter = 0;
counter + 1;
if last.stockname;
if counter lt 7 then delete;
run;
This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.
Does this work?
Lea
thanks for the reply, but you mean to say i will have to pick the other data manually, (e.g if i know the stcoks which have data for 7 years, how would i subset my data for all only those stocks??)
because the number of stocks is several hundred, close to 500, and the frequency is daily
If I'm understanding your problem correctly, you can use two steps. First, de-dupe your data by stock and year, and then run the logic above to grab only those stocks traded in all 7 years. Second, you would merge that list of stocks back with your other data (that picks up price and date) to get the final dataset.
Data final;
merge limit (in=in1 keep=stockname) dataset;
by stockname;
if in1;
run;
hey thanks Lea works perfect
Ahmad
Or SQL. If you like it.
data have; input stockname $ price date year; cards; abc 26 20071205 2007 abc 35 20071206 2007 abc 75 20080123 2008 abc 45 20080612 2008 abc 55 20090612 2009 xyz 26 20071205 2007 xyz 35 20071206 2007 xyz 75 20080123 2008 xyz 45 20080612 2008 ; run; proc sql ; select count(distinct year) into : n from have; create table want as select * from have where stockname in ( select stockname from (select stockname,year,count(*) as _count from have group by stockname,year) group by stockname having count(year) eq &n); quit;
Ksharp
Or maybe
create table want as
select * from have
group by stockname
having count(distinct year)
EQ (select range(year)+1 from have)
order by stockname, date ;
Thanks, Howles.
Which remind me the skill I have forgotten. I am old i think.
proc sql ; create table want as select * from have group by stockname having count(distinct year) eq (select count(distinct year) from have); quit;
Ksharp
I would still use RANGE to determine the correct number of years. Consider this test data:
data have;
input stockname $ price date year;
cards;
abc 35 20071206 2007
abc 55 20090612 2009
;
hi ksharp can you please explain your code a bit, as it only gives one value per stock per year, but what i want is to get daily data for stocks which have data for all seven years. Thanks
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.