- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi i have a data set with the following variables stockname price date year
i have data for 7 years for 100+ stocks
i want to subset my data set and keep only those stocks which have been traded in all years,
e.g.
input stockname price date year;
abc 26 20071205 2007
abc 35 20071206 2007
abc 75 20080123 2008
abc 45 20080612 2008
abc 55 20090612 2009
xyz 26 20071205 2007
xyz 35 20071206 2007
xyz 75 20080123 2008
xyz 45 20080612 2008
;
Now in this example stock abc is traded for all three years but xyz is not , i want to subset data to exclude all stocks that are not traded foro the whole time period
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you know that the number of years is constant (7), I would use first. and last. logic.
Proc sort data=dataset;
by stockname;
run;
Data limit;
set dataset;
by stockname;
if first.stockname then counter = 0;
counter + 1;
if last.stockname;
if counter lt 7 then delete;
run;
This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.
Does this work?
Lea
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you know that the number of years is constant (7), I would use first. and last. logic.
Proc sort data=dataset;
by stockname;
run;
Data limit;
set dataset;
by stockname;
if first.stockname then counter = 0;
counter + 1;
if last.stockname;
if counter lt 7 then delete;
run;
This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.
Does this work?
Lea
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks for the reply, but you mean to say i will have to pick the other data manually, (e.g if i know the stcoks which have data for 7 years, how would i subset my data for all only those stocks??)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
because the number of stocks is several hundred, close to 500, and the frequency is daily
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I'm understanding your problem correctly, you can use two steps. First, de-dupe your data by stock and year, and then run the logic above to grab only those stocks traded in all 7 years. Second, you would merge that list of stocks back with your other data (that picks up price and date) to get the final dataset.
Data final;
merge limit (in=in1 keep=stockname) dataset;
by stockname;
if in1;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hey thanks Lea works perfect
Ahmad
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or SQL. If you like it.
data have; input stockname $ price date year; cards; abc 26 20071205 2007 abc 35 20071206 2007 abc 75 20080123 2008 abc 45 20080612 2008 abc 55 20090612 2009 xyz 26 20071205 2007 xyz 35 20071206 2007 xyz 75 20080123 2008 xyz 45 20080612 2008 ; run; proc sql ; select count(distinct year) into : n from have; create table want as select * from have where stockname in ( select stockname from (select stockname,year,count(*) as _count from have group by stockname,year) group by stockname having count(year) eq &n); quit;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or maybe
create table want as
select * from have
group by stockname
having count(distinct year)
EQ (select range(year)+1 from have)
order by stockname, date ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, Howles.
Which remind me the skill I have forgotten. I am old i think.
proc sql ; create table want as select * from have group by stockname having count(distinct year) eq (select count(distinct year) from have); quit;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would still use RANGE to determine the correct number of years. Consider this test data:
data have;
input stockname $ price date year;
cards;
abc 35 20071206 2007
abc 55 20090612 2009
;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi ksharp can you please explain your code a bit, as it only gives one value per stock per year, but what i want is to get daily data for stocks which have data for all seven years. Thanks