Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

how to subset observations based on id if they exist for all years.... NEED HELP

Accepted Solution Solved
Reply
Contributor
Posts: 43
Accepted Solution

how to subset observations based on id if they exist for all years.... NEED HELP

hi i have a data set with the following variables stockname price date year

i have data for 7 years for 100+ stocks

i want to subset my data set and keep only those stocks which have been traded in all years,

e.g.

input stockname price date year;

abc 26  20071205 2007

abc 35 20071206 2007

abc 75 20080123 2008

abc 45 20080612 2008

abc 55 20090612 2009

xyz 26  20071205 2007

xyz 35 20071206 2007

xyz 75 20080123 2008

xyz  45 20080612 2008

;

Now in this example stock abc is traded for all three years but xyz is not , i want to subset data to exclude all stocks that are not traded foro the whole time period


Accepted Solutions
Solution
‎07-08-2012 09:49 PM
New Contributor Lea
New Contributor
Posts: 3

Re: how to subset observations based on id if they exist for all years.... NEED HELP

If you know that the number of years is constant (7), I would use first. and last. logic.

Proc sort data=dataset;

by stockname;

run;

Data limit;

  set dataset;

  by stockname;

if first.stockname then counter = 0;

counter + 1;

if last.stockname;

if counter lt 7 then delete;

run;

This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.

Does this work?

Lea

View solution in original post


All Replies
Solution
‎07-08-2012 09:49 PM
New Contributor Lea
New Contributor
Posts: 3

Re: how to subset observations based on id if they exist for all years.... NEED HELP

If you know that the number of years is constant (7), I would use first. and last. logic.

Proc sort data=dataset;

by stockname;

run;

Data limit;

  set dataset;

  by stockname;

if first.stockname then counter = 0;

counter + 1;

if last.stockname;

if counter lt 7 then delete;

run;

This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.

Does this work?

Lea

Contributor
Posts: 43

Re: how to subset observations based on id if they exist for all years.... NEED HELP

thanks for the reply, but you mean to say i will have to pick the other data manually, (e.g if i know the stcoks which have data for 7 years, how would i subset my data for all only those stocks??)

Contributor
Posts: 43

Re: how to subset observations based on id if they exist for all years.... NEED HELP

because the number of stocks is several hundred, close to 500, and the frequency is daily

New Contributor Lea
New Contributor
Posts: 3

Re: how to subset observations based on id if they exist for all years.... NEED HELP

If I'm understanding your problem correctly, you can use two steps.  First, de-dupe your data by stock and year, and then run the logic above to grab only those stocks traded in all 7 years.  Second, you would merge that list of stocks back with your other data (that picks up price and date) to get the final dataset.

Data final;

  merge limit (in=in1 keep=stockname) dataset;

by stockname;

if in1;

run;

Contributor
Posts: 43

Re: how to subset observations based on id if they exist for all years.... NEED HELP

hey thanks Lea  works perfect

Ahmad

Super User
Posts: 10,044

Re: how to subset observations based on id if they exist for all years.... NEED HELP

Or SQL. If you like it.

data have;
input stockname $ price date year;
cards;
abc 26  20071205 2007
abc 35 20071206 2007
abc 75 20080123 2008
abc 45 20080612 2008
abc 55 20090612 2009
xyz 26  20071205 2007
xyz 35 20071206 2007
xyz 75 20080123 2008
xyz  45 20080612 2008
;
run;
proc sql ;
select count(distinct year) into : n from have;
create table want as
 select * from have 
  where stockname in (
       select stockname from (select stockname,year,count(*) as _count from have group by stockname,year)
         group by stockname
          having count(year) eq &n);
quit; 

Ksharp

Regular Contributor
Posts: 184

Re: how to subset observations based on id if they exist for all years.... NEED HELP

Or maybe

create table want as

select * from have

  group by stockname

  having count(distinct year)

      EQ (select range(year)+1 from have)

order by stockname, date ;

Super User
Posts: 10,044

Re: how to subset observations based on id if they exist for all years.... NEED HELP

Thanks, Howles.

Which remind me the skill I have forgotten. I am old i think.

proc sql ;
create table want as
 select * from have 
    group by stockname
       having count(distinct year) eq (select count(distinct year) from have);
quit; 

Ksharp

Regular Contributor
Posts: 184

Re: how to subset observations based on id if they exist for all years.... NEED HELP

I would still use RANGE to determine the correct number of years. Consider this test data:

data have;

input stockname $ price date year;

cards;

abc 35 20071206 2007

abc 55 20090612 2009

;

Contributor
Posts: 43

Re: how to subset observations based on id if they exist for all years.... NEED HELP

hi ksharp can you please explain your code a bit, as it only gives one value per stock per year, but what i want is to get daily data for stocks which have data for all seven years. Thanks

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 643 views
  • 8 likes
  • 4 in conversation