BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ahmad
Calcite | Level 5

hi i have a data set with the following variables stockname price date year

i have data for 7 years for 100+ stocks

i want to subset my data set and keep only those stocks which have been traded in all years,

e.g.

input stockname price date year;

abc 26  20071205 2007

abc 35 20071206 2007

abc 75 20080123 2008

abc 45 20080612 2008

abc 55 20090612 2009

xyz 26  20071205 2007

xyz 35 20071206 2007

xyz 75 20080123 2008

xyz  45 20080612 2008

;

Now in this example stock abc is traded for all three years but xyz is not , i want to subset data to exclude all stocks that are not traded foro the whole time period

1 ACCEPTED SOLUTION

Accepted Solutions
Lea
Calcite | Level 5 Lea
Calcite | Level 5

If you know that the number of years is constant (7), I would use first. and last. logic.

Proc sort data=dataset;

by stockname;

run;

Data limit;

  set dataset;

  by stockname;

if first.stockname then counter = 0;

counter + 1;

if last.stockname;

if counter lt 7 then delete;

run;

This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.

Does this work?

Lea

View solution in original post

10 REPLIES 10
Lea
Calcite | Level 5 Lea
Calcite | Level 5

If you know that the number of years is constant (7), I would use first. and last. logic.

Proc sort data=dataset;

by stockname;

run;

Data limit;

  set dataset;

  by stockname;

if first.stockname then counter = 0;

counter + 1;

if last.stockname;

if counter lt 7 then delete;

run;

This will give you only the final year for the stocks you want, and you can go back and pick up the other data if you need it.

Does this work?

Lea

Ahmad
Calcite | Level 5

thanks for the reply, but you mean to say i will have to pick the other data manually, (e.g if i know the stcoks which have data for 7 years, how would i subset my data for all only those stocks??)

Ahmad
Calcite | Level 5

because the number of stocks is several hundred, close to 500, and the frequency is daily

Lea
Calcite | Level 5 Lea
Calcite | Level 5

If I'm understanding your problem correctly, you can use two steps.  First, de-dupe your data by stock and year, and then run the logic above to grab only those stocks traded in all 7 years.  Second, you would merge that list of stocks back with your other data (that picks up price and date) to get the final dataset.

Data final;

  merge limit (in=in1 keep=stockname) dataset;

by stockname;

if in1;

run;

Ahmad
Calcite | Level 5

hey thanks Lea  works perfect

Ahmad

Ksharp
Super User

Or SQL. If you like it.

data have;
input stockname $ price date year;
cards;
abc 26  20071205 2007
abc 35 20071206 2007
abc 75 20080123 2008
abc 45 20080612 2008
abc 55 20090612 2009
xyz 26  20071205 2007
xyz 35 20071206 2007
xyz 75 20080123 2008
xyz  45 20080612 2008
;
run;
proc sql ;
select count(distinct year) into : n from have;
create table want as
 select * from have 
  where stockname in (
       select stockname from (select stockname,year,count(*) as _count from have group by stockname,year)
         group by stockname
          having count(year) eq &n);
quit; 

Ksharp

Howles
Quartz | Level 8

Or maybe

create table want as

select * from have

  group by stockname

  having count(distinct year)

      EQ (select range(year)+1 from have)

order by stockname, date ;

Ksharp
Super User

Thanks, Howles.

Which remind me the skill I have forgotten. I am old i think.

proc sql ;
create table want as
 select * from have 
    group by stockname
       having count(distinct year) eq (select count(distinct year) from have);
quit; 

Ksharp

Howles
Quartz | Level 8

I would still use RANGE to determine the correct number of years. Consider this test data:

data have;

input stockname $ price date year;

cards;

abc 35 20071206 2007

abc 55 20090612 2009

;

Ahmad
Calcite | Level 5

hi ksharp can you please explain your code a bit, as it only gives one value per stock per year, but what i want is to get daily data for stocks which have data for all seven years. Thanks

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2953 views
  • 8 likes
  • 4 in conversation