Solved: create subsets from original data

shalmali · Posted 03-21-2014 09:58 AM

Hello All,

I am trying to create four subsets from my current dataset. I am not good with SAS codes and thus am not able to construct them.

My main dataset looks as follows:

FIRM YEAR NAME DUMMY CHANGE

A 2004 Mr. X 0 0

A 2005 Mr. X 0 0

A 2006 Mr. Y 0 1

A 2007 Mr. Y 0 0

A 2008 Mr. Y 0 0

B 2004 Mr. S 0 0

B 2005 Mr. S 0 0

B 2006 Mr. T 1 1

B 2007 Mr. T 1 0

B 2008 Mr. T 1 0

C 2004 Mr. M 1 0

C 2005 Mr. M 1 0

C 2006 Mr. M 1 0

C 2007 Mr. N 1 1

C 2008 Mr. N 1 0

D 2004 Mr. R 1 0

D 2005 Mr. R 1 0

D 2006 Mr. V 0 1

D 2007 Mr. V 0 0

D 2008 Mr. V 0 0

Name is the name of the CEO in a firm. Dummy is 1 if the CEO meets some condition and 0 otherwise. CHANGE is 1 when there is change in the CEO. As you may see, there are some firms like A with DUMMY to be 0 for all observations, some firms like C with DUMMY to be 1 for all observations, some firms like B with DUMMY to be 0 for years before the CEO change and 1 for new CEO firm years, and some firms like D with DUMMY to be 1 for years before the CEO change and 0 for the new CEO firm year.

I am trying to create four new subsets which will be as follows:

Subset 1 will have only those firms whose DUMMY was 0 for all years. That is

FIRM YEAR NAME DUMMY CHANGE

A 2004 Mr. X 0 0

A 2005 Mr. X 0 0

A 2006 Mr. Y 0 1

A 2007 Mr. Y 0 0

A 2008 Mr. Y 0 0

,

Subset 2 will have only those firms whose DUMMY was 1 for all the years. That is

FIRM YEAR NAME DUMMY CHANGE

C 2004 Mr. M 1 0

C 2005 Mr. M 1 0

C 2006 Mr. M 1 0

C 2007 Mr. N 1 1

C 2008 Mr. N 1 0

.

Subset 3 will have firms whose DUMMY was 0 before the CEO change and 1 after the CEO change. That is,

FIRM YEAR NAME DUMMY CHANGE

B 2004 Mr. S 0 0

B 2005 Mr. S 0 0

B 2006 Mr. T 1 1

B 2007 Mr. T 1 0

B 2008 Mr. T 1 0

Subset 4 will have firms whose DUMMY was 1 before the CEO change and 0 after the CEO change. That is,

FIRM YEAR NAME DUMMY CHANGE

D 2004 Mr. R 1 0

D 2005 Mr. R 1 0

D 2006 Mr. V 0 1

D 2007 Mr. V 0 0

D 2008 Mr. V 0 0

I would really appreciate if someone share a code with me.

Thank you,

SS

RW9 · Posted 03-21-2014 11:38 AM

Hi,

Have done the first three. You can work out the fourth from the code, just need to mix up counts on the where - have to leave now.

data have;
attrib FIRM format=$1. YEAR format=best. NAME format=$200. DUMMY CHANGE format=best.;
infile datalines delimiter=',';
input firm $ year name $ dummy change;
datalines;
A,2004,Mr.X,0,0
A,2005,Mr.X,0,0
A,2006,Mr.Y,0,1
A,2007,Mr.Y,0,0
A,2008,Mr.Y,0,0
B,2004,Mr.S,0,0
B,2005,Mr.S,0,0
B,2006,Mr.T,1,1
B,2007,Mr.T,1,0
B,2008,Mr.T,1,0
C,2004,Mr.M,1,0
C,2005,Mr.M,1,0
C,2006,Mr.M,1,0
C,2007,Mr.N,1,1
C,2008,Mr.N,1,0
D,2004,Mr.R,1,0
D,2005,Mr.R,1,0
D,2006,Mr.V,0,1
D,2007,Mr.V,0,0
D,2008,Mr.V,0,0
;
run;

proc sql;
create table subset1 as
select *
from    have
group by firm
having   sum(dummy)=0;
create table subset2 as
select *
from    have
group by firm
having   sum(dummy)=5;
quit;

data tmp;
set have;
length flag $4;
retain flag;
by firm;
if first.firm then flag='pre';
if change=1 then flag='on';
if change=0 and flag='on' then flag='post';
run;

proc sql;
create table subset3 as
select *
from    have
where   firm in (select distinct firm from (select * from tmp where flag="pre") group by firm having sum(dummy)=0)
     and firm in (select distinct firm from (select * from tmp where flag="post") group by firm having sum(dummy)>0);
quit;

View solution in original post

RW9 · Posted 03-21-2014 11:38 AM

Hi,

Have done the first three. You can work out the fourth from the code, just need to mix up counts on the where - have to leave now.

data have;
attrib FIRM format=$1. YEAR format=best. NAME format=$200. DUMMY CHANGE format=best.;
infile datalines delimiter=',';
input firm $ year name $ dummy change;
datalines;
A,2004,Mr.X,0,0
A,2005,Mr.X,0,0
A,2006,Mr.Y,0,1
A,2007,Mr.Y,0,0
A,2008,Mr.Y,0,0
B,2004,Mr.S,0,0
B,2005,Mr.S,0,0
B,2006,Mr.T,1,1
B,2007,Mr.T,1,0
B,2008,Mr.T,1,0
C,2004,Mr.M,1,0
C,2005,Mr.M,1,0
C,2006,Mr.M,1,0
C,2007,Mr.N,1,1
C,2008,Mr.N,1,0
D,2004,Mr.R,1,0
D,2005,Mr.R,1,0
D,2006,Mr.V,0,1
D,2007,Mr.V,0,0
D,2008,Mr.V,0,0
;
run;

proc sql;
create table subset1 as
select *
from    have
group by firm
having   sum(dummy)=0;
create table subset2 as
select *
from    have
group by firm
having   sum(dummy)=5;
quit;

data tmp;
set have;
length flag $4;
retain flag;
by firm;
if first.firm then flag='pre';
if change=1 then flag='on';
if change=0 and flag='on' then flag='post';
run;

proc sql;
create table subset3 as
select *
from    have
where   firm in (select distinct firm from (select * from tmp where flag="pre") group by firm having sum(dummy)=0)
     and firm in (select distinct firm from (select * from tmp where flag="post") group by firm having sum(dummy)>0);
quit;

shalmali · Posted 03-21-2014 06:40 PM

Thanks a lot.

create subsets from original data

Re: create subsets from original data

Re: create subsets from original data

Re: create subsets from original data

create subsets from original data

Re: create subsets from original data

Re: create subsets from original data

Re: create subsets from original data

Registration is open

SAS Training: Just a Click Away