Solved: Re: industry dummies

mei · Posted 05-27-2012 11:58 PM

Dear Sir,

I wish to create industry dummies from the 64602 observations from file industry.sum below:

Column 1=observation no,

column 2: industries code var=sic2;

column 3= frequencies of each industries

1	1	114
2	2	28
3	7	27
4	8	10
5	9	6
6	10	754
7	12	138
8	13	1858
9	14	126
10	15	243
11	16	175
12	17	113
13	20	1080
14	21	88
15	22	130
16	23	435
17	24	207
18	25	270
19	26	442
20	27	476
21	28	4906
22	29	374
23	30	402
24	31	178
25	32	231
26	33	645
27	34	533
28	35	2692
29	36	4477
30	37	1015
31	38	3177
32	39	427
33	40	124
34	41	27
35	42	332
36	44	464
37	45	362
38	46	75
39	47	190
40	48	2058
41	49	1696
42	50	979
43	51	596
44	52	63
45	53	271
46	54	237
47	55	248
48	56	547
49	57	194
50	58	673
51	59	901
52	60	6436
53	61	590
54	62	797
55	63	1606
56	64	214
57	65	454
58	67	8892
59	70	194
60	72	156
61	73	6216
62	75	95
63	76	12
64	78	198
65	79	438
66	80	820
67	81	10
68	82	244
69	83	78
70	86	2
71	87	1007
72	99	329

my industries dummy should be coded as ind01, ind02, ind03 ...ind72, ie following the observation no, instead of following the sic2 codes.

I have the following codes but i m not sure how do i modify the prog codes to following the observation no, pls help me:

proc sql noprint;

select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;

quit;

data industry.spi;

set spi.spi5;

array _y &list ;

do over _y;

_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

or using codes as below:

data industry.spi2;

set spi.spi5;

array _y{*} &list ;

do i=1 to dim(_y);

_y{i}=0 ;

end;

do i=1 to dim(_y);

if fyear=input(compress(vname(_y{i}),'yr'),best8.) then _y{i}=1;

end;

drop i;

run;

Ksharp · Posted 05-30-2012 04:46 AM

OK. You almost reached it . It is what you are looking for ?

libname x v9 'c:\ ' ;
proc sql;
create table want1 as
select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

data obs1;
do dummy_obs=1 to 72;
output;
end;
run;


proc sql noprint;
select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1;
quit;


data obs2;
set work.want1;
array _y &list ;
do over _y;
  _y=ifn(dummy_obs=vname(_y),1,0);
end;
run;

Ksharp

View solution in original post

Ksharp · Posted 05-28-2012 12:49 AM

EASY.

data industry.spi;

length dummy $ 10;

set spi.spi5;

dummy=cats('ind',put(obs,z3.));

array _y &list ;

do over _y;

_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

Ksharp

mei · Posted 05-28-2012 03:34 AM

Refer to the log:

NOTE: Variable obs is uninitialized.

Problem still exist: The output generated would have ind## following the sic2 code instead of observation# of industry.sum file.

To minimize misunderstanding, these are the codes i have applied:

proc sort data=spi.spi6;

by sic2 gvkey fyear;

run;

proc sql noprint;

select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;

quit;

data industry.spi4;

length dummy $ 10;

set spi.spi6;

dummy=cats('ind',put(obs,z3.));

array _y &list ;

do over _y;

_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

Ksharp · Posted 05-28-2012 05:13 AM

OK. Let me give you some sample data. Using SIC2 as ind###.

data have;
input obs sic2 fre;
cards;
1     1     114
2     2     28
3     7     27
4     8     10
;
run;
data want;
set have;
length dummy $ 10;
dummy=cats('ind',put(sic2,z3.));
run;

Ksharp

mei · Posted 05-28-2012 05:44 AM

hi KSharp,

refer to your data, I am looking for output of ind#001, ind#002, ind#003 and ind#004 (according to observation no) and no ind#001, ind#002, ind#7 and ind#008 (according to sic2 code).

so back to my own data, i need ind#01 till ind#72.

can it be done?

Ksharp · Posted 05-28-2012 07:24 AM

OK. The following is what you need ?

data have;
input obs sic2 fre;
cards;
1     1     114
2     2     28
3     7     27
4     8     10
;
run;
data want;
set have;
length dummy_obs dummy_sic2 $ 10;
dummy_obs=cats('ind#',put(obs,z2.));
dummy_sic2=cats('ind#',put(sic2,z2.));
run;

Ksharp

消息编辑者为：xia keshan

mei · Posted 05-29-2012 05:12 AM

Hi Ksharp,

as you can see from my file industry.sum, there are only 3 columns, sic2, _FREQ_. _N_.

In fact there is no column by the name of observation. How do i add/'input' those observations from 1 to 72 in file industry.sum to become one variable, so that the dummy_obs can be created?

Thanks a lot!!

Ksharp · Posted 05-29-2012 05:28 AM

OK. How about:

SUM dataset is under c:\

libname x v9 'c:\';
data want;
set x.sum;
length dummy_obs dummy_sic2 $ 10;
obs+1;
dummy_obs=cats('ind#',put(obs,z2.));
dummy_sic2=cats('ind#',put(sic2,z2.));
run;

Ksharp

mei · Posted 05-30-2012 12:14 AM

Dear KSharp,

i cannot use the above program as the dummy_sic2 and dummy_obs need to be created in a big data file (spi.spi7) that has 64602 observations. These dummies need to be created by referring from the file industry.sum5.

I am now creating a new file industry.sum5 which contains variable sic2 and obs. as you can see, when sic2=7, obs=3 and dummy_sic2 of value ind#007 and dummy_obs of value ind#003 should be created in data file spi.spi7. That means dummy_obs should be referred to file industry.sum5 to see which sic code it belongs to first then only apply to each company's data in spi.spi7, it will came out ind#003. can it be done?

if not, then i will just leave it to follow sic2 code....

I will attach the file industry.sum5. and will try to attach spi.spi5 which looks too big....

Thanks

Mei

Ksharp · Posted 05-30-2012 01:38 AM

Do you mean bring obs variable from sum5 table into spi5 table and make these dummy variables ?

"then only apply to each company's data in spi.spi7"

I do not understand what this means, do you want apply it to all the data in spi5 data ?

libname x v9 'c:\ ' ;
proc sql;
create table want as
 select a.*,cats('ind#',put(b.sic2,z2.)) as dummy_sic2,cats('ind#',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

Ksharp

mei · Posted 05-30-2012 02:53 AM

oh it is almost done. yes i want to create dummy-obs on file want with 1 and 0.

Pls see my prog codes:

proc sql;

create table want1 as

select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs

from industry.spi5 as a left join industry.sum5 as b

on a.sic2=b.sic2 ;

quit;

data obs1;

do dummy_obs=01 to 72;

output;

end;

run;

proc sql noprint;

select distinct cats('ind',dummy_obs) into : list separated by ' ' from obs1;

quit;

data obs2;

set work.want1;

array _y &list ;

do over _y;

_y=ifn(dummy_obs=compress(vname(_y),,'kd'), 1,0);

end;

run;

I know something is not right about assigning industry dummy of dummy_obs at 1-72 of the file obs1. How should i code properly so that i see ind#01-ind#72 or ind01 to ind72 at the final output want1?

I will attach file want1 and obs2

Ksharp · Posted 05-30-2012 04:46 AM

OK. You almost reached it . It is what you are looking for ?

libname x v9 'c:\ ' ;
proc sql;
create table want1 as
select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

data obs1;
do dummy_obs=1 to 72;
output;
end;
run;


proc sql noprint;
select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1;
quit;


data obs2;
set work.want1;
array _y &list ;
do over _y;
  _y=ifn(dummy_obs=vname(_y),1,0);
end;
run;

Ksharp

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away