Dear Sir,
I wish to create industry dummies from the 64602 observations from file industry.sum below:
Column 1=observation no,
column 2: industries code var=sic2;
column 3= frequencies of each industries
1 | 1 | 114 |
---|---|---|
2 | 2 | 28 |
3 | 7 | 27 |
4 | 8 | 10 |
5 | 9 | 6 |
6 | 10 | 754 |
7 | 12 | 138 |
8 | 13 | 1858 |
9 | 14 | 126 |
10 | 15 | 243 |
11 | 16 | 175 |
12 | 17 | 113 |
13 | 20 | 1080 |
14 | 21 | 88 |
15 | 22 | 130 |
16 | 23 | 435 |
17 | 24 | 207 |
18 | 25 | 270 |
19 | 26 | 442 |
20 | 27 | 476 |
21 | 28 | 4906 |
22 | 29 | 374 |
23 | 30 | 402 |
24 | 31 | 178 |
25 | 32 | 231 |
26 | 33 | 645 |
27 | 34 | 533 |
28 | 35 | 2692 |
29 | 36 | 4477 |
30 | 37 | 1015 |
31 | 38 | 3177 |
32 | 39 | 427 |
33 | 40 | 124 |
34 | 41 | 27 |
35 | 42 | 332 |
36 | 44 | 464 |
37 | 45 | 362 |
38 | 46 | 75 |
39 | 47 | 190 |
40 | 48 | 2058 |
41 | 49 | 1696 |
42 | 50 | 979 |
43 | 51 | 596 |
44 | 52 | 63 |
45 | 53 | 271 |
46 | 54 | 237 |
47 | 55 | 248 |
48 | 56 | 547 |
49 | 57 | 194 |
50 | 58 | 673 |
51 | 59 | 901 |
52 | 60 | 6436 |
53 | 61 | 590 |
54 | 62 | 797 |
55 | 63 | 1606 |
56 | 64 | 214 |
57 | 65 | 454 |
58 | 67 | 8892 |
59 | 70 | 194 |
60 | 72 | 156 |
61 | 73 | 6216 |
62 | 75 | 95 |
63 | 76 | 12 |
64 | 78 | 198 |
65 | 79 | 438 |
66 | 80 | 820 |
67 | 81 | 10 |
68 | 82 | 244 |
69 | 83 | 78 |
70 | 86 | 2 |
71 | 87 | 1007 |
72 | 99 | 329 |
my industries dummy should be coded as ind01, ind02, ind03 ...ind72, ie following the observation no, instead of following the sic2 codes.
I have the following codes but i m not sure how do i modify the prog codes to following the observation no, pls help me:
proc sql noprint;
select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;
quit;
data industry.spi;
set spi.spi5;
array _y &list ;
do over _y;
_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);
end;
run;
or using codes as below:
data industry.spi2;
set spi.spi5;
array _y{*} &list ;
do i=1 to dim(_y);
_y{i}=0 ;
end;
do i=1 to dim(_y);
if fyear=input(compress(vname(_y{i}),'yr'),best8.) then _y{i}=1;
end;
drop i;
run;
OK. You almost reached it . It is what you are looking for ?
libname x v9 'c:\ ' ; proc sql; create table want1 as select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs from x.spi5 as a left join x.sum5 as b on a.sic2=b.sic2 ; quit; data obs1; do dummy_obs=1 to 72; output; end; run; proc sql noprint; select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1; quit; data obs2; set work.want1; array _y &list ; do over _y; _y=ifn(dummy_obs=vname(_y),1,0); end; run;
Ksharp
EASY.
data industry.spi;
length dummy $ 10;
set spi.spi5;
dummy=cats('ind',put(obs,z3.));
array _y &list ;
do over _y;
_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);
end;
run;
Ksharp
Refer to the log:
NOTE: Variable obs is uninitialized.
Problem still exist: The output generated would have ind## following the sic2 code instead of observation# of industry.sum file.
To minimize misunderstanding, these are the codes i have applied:
proc sort data=spi.spi6;
by sic2 gvkey fyear;
run;
proc sql noprint;
select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;
quit;
data industry.spi4;
length dummy $ 10;
set spi.spi6;
dummy=cats('ind',put(obs,z3.));
array _y &list ;
do over _y;
_y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);
end;
run;
OK. Let me give you some sample data. Using SIC2 as ind###.
data have; input obs sic2 fre; cards; 1 1 114 2 2 28 3 7 27 4 8 10 ; run; data want; set have; length dummy $ 10; dummy=cats('ind',put(sic2,z3.)); run;
Ksharp
hi KSharp,
refer to your data, I am looking for output of ind#001, ind#002, ind#003 and ind#004 (according to observation no) and no ind#001, ind#002, ind#7 and ind#008 (according to sic2 code).
so back to my own data, i need ind#01 till ind#72.
can it be done?
OK. The following is what you need ?
data have; input obs sic2 fre; cards; 1 1 114 2 2 28 3 7 27 4 8 10 ; run; data want; set have; length dummy_obs dummy_sic2 $ 10; dummy_obs=cats('ind#',put(obs,z2.)); dummy_sic2=cats('ind#',put(sic2,z2.)); run;
Ksharp
消息编辑者为:xia keshan
Hi Ksharp,
as you can see from my file industry.sum, there are only 3 columns, sic2, _FREQ_. _N_.
In fact there is no column by the name of observation. How do i add/'input' those observations from 1 to 72 in file industry.sum to become one variable, so that the dummy_obs can be created?
Thanks a lot!!
OK. How about:
SUM dataset is under c:\
libname x v9 'c:\'; data want; set x.sum; length dummy_obs dummy_sic2 $ 10; obs+1; dummy_obs=cats('ind#',put(obs,z2.)); dummy_sic2=cats('ind#',put(sic2,z2.)); run;
Ksharp
Dear KSharp,
i cannot use the above program as the dummy_sic2 and dummy_obs need to be created in a big data file (spi.spi7) that has 64602 observations. These dummies need to be created by referring from the file industry.sum5.
I am now creating a new file industry.sum5 which contains variable sic2 and obs. as you can see, when sic2=7, obs=3 and dummy_sic2 of value ind#007 and dummy_obs of value ind#003 should be created in data file spi.spi7. That means dummy_obs should be referred to file industry.sum5 to see which sic code it belongs to first then only apply to each company's data in spi.spi7, it will came out ind#003. can it be done?
if not, then i will just leave it to follow sic2 code....
I will attach the file industry.sum5. and will try to attach spi.spi5 which looks too big....
Thanks
Mei
Do you mean bring obs variable from sum5 table into spi5 table and make these dummy variables ?
"then only apply to each company's data in spi.spi7"
I do not understand what this means, do you want apply it to all the data in spi5 data ?
libname x v9 'c:\ ' ; proc sql; create table want as select a.*,cats('ind#',put(b.sic2,z2.)) as dummy_sic2,cats('ind#',put(obs,z2.)) as dummy_obs from x.spi5 as a left join x.sum5 as b on a.sic2=b.sic2 ; quit;
Ksharp
oh it is almost done. yes i want to create dummy-obs on file want with 1 and 0.
Pls see my prog codes:
proc sql;
create table want1 as
select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs
from industry.spi5 as a left join industry.sum5 as b
on a.sic2=b.sic2 ;
quit;
data obs1;
do dummy_obs=01 to 72;
output;
end;
run;
proc sql noprint;
select distinct cats('ind',dummy_obs) into : list separated by ' ' from obs1;
quit;
data obs2;
set work.want1;
array _y &list ;
do over _y;
_y=ifn(dummy_obs=compress(vname(_y),,'kd'), 1,0);
end;
run;
I know something is not right about assigning industry dummy of dummy_obs at 1-72 of the file obs1. How should i code properly so that i see ind#01-ind#72 or ind01 to ind72 at the final output want1?
I will attach file want1 and obs2
OK. You almost reached it . It is what you are looking for ?
libname x v9 'c:\ ' ; proc sql; create table want1 as select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs from x.spi5 as a left join x.sum5 as b on a.sic2=b.sic2 ; quit; data obs1; do dummy_obs=1 to 72; output; end; run; proc sql noprint; select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1; quit; data obs2; set work.want1; array _y &list ; do over _y; _y=ifn(dummy_obs=vname(_y),1,0); end; run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.