BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mei
Calcite | Level 5 mei
Calcite | Level 5

Dear Sir,

I wish to create industry dummies from the 64602 observations from file industry.sum below:

Column 1=observation no,

column 2: industries code var=sic2;

column 3= frequencies of each industries

11114
2228
3727
4810
596
610754
712138
8131858
914126
1015243
1116175
1217113
13201080
142188
1522130
1623435
1724207
1825270
1926442
2027476
21284906
2229374
2330402
2431178
2532231
2633645
2734533
28352692
29364477
30371015
31383177
3239427
3340124
344127
3542332
3644464
3745362
384675
3947190
40482058
41491696
4250979
4351596
445263
4553271
4654237
4755248
4856547
4957194
5058673
5159901
52606436
5361590
5462797
55631606
5664214
5765454
58678892
5970194
6072156
61736216
627595
637612
6478198
6579438
6680820
678110
6882244
698378
70862
71871007
7299329

my industries dummy should be coded as ind01, ind02, ind03 ...ind72, ie following the observation no, instead of following the sic2 codes.

I have the following codes but i m not sure how do i modify the prog codes to following the observation no, pls help me:

proc sql noprint;

select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;

quit;

data industry.spi;

set spi.spi5;

array _y &list ;

do over _y;

  _y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

or using codes as below:

data industry.spi2;

set spi.spi5;

array _y{*} &list ;

do i=1 to dim(_y);

  _y{i}=0 ;

end;

do i=1 to dim(_y);

  if fyear=input(compress(vname(_y{i}),'yr'),best8.) then _y{i}=1;

end;

  drop i;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. You almost reached it . It is what you are looking for ?

libname x v9 'c:\ ' ;
proc sql;
create table want1 as
select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

data obs1;
do dummy_obs=1 to 72;
output;
end;
run;


proc sql noprint;
select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1;
quit;


data obs2;
set work.want1;
array _y &list ;
do over _y;
  _y=ifn(dummy_obs=vname(_y),1,0);
end;
run;

Ksharp

View solution in original post

11 REPLIES 11
Ksharp
Super User

EASY.

data industry.spi;

length dummy $ 10;

set spi.spi5;

dummy=cats('ind',put(obs,z3.));

array _y &list ;

do over _y;

  _y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

Ksharp

mei
Calcite | Level 5 mei
Calcite | Level 5

Refer to the log:

NOTE: Variable obs is uninitialized.

Problem still exist: The output generated would have ind## following the sic2 code instead of observation# of industry.sum file.

To minimize misunderstanding, these are the codes i have applied:

proc sort data=spi.spi6;

by sic2 gvkey fyear;

run;

proc sql noprint;

select distinct cats('ind',sic2) into : list separated by ' ' from industry.sum;

quit;

data industry.spi4;

length dummy $ 10;

set spi.spi6;

dummy=cats('ind',put(obs,z3.));

array _y &list ;

do over _y;

  _y=ifn(sic2=compress(vname(_y),,'kd'), 1,0);

end;

run;

Ksharp
Super User

OK. Let me give you some sample data. Using SIC2 as ind###.

data have;
input obs sic2 fre;
cards;
1     1     114
2     2     28
3     7     27
4     8     10
;
run;
data want;
set have;
length dummy $ 10;
dummy=cats('ind',put(sic2,z3.));
run;

Ksharp

mei
Calcite | Level 5 mei
Calcite | Level 5

hi KSharp,

refer to your data, I am looking for output of ind#001, ind#002, ind#003 and ind#004 (according to observation no) and no ind#001, ind#002, ind#7 and ind#008 (according to sic2 code).

so back to my own data, i need ind#01 till ind#72.

can it be done?

Ksharp
Super User

OK. The following is what you need ?

data have;
input obs sic2 fre;
cards;
1     1     114
2     2     28
3     7     27
4     8     10
;
run;
data want;
set have;
length dummy_obs dummy_sic2 $ 10;
dummy_obs=cats('ind#',put(obs,z2.));
dummy_sic2=cats('ind#',put(sic2,z2.));
run;

Ksharp

消息编辑者为:xia keshan

mei
Calcite | Level 5 mei
Calcite | Level 5

Hi Ksharp,

as you can see from my file industry.sum, there are only 3 columns, sic2, _FREQ_. _N_.

In fact there is no column by the name of observation. How do i add/'input' those observations from 1 to 72 in file industry.sum to become one variable, so that the dummy_obs can be created?

Thanks a lot!!

Ksharp
Super User

OK. How about:

SUM dataset is under c:\

libname x v9 'c:\';
data want;
set x.sum;
length dummy_obs dummy_sic2 $ 10;
obs+1;
dummy_obs=cats('ind#',put(obs,z2.));
dummy_sic2=cats('ind#',put(sic2,z2.));
run;

Ksharp

mei
Calcite | Level 5 mei
Calcite | Level 5

Dear KSharp,

i cannot use the above program as the dummy_sic2 and dummy_obs need to be created in a big data file (spi.spi7) that has 64602 observations. These dummies need to be created by referring from the file industry.sum5.

I am now creating a new file industry.sum5 which contains variable sic2 and obs. as you can see, when sic2=7, obs=3 and dummy_sic2 of value ind#007 and dummy_obs of value ind#003 should be created in data file spi.spi7. That means dummy_obs should be referred to file industry.sum5 to see which sic code it belongs to first then only apply to each company's data in spi.spi7, it will came out ind#003. can it be done?

if not, then i will just leave it to follow sic2 code....

I will attach the file industry.sum5. and will try to attach spi.spi5 which looks too big....

Thanks

Mei

Ksharp
Super User

Do you mean bring obs variable from sum5 table into  spi5  table and make these dummy variables ?

"then only apply to each company's data in spi.spi7"

I do not understand what this means, do you want apply it to all the data in spi5 data ?

libname x v9 'c:\ ' ;
proc sql;
create table want as
 select a.*,cats('ind#',put(b.sic2,z2.)) as dummy_sic2,cats('ind#',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

Ksharp

mei
Calcite | Level 5 mei
Calcite | Level 5

oh it is almost done. yes i want to create dummy-obs on file want with 1 and 0.

Pls see my prog codes:

proc sql;

create table want1 as

select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs

  from industry.spi5 as a left join industry.sum5 as b

   on a.sic2=b.sic2 ;

quit;

data obs1;

do dummy_obs=01 to 72;

output;

end;

run;

proc sql noprint;

select distinct cats('ind',dummy_obs) into : list separated by ' ' from obs1;

quit;

data obs2;

set work.want1;

array _y &list ;

do over _y;

  _y=ifn(dummy_obs=compress(vname(_y),,'kd'), 1,0);

end;

run;

I know something is not right about assigning industry dummy of dummy_obs at 1-72 of the file obs1. How should i code properly so that i see ind#01-ind#72 or ind01 to ind72 at the final output want1?

I will attach file want1 and obs2

Ksharp
Super User

OK. You almost reached it . It is what you are looking for ?

libname x v9 'c:\ ' ;
proc sql;
create table want1 as
select a.*,cats('ind',put(b.sic2,z2.)) as dummy_sic2,cats('ind',put(obs,z2.)) as dummy_obs
  from x.spi5 as a left join x.sum5 as b
   on a.sic2=b.sic2 ;
quit;

data obs1;
do dummy_obs=1 to 72;
output;
end;
run;


proc sql noprint;
select distinct cats('ind',put(dummy_obs,z2.)) into : list separated by ' ' from obs1;
quit;


data obs2;
set work.want1;
array _y &list ;
do over _y;
  _y=ifn(dummy_obs=vname(_y),1,0);
end;
run;

Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4583 views
  • 4 likes
  • 2 in conversation