dummy variable counting

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

dummy variable counting

hi guys,

I have a simple problem of creating dummy variables. here is my dataset and an example of what i want (dummy column). I have to create a variable describing total months of experience for each worker given by ID, i have the months they showed up for work, for example the first worker (ID=1) came for 21 months in 2 years ( out of 24),so the dummy column just counts from 1 to 21. I want to do the same for the entire dataset. thanks guys !

idyearmonthmonth_ddummy
12009111
12009222
12009443
12009554
12009665
12009776
12009887
12009998
1200910109
12009111110
1201011311
1201021412
1201031513
1201041614
1201051715
1201071916
1201082017
1201092118
12010102219
12010112320
12010122421
242009661
242009772
242009883
242009994
24200910105
24200911116
2420101137
2420102148
442009111
442009222
442009443
442009884
442009995
44200910106
44200911117
4420101138
4420102149
44201031510
44201041611
44201051712
44201061813
44201071914
44201082015
44201092116
442010102217
442010122418

jessica


Accepted Solutions
Solution
‎01-12-2015 01:07 AM
Super Contributor
Posts: 298

Re: dummy variable counting

Posted in reply to Jessica98

Another way:

Your data set is assumed to be SORTED by ID.

data want;

   do dummy = 1 by 1 until(last.id);

      set have;

      by id;

      output;

   end;

run;

View solution in original post


All Replies
Contributor
Posts: 37

Re: dummy variable counting

Posted in reply to Jessica98

Hi

You could use a counter to increment for each record of a particular id

data

have

;

input id year mnth mnth_d;

cards ;

1 2009 1 1

1 2009 2 2

1 2009 4 4

1 2009 5 5

1 2009 6 6

1 2009 7 7

1 2009 8 8

1 2009 9 9

1 2009 10 10

1 2009 11 11

1 2010 1 13

1 2010 2 14

1 2010 3 15

1 2010 4 16

1 2010 5 17

1 2010 7 19

1 2010 8 20

1 2010 9 21

1 2010 10 22

1 2010 11 23

1 2010 12 24

24 2009 6 6

24 2009 7 7

24 2009 8 8

24 2009 9 9

24 2009 10 10

24 2009 11 11

24 2010 1 13

24 2010 2 14

44 2009 1 1

44 2009 2 2

44 2009 4 4

44 2009 8 8

44 2009 9 9

44 2009 10 10

44 2009 11 11

44 2010 1 13

44 2010 2 14

44 2010 3 15

44 2010 4 16

44 2010 5 17

44 2010 6 18

44 2010 7 19

44 2010 8 20

44 2010 9 21

44 2010 10 22

44 2010 12 24

;

run;

proc sort data = have ; by id year mnth; run;

data

want

;

set

have

;

by id year mnth ;

retain dummy 0;

if first.id then dummy = 1 ;

  else dummy = dummy + 1 ;

run;

Thanks

Solution
‎01-12-2015 01:07 AM
Super Contributor
Posts: 298

Re: dummy variable counting

Posted in reply to Jessica98

Another way:

Your data set is assumed to be SORTED by ID.

data want;

   do dummy = 1 by 1 until(last.id);

      set have;

      by id;

      output;

   end;

run;

Respected Advisor
Posts: 3,156

Re: dummy variable counting

Posted in reply to Jessica98

If your data is not SORTED, but CLUSTERED, meaning same ID stays together, then an option added to 's code should do:

data want;

  do dummy = 1 by 1 until(last.id);

  set have;

  by id NOTSORTED;

  output;

  end;

run;

If your data is neither SORTED, nor CLUSTERED, here is a one step Proc SQL approach:

proc sql;

  create table want as

  select *, (select count(*) from have where id=a.id and mnth_d <= a.mnth_d) as dummy

  from have a;

quit;

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 480 views
  • 3 likes
  • 4 in conversation