DATA Step, Macro, Functions and more

Creating a dummy variable for grouped data

Reply
Occasional Contributor
Posts: 5

Creating a dummy variable for grouped data

Hi

 

I have a dataset that looks like this:

ID

IN

Type

Sum

1

30000

1

45000

1

15000

3

45000

2

4000

3

4000

3

500

1

10500

3

10000

8

10500

4

3000

1

17300

4

6000

3

17300

4

7000

3

17300

4

300

6

17300

5

40000

8

40000

6

2000

1

9400

6

3200

3

9400

6

4200

8

9400

 

and I would like to create a dummy variable for the whole group of observations so that if type=8 then all observations with the same ID have the same dummy. It would look something like this:

 

ID

IN

Type

Sum

D

1

30000

1

45000

0

1

15000

3

45000

0

2

4000

3

4000

0

3

500

1

10500

1

3

10000

8

10500

1

4

3000

1

17300

0

4

6000

3

17300

0

4

7000

3

17300

0

4

300

6

17300

0

5

40000

8

40000

1

6

2000

1

9400

1

6

3200

3

9400

1

6

4200

8

9400

1

 

Thank you in advance!

Super User
Super User
Posts: 7,997

Re: Creating a dummy variable for grouped data

If you provide test data as a datastep (so I don't have to type all that in) then I can provide working code, as it is this is guess not tested:

proc sql;
  create table WANT as
  select  A.*,
            case when B.D=1 then 1 else 0 end as D
  from    HAVE A
  left join (select distinct ID,1 from HAVE where ID=A.ID and TYPE=8) B
  on      A.ID=B.ID;
quit;
Occasional Contributor
Posts: 5

Re: Creating a dummy variable for grouped data

Sure

 

data have;
 input ID IN Type Sum;
 datalines;
1 30000 1 45000
1 15000 3 45000
2 4000 3 4000
3 500 1 10500
3 10000 8 10500
4 3000 1 17300
4 6000 3 17300
4 7000 3 17300
4 300 6 17300
5 40000 8 40000
6 2000 1 9400
6 3200 3 9400
6 4200 8 9400
;
run;

Super User
Super User
Posts: 7,997

Re: Creating a dummy variable for grouped data

So yes:

data have;
 input ID IN Type Sum;
 datalines;
1 30000 1 45000
1 15000 3 45000
2 4000 3 4000
3 500 1 10500
3 10000 8 10500
4 3000 1 17300
4 6000 3 17300
4 7000 3 17300
4 300 6 17300
5 40000 8 40000
6 2000 1 9400
6 3200 3 9400
6 4200 8 9400
;
run;

proc sql;
  create table WANT as
  select  A.*,
          case when B.D=1 then 1 else 0 end as D
  from    HAVE A
  left join (select distinct ID,1 as D from HAVE where TYPE=8) B
  on      A.ID=B.ID;
quit;
Trusted Advisor
Posts: 1,118

Re: Creating a dummy variable for grouped data

Hi @AnnaKM,

 

If your input dataset is sorted by ID, you could also use a data step (DOW loop technique) to achieve the result:

data want;
do until(last.ID);
  set have;
  by ID;
  if first.ID then D=0;
  if type=8 then D=1;
end;
do until(last.ID);
  set have;
  by ID;
  output;
end;
run;
Occasional Contributor
Posts: 5

Re: Creating a dummy variable for grouped data

Both solutions worked really well. Thank you!

Super User
Posts: 10,046

Re: Creating a dummy variable for grouped data

data have;
 input ID IN Type Sum;
 datalines;
1 30000 1 45000
1 15000 3 45000
2 4000 3 4000
3 500 1 10500
3 10000 8 10500
4 3000 1 17300
4 6000 3 17300
4 7000 3 17300
4 300 6 17300
5 40000 8 40000
6 2000 1 9400
6 3200 3 9400
6 4200 8 9400
;
run;
proc sql;
 create table want as
  select *,(sum(Type=8) gt 0) as D
   from have
    group by id;
quit;

Ask a Question
Discussion stats
  • 6 replies
  • 256 views
  • 2 likes
  • 4 in conversation