Solved: convert long type panel data to wide type with dummy variables

jkim197 · Posted 04-25-2018 02:38 AM

Hello SAS users !

I got panel dataset described as below

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

port represents portfolios that are hold by distinct funds within the group.

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

family	port	fund	stock
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29

Now I want to change this dataset to one shown below.

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

family	stock	port1	port2	port3	port4
1	1	1	0	1	.
1	2	1	1	0	.
1	3	1	0	1	.
1	4	1	1	1	.
1	5	0	1	1	.
1	6	1	1	1	.
1	7	0	1	0	.
1	8	0	0	0	.
1	9	0	1	0	.
1	10	0	0	0	.
1	11	1	0	0	.
1	12	1	1	1	.
1	13	1	1	1	.
1	14	0	0	0	.
1	15	1	1	0	.
1	16	0	1	0	.
1	17	1	1	0	.
1	18	1	1	0	.
1	19	0	1	0	.
1	20	0	0	0	.
1	21	1	1	0	.
1	22	0	0	0	.
1	23	1	0	0	.
1	24	1	0	0	.
2	1	1	1	1	1
2	2	1	1	0	1
2	3	1	1	0	1
2	4	1	1	0	1
2	5	1	1	0	1
2	6	1	0	0	1
2	7	1	1	1	0
2	8	1	0	0	0
2	9	0	1	1	0
2	10	0	0	0	0
2	11	0	1	1	0
2	12	0	1	1	0
2	27	0	1	1	1
2	28	0	1	1	1
2	29	0	1	1	1

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

how can I do this?

Please help..

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

RW9 · Posted 04-25-2018 09:37 AM

Ah I see, not sorted. This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

View solution in original post

PeterClemmensen · Posted 04-25-2018 03:01 AM

Perhaps use the macro here

The DATA to DATA Step Macro
Blog: SASnrd

jkim197 · Posted 04-25-2018 05:20 AM

Hello SAS users !

I got panel dataset described as below

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

port represents portfolios that are hold by distinct funds within the group.

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

family	port	fund	stock
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29

Now I want to change this dataset to one shown below.

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

family	stock	port1	port2	port3	port4
1	1	1	0	1	.
1	2	1	1	0	.
1	3	1	0	1	.
1	4	1	1	1	.
1	5	0	1	1	.
1	6	1	1	1	.
1	7	0	1	0	.
1	8	0	0	0	.
1	9	0	1	0	.
1	10	0	0	0	.
1	11	1	0	0	.
1	12	1	1	1	.
1	13	1	1	1	.
1	14	0	0	0	.
1	15	1	1	0	.
1	16	0	1	0	.
1	17	1	1	0	.
1	18	1	1	0	.
1	19	0	1	0	.
1	20	0	0	0	.
1	21	1	1	0	.
1	22	0	0	0	.
1	23	1	0	0	.
1	24	1	0	0	.
2	1	1	1	1	1
2	2	1	1	0	1
2	3	1	1	0	1
2	4	1	1	0	1
2	5	1	1	0	1
2	6	1	0	0	1
2	7	1	1	1	0
2	8	1	0	0	0
2	9	0	1	1	0
2	10	0	0	0	0
2	11	0	1	1	0
2	12	0	1	1	0
2	27	0	1	1	1
2	28	0	1	1	1
2	29	0	1	1	1

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

how can I do this?

Please help..

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

RW9 · Posted 04-25-2018 05:58 AM

A datastep with an array might be the simplest:

data want;
  set have;
  array port{4} 8.;
  retain port:;
  by family;
  if first.family then call missing (of port{*});
  if ... then ...;
  if last.family then output;
run;

I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is. For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?

jkim197 · Posted 04-25-2018 06:16 AM

it's just a simple dummy variable

1 = if port j of family i holds stock k
0 = if port j of family i does not hold stock k

RW9 · Posted 04-25-2018 06:25 AM

I am afraid that does not make sense to me, I do not work in your industry or with your data. The first row of your required output shows:

family	stock	port1	port2	port3	port4
1	1	1	0	1	.

And the first row of the test data - the only one relating to the above as far as I can see:

input family port fund stock;
datalines;
1	1	100	1

There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?

jkim197 · Posted 04-25-2018 07:13 AM

with all do respect don't you have a dataset 'have'
row=72? (provided code)

in row=29
there are
family port fund stock
1 3 300 1

right? This means that
family=1
port=3
stock=1

so that I want to have
family stock port1 port2 port3
1 1 1 0 1

zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)

RW9 · Posted 04-25-2018 09:37 AM

Ah I see, not sorted. This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

jkim197 · Posted 04-25-2018 12:08 PM

This works really well Thank you so much!!!!

But can I ask one more?
When I run your code, the outcome gives me no zeros but missing. Can I earn zeros for them and missing just for family=1 & port=4 ?

RW9 · Posted 04-26-2018 04:27 AM

I think for that you would need an additional few lines:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund i on);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then do;
    on=0;
    do i=4 to 1 by -1;
      if p{i}=1 then on=1;
      if on=1 and p{i}=. then p{i}=0;
    end;
    output;
  end;
run;

I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than .

convert long type panel data to wide type with dummy variables

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel data to wide type with dummy variables

convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Re: convert long type panel to wide type panel data dummy variables in it.

Classroom Training Available!