DATA Step, Macro, Functions and more

convert long type panel data to wide type with dummy variables

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

convert long type panel data to wide type with dummy variables

[ Edited ]

Hello SAS users !

 

I got panel dataset described as below

 

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

 

port represents portfolios that are hold by distinct funds within the group.

 

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

 

familyportfundstock
111001
111002
111003
111004
111006
1110011
1110012
1110013
1110015
1110017
1110018
1110021
1110023
1110024
122002
122004
122005
122006
122007
122009
1220012
1220013
1220015
1220016
1220017
1220018
1220019
1220021
133001
133003
133004
133005
133006
1330012
1330013
2110011
2110012
2110013
2110014
2110015
2110016
2110017
2110018
2210021
2210022
2210023
2210024
2210025
2210027
2210029
22100211
23100312
23100327
23100328
23100329
2310031
2310037
2310039
23100311
23100312
23100327
23100328
23100329
2410041
2410042
2410043
2410044
2410045
2410046
24100427
24100428
24100429

 

 

 

Now I want to change this dataset to one shown below.

 

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

 

 

familystockport1port2port3port4
11101.
12110.
13101.
14111.
15011.
16111.
17010.
18000.
19010.
110000.
111100.
112111.
113111.
114000.
115110.
116010.
117110.
118110.
119010.
120000.
121110.
122000.
123100.
124100.
211111
221101
231101
241101
251101
261001
271110
281000
290110
2100000
2110110
2120110
2270111
2280111
2290111

 

 

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

 


how can I do this?

 

Please help..

 

 

 

 

 

 

 

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

 


Accepted Solutions
Solution
a month ago
Super User
Super User
Posts: 9,216

Re: convert long type panel to wide type panel data dummy variables in it.

Ah I see, not sorted.  This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

View solution in original post


All Replies
PROC Star
Posts: 1,190

Re: convert long type panel data to wide type with dummy variables

Perhaps use the macro here

Contributor
Posts: 28

convert long type panel to wide type panel data dummy variables in it.

Hello SAS users !

 

I got panel dataset described as below

 

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

 

port represents portfolios that are hold by distinct funds within the group.

 

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

 

familyportfundstock
111001
111002
111003
111004
111006
1110011
1110012
1110013
1110015
1110017
1110018
1110021
1110023
1110024
122002
122004
122005
122006
122007
122009
1220012
1220013
1220015
1220016
1220017
1220018
1220019
1220021
133001
133003
133004
133005
133006
1330012
1330013
2110011
2110012
2110013
2110014
2110015
2110016
2110017
2110018
2210021
2210022
2210023
2210024
2210025
2210027
2210029
22100211
23100312
23100327
23100328
23100329
2310031
2310037
2310039
23100311
23100312
23100327
23100328
23100329
2410041
2410042
2410043
2410044
2410045
2410046
24100427
24100428
24100429

 

 

 

Now I want to change this dataset to one shown below.

 

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

 

 

familystockport1port2port3port4
11101.
12110.
13101.
14111.
15011.
16111.
17010.
18000.
19010.
110000.
111100.
112111.
113111.
114000.
115110.
116010.
117110.
118110.
119010.
120000.
121110.
122000.
123100.
124100.
211111
221101
231101
241101
251101
261001
271110
281000
290110
2100000
2110110
2120110
2270111
2280111
2290111

 

 

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

 


how can I do this?

 

Please help..

 

 

 

 

 

 

 

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

 

Super User
Super User
Posts: 9,216

Re: convert long type panel to wide type panel data dummy variables in it.

A datastep with an array might be the simplest:

data want;
  set have;
  array port{4} 8.;
  retain port:;
  by family;
  if first.family then call missing (of port{*});
  if ... then ...;
  if last.family then output;
run;

I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is.  For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?

 

Contributor
Posts: 28

Re: convert long type panel to wide type panel data dummy variables in it.

it's just a simple dummy variable

1 = if port j of family i holds stock k
0 = if port j of family i does not hold stock k
Super User
Super User
Posts: 9,216

Re: convert long type panel to wide type panel data dummy variables in it.

I am afraid that does not make sense to me, I do not work in your industry or with your data.  The first row of your required output shows:

family stock port1 port2 port3 port4
1 1 1 0 1 .

 

And the first row of the test data - the only one relating to the above as far as I can see:

input family port fund stock;
datalines;
1	1	100	1

There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?

Contributor
Posts: 28

Re: convert long type panel to wide type panel data dummy variables in it.

[ Edited ]

with all do respect don't you have a dataset 'have'
row=72? (provided code)

in row=29
there are
family port fund stock
1 3 300 1


right? This means that
family=1
port=3
stock=1

so that I want to have
family stock port1 port2 port3
1 1 1 0 1


zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)

Solution
a month ago
Super User
Super User
Posts: 9,216

Re: convert long type panel to wide type panel data dummy variables in it.

Ah I see, not sorted.  This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

Contributor
Posts: 28

Re: convert long type panel to wide type panel data dummy variables in it.

This works really well Thank you so much!!!!

But can I ask one more?
When I run your code, the outcome gives me no zeros but missing. Can I earn zeros for them and missing just for family=1 & port=4 ?
Super User
Super User
Posts: 9,216

Re: convert long type panel to wide type panel data dummy variables in it.

I think for that you would need an additional few lines:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund i on);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then do;
on=0;
do i=4 to 1 by -1;
if p{i}=1 then on=1;
if on=1 and p{i}=. then p{i}=0;
end;
output;
end; run;

I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than . 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 181 views
  • 1 like
  • 3 in conversation