Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
family | port | fund | stock |
1 | 1 | 100 | 1 |
1 | 1 | 100 | 2 |
1 | 1 | 100 | 3 |
1 | 1 | 100 | 4 |
1 | 1 | 100 | 6 |
1 | 1 | 100 | 11 |
1 | 1 | 100 | 12 |
1 | 1 | 100 | 13 |
1 | 1 | 100 | 15 |
1 | 1 | 100 | 17 |
1 | 1 | 100 | 18 |
1 | 1 | 100 | 21 |
1 | 1 | 100 | 23 |
1 | 1 | 100 | 24 |
1 | 2 | 200 | 2 |
1 | 2 | 200 | 4 |
1 | 2 | 200 | 5 |
1 | 2 | 200 | 6 |
1 | 2 | 200 | 7 |
1 | 2 | 200 | 9 |
1 | 2 | 200 | 12 |
1 | 2 | 200 | 13 |
1 | 2 | 200 | 15 |
1 | 2 | 200 | 16 |
1 | 2 | 200 | 17 |
1 | 2 | 200 | 18 |
1 | 2 | 200 | 19 |
1 | 2 | 200 | 21 |
1 | 3 | 300 | 1 |
1 | 3 | 300 | 3 |
1 | 3 | 300 | 4 |
1 | 3 | 300 | 5 |
1 | 3 | 300 | 6 |
1 | 3 | 300 | 12 |
1 | 3 | 300 | 13 |
2 | 1 | 1001 | 1 |
2 | 1 | 1001 | 2 |
2 | 1 | 1001 | 3 |
2 | 1 | 1001 | 4 |
2 | 1 | 1001 | 5 |
2 | 1 | 1001 | 6 |
2 | 1 | 1001 | 7 |
2 | 1 | 1001 | 8 |
2 | 2 | 1002 | 1 |
2 | 2 | 1002 | 2 |
2 | 2 | 1002 | 3 |
2 | 2 | 1002 | 4 |
2 | 2 | 1002 | 5 |
2 | 2 | 1002 | 7 |
2 | 2 | 1002 | 9 |
2 | 2 | 1002 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 3 | 1003 | 1 |
2 | 3 | 1003 | 7 |
2 | 3 | 1003 | 9 |
2 | 3 | 1003 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 4 | 1004 | 1 |
2 | 4 | 1004 | 2 |
2 | 4 | 1004 | 3 |
2 | 4 | 1004 | 4 |
2 | 4 | 1004 | 5 |
2 | 4 | 1004 | 6 |
2 | 4 | 1004 | 27 |
2 | 4 | 1004 | 28 |
2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
1 | 2 | 1 | 1 | 0 | . |
1 | 3 | 1 | 0 | 1 | . |
1 | 4 | 1 | 1 | 1 | . |
1 | 5 | 0 | 1 | 1 | . |
1 | 6 | 1 | 1 | 1 | . |
1 | 7 | 0 | 1 | 0 | . |
1 | 8 | 0 | 0 | 0 | . |
1 | 9 | 0 | 1 | 0 | . |
1 | 10 | 0 | 0 | 0 | . |
1 | 11 | 1 | 0 | 0 | . |
1 | 12 | 1 | 1 | 1 | . |
1 | 13 | 1 | 1 | 1 | . |
1 | 14 | 0 | 0 | 0 | . |
1 | 15 | 1 | 1 | 0 | . |
1 | 16 | 0 | 1 | 0 | . |
1 | 17 | 1 | 1 | 0 | . |
1 | 18 | 1 | 1 | 0 | . |
1 | 19 | 0 | 1 | 0 | . |
1 | 20 | 0 | 0 | 0 | . |
1 | 21 | 1 | 1 | 0 | . |
1 | 22 | 0 | 0 | 0 | . |
1 | 23 | 1 | 0 | 0 | . |
1 | 24 | 1 | 0 | 0 | . |
2 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 | 0 | 1 |
2 | 3 | 1 | 1 | 0 | 1 |
2 | 4 | 1 | 1 | 0 | 1 |
2 | 5 | 1 | 1 | 0 | 1 |
2 | 6 | 1 | 0 | 0 | 1 |
2 | 7 | 1 | 1 | 1 | 0 |
2 | 8 | 1 | 0 | 0 | 0 |
2 | 9 | 0 | 1 | 1 | 0 |
2 | 10 | 0 | 0 | 0 | 0 |
2 | 11 | 0 | 1 | 1 | 0 |
2 | 12 | 0 | 1 | 1 | 0 |
2 | 27 | 0 | 1 | 1 | 1 |
2 | 28 | 0 | 1 | 1 | 1 |
2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
Ah I see, not sorted. This works:
proc sort data=have; by family stock; run; data want (drop=port fund); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then output; run;
Perhaps use the macro here
Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
family | port | fund | stock |
1 | 1 | 100 | 1 |
1 | 1 | 100 | 2 |
1 | 1 | 100 | 3 |
1 | 1 | 100 | 4 |
1 | 1 | 100 | 6 |
1 | 1 | 100 | 11 |
1 | 1 | 100 | 12 |
1 | 1 | 100 | 13 |
1 | 1 | 100 | 15 |
1 | 1 | 100 | 17 |
1 | 1 | 100 | 18 |
1 | 1 | 100 | 21 |
1 | 1 | 100 | 23 |
1 | 1 | 100 | 24 |
1 | 2 | 200 | 2 |
1 | 2 | 200 | 4 |
1 | 2 | 200 | 5 |
1 | 2 | 200 | 6 |
1 | 2 | 200 | 7 |
1 | 2 | 200 | 9 |
1 | 2 | 200 | 12 |
1 | 2 | 200 | 13 |
1 | 2 | 200 | 15 |
1 | 2 | 200 | 16 |
1 | 2 | 200 | 17 |
1 | 2 | 200 | 18 |
1 | 2 | 200 | 19 |
1 | 2 | 200 | 21 |
1 | 3 | 300 | 1 |
1 | 3 | 300 | 3 |
1 | 3 | 300 | 4 |
1 | 3 | 300 | 5 |
1 | 3 | 300 | 6 |
1 | 3 | 300 | 12 |
1 | 3 | 300 | 13 |
2 | 1 | 1001 | 1 |
2 | 1 | 1001 | 2 |
2 | 1 | 1001 | 3 |
2 | 1 | 1001 | 4 |
2 | 1 | 1001 | 5 |
2 | 1 | 1001 | 6 |
2 | 1 | 1001 | 7 |
2 | 1 | 1001 | 8 |
2 | 2 | 1002 | 1 |
2 | 2 | 1002 | 2 |
2 | 2 | 1002 | 3 |
2 | 2 | 1002 | 4 |
2 | 2 | 1002 | 5 |
2 | 2 | 1002 | 7 |
2 | 2 | 1002 | 9 |
2 | 2 | 1002 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 3 | 1003 | 1 |
2 | 3 | 1003 | 7 |
2 | 3 | 1003 | 9 |
2 | 3 | 1003 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 4 | 1004 | 1 |
2 | 4 | 1004 | 2 |
2 | 4 | 1004 | 3 |
2 | 4 | 1004 | 4 |
2 | 4 | 1004 | 5 |
2 | 4 | 1004 | 6 |
2 | 4 | 1004 | 27 |
2 | 4 | 1004 | 28 |
2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
1 | 2 | 1 | 1 | 0 | . |
1 | 3 | 1 | 0 | 1 | . |
1 | 4 | 1 | 1 | 1 | . |
1 | 5 | 0 | 1 | 1 | . |
1 | 6 | 1 | 1 | 1 | . |
1 | 7 | 0 | 1 | 0 | . |
1 | 8 | 0 | 0 | 0 | . |
1 | 9 | 0 | 1 | 0 | . |
1 | 10 | 0 | 0 | 0 | . |
1 | 11 | 1 | 0 | 0 | . |
1 | 12 | 1 | 1 | 1 | . |
1 | 13 | 1 | 1 | 1 | . |
1 | 14 | 0 | 0 | 0 | . |
1 | 15 | 1 | 1 | 0 | . |
1 | 16 | 0 | 1 | 0 | . |
1 | 17 | 1 | 1 | 0 | . |
1 | 18 | 1 | 1 | 0 | . |
1 | 19 | 0 | 1 | 0 | . |
1 | 20 | 0 | 0 | 0 | . |
1 | 21 | 1 | 1 | 0 | . |
1 | 22 | 0 | 0 | 0 | . |
1 | 23 | 1 | 0 | 0 | . |
1 | 24 | 1 | 0 | 0 | . |
2 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 | 0 | 1 |
2 | 3 | 1 | 1 | 0 | 1 |
2 | 4 | 1 | 1 | 0 | 1 |
2 | 5 | 1 | 1 | 0 | 1 |
2 | 6 | 1 | 0 | 0 | 1 |
2 | 7 | 1 | 1 | 1 | 0 |
2 | 8 | 1 | 0 | 0 | 0 |
2 | 9 | 0 | 1 | 1 | 0 |
2 | 10 | 0 | 0 | 0 | 0 |
2 | 11 | 0 | 1 | 1 | 0 |
2 | 12 | 0 | 1 | 1 | 0 |
2 | 27 | 0 | 1 | 1 | 1 |
2 | 28 | 0 | 1 | 1 | 1 |
2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
A datastep with an array might be the simplest:
data want; set have; array port{4} 8.; retain port:; by family; if first.family then call missing (of port{*}); if ... then ...; if last.family then output; run;
I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is. For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?
I am afraid that does not make sense to me, I do not work in your industry or with your data. The first row of your required output shows:
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
And the first row of the test data - the only one relating to the above as far as I can see:
input family port fund stock;
datalines;
1 1 100 1
There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?
with all do respect don't you have a dataset 'have'
row=72? (provided code)
in row=29
there are
family port fund stock
1 3 300 1
right? This means that
family=1
port=3
stock=1
so that I want to have
family stock port1 port2 port3
1 1 1 0 1
zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)
Ah I see, not sorted. This works:
proc sort data=have; by family stock; run; data want (drop=port fund); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then output; run;
I think for that you would need an additional few lines:
proc sort data=have; by family stock; run; data want (drop=port fund i on); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then do;
on=0;
do i=4 to 1 by -1;
if p{i}=1 then on=1;
if on=1 and p{i}=. then p{i}=0;
end;
output;
end; run;
I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.