Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
| family | port | fund | stock |
| 1 | 1 | 100 | 1 |
| 1 | 1 | 100 | 2 |
| 1 | 1 | 100 | 3 |
| 1 | 1 | 100 | 4 |
| 1 | 1 | 100 | 6 |
| 1 | 1 | 100 | 11 |
| 1 | 1 | 100 | 12 |
| 1 | 1 | 100 | 13 |
| 1 | 1 | 100 | 15 |
| 1 | 1 | 100 | 17 |
| 1 | 1 | 100 | 18 |
| 1 | 1 | 100 | 21 |
| 1 | 1 | 100 | 23 |
| 1 | 1 | 100 | 24 |
| 1 | 2 | 200 | 2 |
| 1 | 2 | 200 | 4 |
| 1 | 2 | 200 | 5 |
| 1 | 2 | 200 | 6 |
| 1 | 2 | 200 | 7 |
| 1 | 2 | 200 | 9 |
| 1 | 2 | 200 | 12 |
| 1 | 2 | 200 | 13 |
| 1 | 2 | 200 | 15 |
| 1 | 2 | 200 | 16 |
| 1 | 2 | 200 | 17 |
| 1 | 2 | 200 | 18 |
| 1 | 2 | 200 | 19 |
| 1 | 2 | 200 | 21 |
| 1 | 3 | 300 | 1 |
| 1 | 3 | 300 | 3 |
| 1 | 3 | 300 | 4 |
| 1 | 3 | 300 | 5 |
| 1 | 3 | 300 | 6 |
| 1 | 3 | 300 | 12 |
| 1 | 3 | 300 | 13 |
| 2 | 1 | 1001 | 1 |
| 2 | 1 | 1001 | 2 |
| 2 | 1 | 1001 | 3 |
| 2 | 1 | 1001 | 4 |
| 2 | 1 | 1001 | 5 |
| 2 | 1 | 1001 | 6 |
| 2 | 1 | 1001 | 7 |
| 2 | 1 | 1001 | 8 |
| 2 | 2 | 1002 | 1 |
| 2 | 2 | 1002 | 2 |
| 2 | 2 | 1002 | 3 |
| 2 | 2 | 1002 | 4 |
| 2 | 2 | 1002 | 5 |
| 2 | 2 | 1002 | 7 |
| 2 | 2 | 1002 | 9 |
| 2 | 2 | 1002 | 11 |
| 2 | 3 | 1003 | 12 |
| 2 | 3 | 1003 | 27 |
| 2 | 3 | 1003 | 28 |
| 2 | 3 | 1003 | 29 |
| 2 | 3 | 1003 | 1 |
| 2 | 3 | 1003 | 7 |
| 2 | 3 | 1003 | 9 |
| 2 | 3 | 1003 | 11 |
| 2 | 3 | 1003 | 12 |
| 2 | 3 | 1003 | 27 |
| 2 | 3 | 1003 | 28 |
| 2 | 3 | 1003 | 29 |
| 2 | 4 | 1004 | 1 |
| 2 | 4 | 1004 | 2 |
| 2 | 4 | 1004 | 3 |
| 2 | 4 | 1004 | 4 |
| 2 | 4 | 1004 | 5 |
| 2 | 4 | 1004 | 6 |
| 2 | 4 | 1004 | 27 |
| 2 | 4 | 1004 | 28 |
| 2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
| family | stock | port1 | port2 | port3 | port4 |
| 1 | 1 | 1 | 0 | 1 | . |
| 1 | 2 | 1 | 1 | 0 | . |
| 1 | 3 | 1 | 0 | 1 | . |
| 1 | 4 | 1 | 1 | 1 | . |
| 1 | 5 | 0 | 1 | 1 | . |
| 1 | 6 | 1 | 1 | 1 | . |
| 1 | 7 | 0 | 1 | 0 | . |
| 1 | 8 | 0 | 0 | 0 | . |
| 1 | 9 | 0 | 1 | 0 | . |
| 1 | 10 | 0 | 0 | 0 | . |
| 1 | 11 | 1 | 0 | 0 | . |
| 1 | 12 | 1 | 1 | 1 | . |
| 1 | 13 | 1 | 1 | 1 | . |
| 1 | 14 | 0 | 0 | 0 | . |
| 1 | 15 | 1 | 1 | 0 | . |
| 1 | 16 | 0 | 1 | 0 | . |
| 1 | 17 | 1 | 1 | 0 | . |
| 1 | 18 | 1 | 1 | 0 | . |
| 1 | 19 | 0 | 1 | 0 | . |
| 1 | 20 | 0 | 0 | 0 | . |
| 1 | 21 | 1 | 1 | 0 | . |
| 1 | 22 | 0 | 0 | 0 | . |
| 1 | 23 | 1 | 0 | 0 | . |
| 1 | 24 | 1 | 0 | 0 | . |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 1 | 1 | 0 | 1 |
| 2 | 3 | 1 | 1 | 0 | 1 |
| 2 | 4 | 1 | 1 | 0 | 1 |
| 2 | 5 | 1 | 1 | 0 | 1 |
| 2 | 6 | 1 | 0 | 0 | 1 |
| 2 | 7 | 1 | 1 | 1 | 0 |
| 2 | 8 | 1 | 0 | 0 | 0 |
| 2 | 9 | 0 | 1 | 1 | 0 |
| 2 | 10 | 0 | 0 | 0 | 0 |
| 2 | 11 | 0 | 1 | 1 | 0 |
| 2 | 12 | 0 | 1 | 1 | 0 |
| 2 | 27 | 0 | 1 | 1 | 1 |
| 2 | 28 | 0 | 1 | 1 | 1 |
| 2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
Ah I see, not sorted. This works:
proc sort data=have;
by family stock;
run;
data want (drop=port fund);
set have;
array p{4} 8;
retain p:;
by family stock;
if first.stock then call missing(of p{*});
p{port}=1;
if last.stock then output;
run;
Perhaps use the macro here
Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
| family | port | fund | stock |
| 1 | 1 | 100 | 1 |
| 1 | 1 | 100 | 2 |
| 1 | 1 | 100 | 3 |
| 1 | 1 | 100 | 4 |
| 1 | 1 | 100 | 6 |
| 1 | 1 | 100 | 11 |
| 1 | 1 | 100 | 12 |
| 1 | 1 | 100 | 13 |
| 1 | 1 | 100 | 15 |
| 1 | 1 | 100 | 17 |
| 1 | 1 | 100 | 18 |
| 1 | 1 | 100 | 21 |
| 1 | 1 | 100 | 23 |
| 1 | 1 | 100 | 24 |
| 1 | 2 | 200 | 2 |
| 1 | 2 | 200 | 4 |
| 1 | 2 | 200 | 5 |
| 1 | 2 | 200 | 6 |
| 1 | 2 | 200 | 7 |
| 1 | 2 | 200 | 9 |
| 1 | 2 | 200 | 12 |
| 1 | 2 | 200 | 13 |
| 1 | 2 | 200 | 15 |
| 1 | 2 | 200 | 16 |
| 1 | 2 | 200 | 17 |
| 1 | 2 | 200 | 18 |
| 1 | 2 | 200 | 19 |
| 1 | 2 | 200 | 21 |
| 1 | 3 | 300 | 1 |
| 1 | 3 | 300 | 3 |
| 1 | 3 | 300 | 4 |
| 1 | 3 | 300 | 5 |
| 1 | 3 | 300 | 6 |
| 1 | 3 | 300 | 12 |
| 1 | 3 | 300 | 13 |
| 2 | 1 | 1001 | 1 |
| 2 | 1 | 1001 | 2 |
| 2 | 1 | 1001 | 3 |
| 2 | 1 | 1001 | 4 |
| 2 | 1 | 1001 | 5 |
| 2 | 1 | 1001 | 6 |
| 2 | 1 | 1001 | 7 |
| 2 | 1 | 1001 | 8 |
| 2 | 2 | 1002 | 1 |
| 2 | 2 | 1002 | 2 |
| 2 | 2 | 1002 | 3 |
| 2 | 2 | 1002 | 4 |
| 2 | 2 | 1002 | 5 |
| 2 | 2 | 1002 | 7 |
| 2 | 2 | 1002 | 9 |
| 2 | 2 | 1002 | 11 |
| 2 | 3 | 1003 | 12 |
| 2 | 3 | 1003 | 27 |
| 2 | 3 | 1003 | 28 |
| 2 | 3 | 1003 | 29 |
| 2 | 3 | 1003 | 1 |
| 2 | 3 | 1003 | 7 |
| 2 | 3 | 1003 | 9 |
| 2 | 3 | 1003 | 11 |
| 2 | 3 | 1003 | 12 |
| 2 | 3 | 1003 | 27 |
| 2 | 3 | 1003 | 28 |
| 2 | 3 | 1003 | 29 |
| 2 | 4 | 1004 | 1 |
| 2 | 4 | 1004 | 2 |
| 2 | 4 | 1004 | 3 |
| 2 | 4 | 1004 | 4 |
| 2 | 4 | 1004 | 5 |
| 2 | 4 | 1004 | 6 |
| 2 | 4 | 1004 | 27 |
| 2 | 4 | 1004 | 28 |
| 2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
| family | stock | port1 | port2 | port3 | port4 |
| 1 | 1 | 1 | 0 | 1 | . |
| 1 | 2 | 1 | 1 | 0 | . |
| 1 | 3 | 1 | 0 | 1 | . |
| 1 | 4 | 1 | 1 | 1 | . |
| 1 | 5 | 0 | 1 | 1 | . |
| 1 | 6 | 1 | 1 | 1 | . |
| 1 | 7 | 0 | 1 | 0 | . |
| 1 | 8 | 0 | 0 | 0 | . |
| 1 | 9 | 0 | 1 | 0 | . |
| 1 | 10 | 0 | 0 | 0 | . |
| 1 | 11 | 1 | 0 | 0 | . |
| 1 | 12 | 1 | 1 | 1 | . |
| 1 | 13 | 1 | 1 | 1 | . |
| 1 | 14 | 0 | 0 | 0 | . |
| 1 | 15 | 1 | 1 | 0 | . |
| 1 | 16 | 0 | 1 | 0 | . |
| 1 | 17 | 1 | 1 | 0 | . |
| 1 | 18 | 1 | 1 | 0 | . |
| 1 | 19 | 0 | 1 | 0 | . |
| 1 | 20 | 0 | 0 | 0 | . |
| 1 | 21 | 1 | 1 | 0 | . |
| 1 | 22 | 0 | 0 | 0 | . |
| 1 | 23 | 1 | 0 | 0 | . |
| 1 | 24 | 1 | 0 | 0 | . |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 1 | 1 | 0 | 1 |
| 2 | 3 | 1 | 1 | 0 | 1 |
| 2 | 4 | 1 | 1 | 0 | 1 |
| 2 | 5 | 1 | 1 | 0 | 1 |
| 2 | 6 | 1 | 0 | 0 | 1 |
| 2 | 7 | 1 | 1 | 1 | 0 |
| 2 | 8 | 1 | 0 | 0 | 0 |
| 2 | 9 | 0 | 1 | 1 | 0 |
| 2 | 10 | 0 | 0 | 0 | 0 |
| 2 | 11 | 0 | 1 | 1 | 0 |
| 2 | 12 | 0 | 1 | 1 | 0 |
| 2 | 27 | 0 | 1 | 1 | 1 |
| 2 | 28 | 0 | 1 | 1 | 1 |
| 2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
A datastep with an array might be the simplest:
data want;
set have;
array port{4} 8.;
retain port:;
by family;
if first.family then call missing (of port{*});
if ... then ...;
if last.family then output;
run;
I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is. For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?
I am afraid that does not make sense to me, I do not work in your industry or with your data. The first row of your required output shows:
| family | stock | port1 | port2 | port3 | port4 |
| 1 | 1 | 1 | 0 | 1 | . |
And the first row of the test data - the only one relating to the above as far as I can see:
input family port fund stock;
datalines;
1 1 100 1
There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?
with all do respect don't you have a dataset 'have'
row=72? (provided code)
in row=29
there are
family port fund stock
1 3 300 1
right? This means that
family=1
port=3
stock=1
so that I want to have
family stock port1 port2 port3
1 1 1 0 1
zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)
Ah I see, not sorted. This works:
proc sort data=have;
by family stock;
run;
data want (drop=port fund);
set have;
array p{4} 8;
retain p:;
by family stock;
if first.stock then call missing(of p{*});
p{port}=1;
if last.stock then output;
run;
I think for that you would need an additional few lines:
proc sort data=have;
by family stock;
run;
data want (drop=port fund i on);
set have;
array p{4} 8;
retain p:;
by family stock;
if first.stock then call missing(of p{*});
p{port}=1;
if last.stock then do;
on=0;
do i=4 to 1 by -1;
if p{i}=1 then on=1;
if on=1 and p{i}=. then p{i}=0;
end;
output;
end;
run;
I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than .
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.