Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
family | port | fund | stock |
1 | 1 | 100 | 1 |
1 | 1 | 100 | 2 |
1 | 1 | 100 | 3 |
1 | 1 | 100 | 4 |
1 | 1 | 100 | 6 |
1 | 1 | 100 | 11 |
1 | 1 | 100 | 12 |
1 | 1 | 100 | 13 |
1 | 1 | 100 | 15 |
1 | 1 | 100 | 17 |
1 | 1 | 100 | 18 |
1 | 1 | 100 | 21 |
1 | 1 | 100 | 23 |
1 | 1 | 100 | 24 |
1 | 2 | 200 | 2 |
1 | 2 | 200 | 4 |
1 | 2 | 200 | 5 |
1 | 2 | 200 | 6 |
1 | 2 | 200 | 7 |
1 | 2 | 200 | 9 |
1 | 2 | 200 | 12 |
1 | 2 | 200 | 13 |
1 | 2 | 200 | 15 |
1 | 2 | 200 | 16 |
1 | 2 | 200 | 17 |
1 | 2 | 200 | 18 |
1 | 2 | 200 | 19 |
1 | 2 | 200 | 21 |
1 | 3 | 300 | 1 |
1 | 3 | 300 | 3 |
1 | 3 | 300 | 4 |
1 | 3 | 300 | 5 |
1 | 3 | 300 | 6 |
1 | 3 | 300 | 12 |
1 | 3 | 300 | 13 |
2 | 1 | 1001 | 1 |
2 | 1 | 1001 | 2 |
2 | 1 | 1001 | 3 |
2 | 1 | 1001 | 4 |
2 | 1 | 1001 | 5 |
2 | 1 | 1001 | 6 |
2 | 1 | 1001 | 7 |
2 | 1 | 1001 | 8 |
2 | 2 | 1002 | 1 |
2 | 2 | 1002 | 2 |
2 | 2 | 1002 | 3 |
2 | 2 | 1002 | 4 |
2 | 2 | 1002 | 5 |
2 | 2 | 1002 | 7 |
2 | 2 | 1002 | 9 |
2 | 2 | 1002 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 3 | 1003 | 1 |
2 | 3 | 1003 | 7 |
2 | 3 | 1003 | 9 |
2 | 3 | 1003 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 4 | 1004 | 1 |
2 | 4 | 1004 | 2 |
2 | 4 | 1004 | 3 |
2 | 4 | 1004 | 4 |
2 | 4 | 1004 | 5 |
2 | 4 | 1004 | 6 |
2 | 4 | 1004 | 27 |
2 | 4 | 1004 | 28 |
2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
1 | 2 | 1 | 1 | 0 | . |
1 | 3 | 1 | 0 | 1 | . |
1 | 4 | 1 | 1 | 1 | . |
1 | 5 | 0 | 1 | 1 | . |
1 | 6 | 1 | 1 | 1 | . |
1 | 7 | 0 | 1 | 0 | . |
1 | 8 | 0 | 0 | 0 | . |
1 | 9 | 0 | 1 | 0 | . |
1 | 10 | 0 | 0 | 0 | . |
1 | 11 | 1 | 0 | 0 | . |
1 | 12 | 1 | 1 | 1 | . |
1 | 13 | 1 | 1 | 1 | . |
1 | 14 | 0 | 0 | 0 | . |
1 | 15 | 1 | 1 | 0 | . |
1 | 16 | 0 | 1 | 0 | . |
1 | 17 | 1 | 1 | 0 | . |
1 | 18 | 1 | 1 | 0 | . |
1 | 19 | 0 | 1 | 0 | . |
1 | 20 | 0 | 0 | 0 | . |
1 | 21 | 1 | 1 | 0 | . |
1 | 22 | 0 | 0 | 0 | . |
1 | 23 | 1 | 0 | 0 | . |
1 | 24 | 1 | 0 | 0 | . |
2 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 | 0 | 1 |
2 | 3 | 1 | 1 | 0 | 1 |
2 | 4 | 1 | 1 | 0 | 1 |
2 | 5 | 1 | 1 | 0 | 1 |
2 | 6 | 1 | 0 | 0 | 1 |
2 | 7 | 1 | 1 | 1 | 0 |
2 | 8 | 1 | 0 | 0 | 0 |
2 | 9 | 0 | 1 | 1 | 0 |
2 | 10 | 0 | 0 | 0 | 0 |
2 | 11 | 0 | 1 | 1 | 0 |
2 | 12 | 0 | 1 | 1 | 0 |
2 | 27 | 0 | 1 | 1 | 1 |
2 | 28 | 0 | 1 | 1 | 1 |
2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
Ah I see, not sorted. This works:
proc sort data=have; by family stock; run; data want (drop=port fund); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then output; run;
Perhaps use the macro here
Hello SAS users !
I got panel dataset described as below
group represents fund family : In my original dataset, there are more than 300 groups for each quarter.
port represents portfolios that are hold by distinct funds within the group.
for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24
for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........
for family 2 fund 1001 holds portfolio 1 and so on...
portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.
family | port | fund | stock |
1 | 1 | 100 | 1 |
1 | 1 | 100 | 2 |
1 | 1 | 100 | 3 |
1 | 1 | 100 | 4 |
1 | 1 | 100 | 6 |
1 | 1 | 100 | 11 |
1 | 1 | 100 | 12 |
1 | 1 | 100 | 13 |
1 | 1 | 100 | 15 |
1 | 1 | 100 | 17 |
1 | 1 | 100 | 18 |
1 | 1 | 100 | 21 |
1 | 1 | 100 | 23 |
1 | 1 | 100 | 24 |
1 | 2 | 200 | 2 |
1 | 2 | 200 | 4 |
1 | 2 | 200 | 5 |
1 | 2 | 200 | 6 |
1 | 2 | 200 | 7 |
1 | 2 | 200 | 9 |
1 | 2 | 200 | 12 |
1 | 2 | 200 | 13 |
1 | 2 | 200 | 15 |
1 | 2 | 200 | 16 |
1 | 2 | 200 | 17 |
1 | 2 | 200 | 18 |
1 | 2 | 200 | 19 |
1 | 2 | 200 | 21 |
1 | 3 | 300 | 1 |
1 | 3 | 300 | 3 |
1 | 3 | 300 | 4 |
1 | 3 | 300 | 5 |
1 | 3 | 300 | 6 |
1 | 3 | 300 | 12 |
1 | 3 | 300 | 13 |
2 | 1 | 1001 | 1 |
2 | 1 | 1001 | 2 |
2 | 1 | 1001 | 3 |
2 | 1 | 1001 | 4 |
2 | 1 | 1001 | 5 |
2 | 1 | 1001 | 6 |
2 | 1 | 1001 | 7 |
2 | 1 | 1001 | 8 |
2 | 2 | 1002 | 1 |
2 | 2 | 1002 | 2 |
2 | 2 | 1002 | 3 |
2 | 2 | 1002 | 4 |
2 | 2 | 1002 | 5 |
2 | 2 | 1002 | 7 |
2 | 2 | 1002 | 9 |
2 | 2 | 1002 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 3 | 1003 | 1 |
2 | 3 | 1003 | 7 |
2 | 3 | 1003 | 9 |
2 | 3 | 1003 | 11 |
2 | 3 | 1003 | 12 |
2 | 3 | 1003 | 27 |
2 | 3 | 1003 | 28 |
2 | 3 | 1003 | 29 |
2 | 4 | 1004 | 1 |
2 | 4 | 1004 | 2 |
2 | 4 | 1004 | 3 |
2 | 4 | 1004 | 4 |
2 | 4 | 1004 | 5 |
2 | 4 | 1004 | 6 |
2 | 4 | 1004 | 27 |
2 | 4 | 1004 | 28 |
2 | 4 | 1004 | 29 |
Now I want to change this dataset to one shown below.
for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24
for family 2 : portfolio 1 (so that fund 1001) holds .....
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
1 | 2 | 1 | 1 | 0 | . |
1 | 3 | 1 | 0 | 1 | . |
1 | 4 | 1 | 1 | 1 | . |
1 | 5 | 0 | 1 | 1 | . |
1 | 6 | 1 | 1 | 1 | . |
1 | 7 | 0 | 1 | 0 | . |
1 | 8 | 0 | 0 | 0 | . |
1 | 9 | 0 | 1 | 0 | . |
1 | 10 | 0 | 0 | 0 | . |
1 | 11 | 1 | 0 | 0 | . |
1 | 12 | 1 | 1 | 1 | . |
1 | 13 | 1 | 1 | 1 | . |
1 | 14 | 0 | 0 | 0 | . |
1 | 15 | 1 | 1 | 0 | . |
1 | 16 | 0 | 1 | 0 | . |
1 | 17 | 1 | 1 | 0 | . |
1 | 18 | 1 | 1 | 0 | . |
1 | 19 | 0 | 1 | 0 | . |
1 | 20 | 0 | 0 | 0 | . |
1 | 21 | 1 | 1 | 0 | . |
1 | 22 | 0 | 0 | 0 | . |
1 | 23 | 1 | 0 | 0 | . |
1 | 24 | 1 | 0 | 0 | . |
2 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 | 0 | 1 |
2 | 3 | 1 | 1 | 0 | 1 |
2 | 4 | 1 | 1 | 0 | 1 |
2 | 5 | 1 | 1 | 0 | 1 |
2 | 6 | 1 | 0 | 0 | 1 |
2 | 7 | 1 | 1 | 1 | 0 |
2 | 8 | 1 | 0 | 0 | 0 |
2 | 9 | 0 | 1 | 1 | 0 |
2 | 10 | 0 | 0 | 0 | 0 |
2 | 11 | 0 | 1 | 1 | 0 |
2 | 12 | 0 | 1 | 1 | 0 |
2 | 27 | 0 | 1 | 1 | 1 |
2 | 28 | 0 | 1 | 1 | 1 |
2 | 29 | 0 | 1 | 1 | 1 |
I want to get converted dataset EFFICIENTLY!
because the original dataset is so large!
(over 10 million obs... more than 300 families with average 40 funds for a family )
how can I do this?
Please help..
data have;
input family port fund stock;
datalines;
1 1 100 1
1 1 100 2
1 1 100 3
1 1 100 4
1 1 100 6
1 1 100 11
1 1 100 12
1 1 100 13
1 1 100 15
1 1 100 17
1 1 100 18
1 1 100 21
1 1 100 23
1 1 100 24
1 2 200 2
1 2 200 4
1 2 200 5
1 2 200 6
1 2 200 7
1 2 200 9
1 2 200 12
1 2 200 13
1 2 200 15
1 2 200 16
1 2 200 17
1 2 200 18
1 2 200 19
1 2 200 21
1 3 300 1
1 3 300 3
1 3 300 4
1 3 300 5
1 3 300 6
1 3 300 12
1 3 300 13
2 1 1001 1
2 1 1001 2
2 1 1001 3
2 1 1001 4
2 1 1001 5
2 1 1001 6
2 1 1001 7
2 1 1001 8
2 2 1002 1
2 2 1002 2
2 2 1002 3
2 2 1002 4
2 2 1002 5
2 2 1002 7
2 2 1002 9
2 2 1002 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 3 1003 1
2 3 1003 7
2 3 1003 9
2 3 1003 11
2 3 1003 12
2 3 1003 27
2 3 1003 28
2 3 1003 29
2 4 1004 1
2 4 1004 2
2 4 1004 3
2 4 1004 4
2 4 1004 5
2 4 1004 6
2 4 1004 27
2 4 1004 28
2 4 1004 29
;
run;
A datastep with an array might be the simplest:
data want; set have; array port{4} 8.; retain port:; by family; if first.family then call missing (of port{*}); if ... then ...; if last.family then output; run;
I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is. For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?
I am afraid that does not make sense to me, I do not work in your industry or with your data. The first row of your required output shows:
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
And the first row of the test data - the only one relating to the above as far as I can see:
input family port fund stock;
datalines;
1 1 100 1
There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?
with all do respect don't you have a dataset 'have'
row=72? (provided code)
in row=29
there are
family port fund stock
1 3 300 1
right? This means that
family=1
port=3
stock=1
so that I want to have
family stock port1 port2 port3
1 1 1 0 1
zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)
Ah I see, not sorted. This works:
proc sort data=have; by family stock; run; data want (drop=port fund); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then output; run;
I think for that you would need an additional few lines:
proc sort data=have; by family stock; run; data want (drop=port fund i on); set have; array p{4} 8; retain p:; by family stock; if first.stock then call missing(of p{*}); p{port}=1; if last.stock then do;
on=0;
do i=4 to 1 by -1;
if p{i}=1 then on=1;
if on=1 and p{i}=. then p{i}=0;
end;
output;
end; run;
I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than .
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.