BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkim197
Obsidian | Level 7

Hello SAS users !

 

I got panel dataset described as below

 

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

 

port represents portfolios that are hold by distinct funds within the group.

 

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

 

familyportfundstock
111001
111002
111003
111004
111006
1110011
1110012
1110013
1110015
1110017
1110018
1110021
1110023
1110024
122002
122004
122005
122006
122007
122009
1220012
1220013
1220015
1220016
1220017
1220018
1220019
1220021
133001
133003
133004
133005
133006
1330012
1330013
2110011
2110012
2110013
2110014
2110015
2110016
2110017
2110018
2210021
2210022
2210023
2210024
2210025
2210027
2210029
22100211
23100312
23100327
23100328
23100329
2310031
2310037
2310039
23100311
23100312
23100327
23100328
23100329
2410041
2410042
2410043
2410044
2410045
2410046
24100427
24100428
24100429

 

 

 

Now I want to change this dataset to one shown below.

 

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

 

 

familystockport1port2port3port4
11101.
12110.
13101.
14111.
15011.
16111.
17010.
18000.
19010.
110000.
111100.
112111.
113111.
114000.
115110.
116010.
117110.
118110.
119010.
120000.
121110.
122000.
123100.
124100.
211111
221101
231101
241101
251101
261001
271110
281000
290110
2100000
2110110
2120110
2270111
2280111
2290111

 

 

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

 


how can I do this?

 

Please help..

 

 

 

 

 

 

 

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah I see, not sorted.  This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

View solution in original post

9 REPLIES 9
jkim197
Obsidian | Level 7

Hello SAS users !

 

I got panel dataset described as below

 

group represents fund family : In my original dataset, there are more than 300 groups for each quarter.

 

port represents portfolios that are hold by distinct funds within the group.

 

for family 1 fund 100 holds portfolio 1 and this portfolio1 holds stock 1,2,3,4,6,11 12 13 15 17 18 21 23 24

for family 1 fund 200 holds portfolio 2 and this portfolio1 holds stock ........

for family 2 fund 1001 holds portfolio 1 and so on...

portfolio 1 of family 1 and portfolio 1 of family 2 are different portfolios but it does not matter for coding.

 

familyportfundstock
111001
111002
111003
111004
111006
1110011
1110012
1110013
1110015
1110017
1110018
1110021
1110023
1110024
122002
122004
122005
122006
122007
122009
1220012
1220013
1220015
1220016
1220017
1220018
1220019
1220021
133001
133003
133004
133005
133006
1330012
1330013
2110011
2110012
2110013
2110014
2110015
2110016
2110017
2110018
2210021
2210022
2210023
2210024
2210025
2210027
2210029
22100211
23100312
23100327
23100328
23100329
2310031
2310037
2310039
23100311
23100312
23100327
23100328
23100329
2410041
2410042
2410043
2410044
2410045
2410046
24100427
24100428
24100429

 

 

 

Now I want to change this dataset to one shown below.

 

for family 1 : portfolio1 (so that fund 100) holds stock 1, 2,3,4,6,11,12 13 15 17 18 21 23 24

for family 2 : portfolio 1 (so that fund 1001) holds .....

 

 

familystockport1port2port3port4
11101.
12110.
13101.
14111.
15011.
16111.
17010.
18000.
19010.
110000.
111100.
112111.
113111.
114000.
115110.
116010.
117110.
118110.
119010.
120000.
121110.
122000.
123100.
124100.
211111
221101
231101
241101
251101
261001
271110
281000
290110
2100000
2110110
2120110
2270111
2280111
2290111

 

 

I want to get converted dataset EFFICIENTLY!

because the original dataset is so large!

(over 10 million obs... more than 300 families with average 40 funds for a family )

 


how can I do this?

 

Please help..

 

 

 

 

 

 

 

data have;
input family port fund stock;
datalines;
1	1	100	1
1	1	100	2
1	1	100	3
1	1	100	4
1	1	100	6
1	1	100	11
1	1	100	12
1	1	100	13
1	1	100	15
1	1	100	17
1	1	100	18
1	1	100	21
1	1	100	23
1	1	100	24
1	2	200	2
1	2	200	4
1	2	200	5
1	2	200	6
1	2	200	7
1	2	200	9
1	2	200	12
1	2	200	13
1	2	200	15
1	2	200	16
1	2	200	17
1	2	200	18
1	2	200	19
1	2	200	21
1	3	300	1
1	3	300	3
1	3	300	4
1	3	300	5
1	3	300	6
1	3	300	12
1	3	300	13
2	1	1001	1
2	1	1001	2
2	1	1001	3
2	1	1001	4
2	1	1001	5
2	1	1001	6
2	1	1001	7
2	1	1001	8
2	2	1002	1
2	2	1002	2
2	2	1002	3
2	2	1002	4
2	2	1002	5
2	2	1002	7
2	2	1002	9
2	2	1002	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	3	1003	1
2	3	1003	7
2	3	1003	9
2	3	1003	11
2	3	1003	12
2	3	1003	27
2	3	1003	28
2	3	1003	29
2	4	1004	1
2	4	1004	2
2	4	1004	3
2	4	1004	4
2	4	1004	5
2	4	1004	6
2	4	1004	27
2	4	1004	28
2	4	1004	29
;
run;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

A datastep with an array might be the simplest:

data want;
  set have;
  array port{4} 8.;
  retain port:;
  by family;
  if first.family then call missing (of port{*});
  if ... then ...;
  if last.family then output;
run;

I am afraid this is incomplete because I have no idea what the 1's and 0's mean, or what the logic here is.  For instance, in row 1 of the required output, port3=1, I cannot see any reason for this?

 

jkim197
Obsidian | Level 7
it's just a simple dummy variable

1 = if port j of family i holds stock k
0 = if port j of family i does not hold stock k
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I am afraid that does not make sense to me, I do not work in your industry or with your data.  The first row of your required output shows:

family stock port1 port2 port3 port4
1 1 1 0 1 .

 

And the first row of the test data - the only one relating to the above as far as I can see:

input family port fund stock;
datalines;
1	1	100	1

There is nothing there which would indicate that port 2 should be 0, port3 should be 1, nor that port4 should be missing?

jkim197
Obsidian | Level 7

with all do respect don't you have a dataset 'have'
row=72? (provided code)

in row=29
there are
family port fund stock
1 3 300 1


right? This means that
family=1
port=3
stock=1

so that I want to have
family stock port1 port2 port3
1 1 1 0 1


zero is because for family=1 & port=2, they have no stock1
(there is no such row in dataset 'have' with value
family=1 & port=2 & stock=1)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah I see, not sorted.  This works:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then output;
run;

jkim197
Obsidian | Level 7
This works really well Thank you so much!!!!

But can I ask one more?
When I run your code, the outcome gives me no zeros but missing. Can I earn zeros for them and missing just for family=1 & port=4 ?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I think for that you would need an additional few lines:

proc sort data=have;
  by family stock;
run;

data want (drop=port fund i on);
  set have;
  array p{4} 8;
  retain p:;
  by family stock;
  if first.stock then call missing(of p{*});
  p{port}=1;
  if last.stock then do;
on=0;
do i=4 to 1 by -1;
if p{i}=1 then on=1;
if on=1 and p{i}=. then p{i}=0;
end;
output;
end; run;

I haven't tested, but something like that should work, so basically loop back across the array from end to start, when hit the first 1 set a variable to on, then any of the array before that if missing becomes 0 rather than . 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1100 views
  • 1 like
  • 3 in conversation