I have a dataset with many missing values. One remedy is to fill forward the values after checking some variables. Here is an example of the data:
data example;
	input Obs:$3. fund:$3. Stock:$3. Date:YYMMDD8. CUSIP:$8. ISIN:$5. number_of_shares: BEST12. ;
	datalines;
1	SPY	ABC	20190301	12345678	US123	15000
2	SPY	ABC	20190302	12345678	US123	15100
3	SPY	ABC	20190303	.	.	15100
4	SPY	ABC	20190304	.	.	15000
5	SPY	ABC	20190305	.	.	15200
6	SPY	ABC	20190306	.	.	15231
7	SPY	ABC	20190307	.	.	14912
8	SPY	ABC	20190308	.	.	14872
9	SPY	ABC	20190309	12345678	US123	15340
10	SPY	ABC	20190310	12345678	US123	15000
11	SPY	XYZ	20190301	88888888	CA888	500
12	SPY	XYZ	20190302	88888888	CA888	510
13	SPY	XYZ	20190303	.	.	503
14	SPY	XYZ	20190304	.	.	515
15	SPY	XYZ	20190305	.	.	524
16	SPY	XYZ	20190306	.	.	552
17	SPY	XYZ	20190307	.	.	.
18	SPY	XYZ	20190308	.	.	.
19	SPY	XYZ	20190309	.	.	.
20	SPY	XYZ	20190310	.	.	555
21	SPY	XYZ	20190311	.	.	555
22	SPY	XYZ	20190312	.	.	532
	;
run;So, as you can see, there are missing value for CUSIP and ISIN. I would like to fill forward only if 2 criteria are met: (i) the next available value for CUSIP (or ISIN) is the same as the previously available value for CUSIP (or ISIN), and (ii) the value of the variable number_of_shares does not change by 10%.
So in the case of stock ABC, as CUSIP in obs 2 is the same as CUSIP in obs 9, and the values in number_of_shares are similar so fill forward CUSIP for obs 3,4,5,6,7,8. same with ISIN.
In the case of stock XYZ, If the next available value for CUSIP is the same as the value in obs 12, then fill forward the value for CUSIP and ISIN. But in this case, fill forward the value of the number_of_shares for obs 17,18,19 too (i.e. they will take 552). However, if the next available value for CUSIP is not the same as the one in obs 12 then don't do these fill-forward.
Edit: So what I want looks like this:
data etf_example2;
	input Obs:$3. fund:$3. Stock:$3. Date:YYMMDD8. CUSIP:$8. ISIN:$5. number_of_shares: BEST12. ;
	datalines;
1	SPY	ABC	20190301	12345678	US123	15000
2	SPY	ABC	20190302	12345678	US123	15100
3	SPY	ABC	20190303	12345678	US123	15100
4	SPY	ABC	20190304	12345678	US123	15000
5	SPY	ABC	20190305	12345678	US123	15200
6	SPY	ABC	20190306	12345678	US123	15231
7	SPY	ABC	20190307	12345678	US123	14912
8	SPY	ABC	20190308	12345678	US123	14872
9	SPY	ABC	20190309	12345678	US123	15340
10	SPY	ABC	20190310	12345678	US123	15000
11	SPY	XYZ	20190301	88888888	CA888	500
12	SPY	XYZ	20190302	88888888	CA888	510
13	SPY	XYZ	20190303	88888888	CA888	503
14	SPY	XYZ	20190304	88888888	CA888	515
15	SPY	XYZ	20190305	88888888	CA888	524
16	SPY	XYZ	20190306	88888888	CA888	552
17	SPY	XYZ	20190307	88888888	CA888	552
18	SPY	XYZ	20190308	88888888	CA888	552
19	SPY	XYZ	20190309	88888888	CA888	552
20	SPY	XYZ	20190310	88888888	CA888	555
21	SPY	XYZ	20190311	88888888	CA888	555
22	SPY	XYZ	20190312	88888888	CA888	532
	;
run;So, given your posted example data (and thank you for that 🙂 ), what does your desired result look like?
Always nice to have something to aim for
I edited the question to add the desired output
This is the most weird logic i have seen for a long time.
You can use the retain-statement (with new variables) to save the values during data-step iteration. You will need to implement look-ahead for the first rule. There are multiple documents describing this pattern, i am sure you find useful paper.
Yes, but most look-ahead options are used in DATA step that read or merge in the same dataset with a specified number of lead. The issue here is that we don't know how many step(obs) to look ahead. Also it might not be optimal for the large dataset (500gb) which I have
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
