Hi Cynthia, Thanks for the response.
Does your data ONLY have columns A and B???
Yes i do have more columns other than A and B, heres the screenshot of what the table looks like in sas.
http://img191.imageshack.us/img191/9155/sas1t.jpg , where A = price, B= bid_price, prev_B=prev_bid_price, C=updated_bid
Do you have an identifying variable or BY variable??
Yes i need to calculate C (or updated_bid) BY date and some time conditions when i get this part of the coding right. I have created a small subset of my data at this date and time between 4 to 4.15.
What about rows 1-4 and row 8, where the values of A and B are both missing???? Is that correct??? Are there other columns that you're not showing or are these just random measurements???
Yes, the values of A (price) and B (bid_price) are both missing and these are not random measurements. The screenshot above are all the columns that ill be working with except that there will be another column for different stock name.
When you get this data into a SAS dataset, do you want to keep rows 1-4 and row 8 and other rows with blanks for both A and B -- what purpose do these rows serve???
Yes, ill need to keep those blanks rows as blanks as it represents at that particular time, there isn't any trade(price) when there is maybe an ask price or bid price.
Why do only some rows have a value for C??? For example, why do rows 6 and 9 have values for C, but rows 5 and 8 do NOT have values for C?? What is the logic that decides whether a row gets a value for C or not???
This is because when there is a trade, there will be a 'price(column a)',and therefore the transaction happened due to either the previous bid_price or ask_price. Therefore ill need to have a updated previous bid_price and ask_price; and then calculate whether bid price or ask price is closer to price at the trade time. (after i get this part of the code written correctly, ill need to only take the previous 'bid_price' based on time condition that the 'bid price' must be 5 seconds older than the current 'price').
I have been trying with the lag function and i don't think it can be applied to my case (correct me if i'm wrong). Because there are times that i need to go up 2 observations or 3 or more when the previous observation is empty. From the dataset: looking at bid_price row 6, i only had to lag once and get the previous value. But at row 12, i have to lag five times (lag5) to get the (previous) available bid_price.
So far this is what i've done:
http://img191.imageshack.us/img191/2225/sas2r.jpg . The highlighted cell is supposed to have a value which means i need to lag more. However the rest of my data may have more than 20 empty observations in bid price. Therefore what i've done is inefficient, heres my code:
data data_new5;
set data_try2;
l_bid_price=lag(bid_price);
l2_bid_price=lag2(bid_price);
l3_bid_price=lag3(bid_price);
retain prev_bid_price;
if price^="." then updated_bid=prev_bid_price;
else if price ="." then prev_bid_price=bid_price;
if price ="." and bid_price="." then prev_bid_price=l_bid_price;
if price= "." and bid_price="." and l_bid_price="." then prev_bid_price=l2_bid_price;
if price= "." and bid_price="." and l_bid_price="." and l2_bid_price="."
then prev_bid_price=l3_bid_price;
run;
Is there a way to fix this? i'm gonna keep trying and ill appreciate any help in regards to this. Thanks!
Message was edited by: adrian.waw