Just to explain the columns in the screenshot; every time when "a" is not equal to "." (or in other words "a" has a value), "c" will be the "b"'s previous observation. For example: at observation 9 , there is a value at "a", and at observation 8: b="."
Therefore the goal that i'm failing to achieve is to obtain the value in correct "c". To obtain this, we need to keep moving up to the previous observation where "b" has a value. we will then report this value (from "b" ) into "c" at observation 9. As you can see in the screenshot, the values in c were obtained from the matching colored from b.
Heres my code so far which is providing me the columns in red:
if a ^="." then c=prev_b;
Will appreciate it if anyone can give me some advice/help. Thank you in advance.
Message was edited by: adrian.waw
Does your data ONLY have columns A and B??? Do you have an identifying variable or BY variable?? What about rows 1-4 and row 8, where the values of A and B are both missing???? Is that correct??? Are there other columns that you're not showing or are these just random measurements??? When you get this data into a SAS dataset, do you want to keep rows 1-4 and row 8 and other rows with blanks for both A and B -- what purpose do these rows serve???
You might want to search through the SAS documentation and previous forum postings for information and examples about the LAG function. You might still need to account for your rows with missing values for both A and B, but I believe the LAG function might help you in your requirement.
Another thing to think about...why do only some rows have a value for C??? For example, why do rows 6 and 9 have values for C, but rows 5 and 8 do NOT have values for C?? What is the logic that decides whether a row gets a value for C or not???
Anyway, do some research on the LAG function -- you will have to decide how to create C and decide whether the LAG function will do what you want. This sample program may help get you started, too. SASHELP.CLASS has 19 observations, it should be available to you on your system to experiment with and see how the LAG function works.
proc sort data=sashelp.class out=class;
where age in (12, 13, 14, 15);
proc print data=class;
title 'Show Data before using LAG function';
set class; by age;
prev_wt = lag(weight);
prev_ht = lag(height);
prev_name = lag(name);
** On the first age of every group, set PREV_ info variables to missing;
** because do not want to carry info from diff age group into new age group;
if first.age then do;
prev_name = ' ';
prev_wt = .;
prev_ht = .;
proc print data=showlag;
title 'Simple Example of LAG Function';
var name age height weight prev_ht prev_wt prev_name;
Does your data ONLY have columns A and B???
Yes i do have more columns other than A and B, heres the screenshot of what the table looks like in sas. http://img191.imageshack.us/img191/9155/sas1t.jpg , where A = price, B= bid_price, prev_B=prev_bid_price, C=updated_bid
Do you have an identifying variable or BY variable??
Yes i need to calculate C (or updated_bid) BY date and some time conditions when i get this part of the coding right. I have created a small subset of my data at this date and time between 4 to 4.15.
What about rows 1-4 and row 8, where the values of A and B are both missing???? Is that correct??? Are there other columns that you're not showing or are these just random measurements???
Yes, the values of A (price) and B (bid_price) are both missing and these are not random measurements. The screenshot above are all the columns that ill be working with except that there will be another column for different stock name.
When you get this data into a SAS dataset, do you want to keep rows 1-4 and row 8 and other rows with blanks for both A and B -- what purpose do these rows serve???
Yes, ill need to keep those blanks rows as blanks as it represents at that particular time, there isn't any trade(price) when there is maybe an ask price or bid price.
Why do only some rows have a value for C??? For example, why do rows 6 and 9 have values for C, but rows 5 and 8 do NOT have values for C?? What is the logic that decides whether a row gets a value for C or not???
This is because when there is a trade, there will be a 'price(column a)',and therefore the transaction happened due to either the previous bid_price or ask_price. Therefore ill need to have a updated previous bid_price and ask_price; and then calculate whether bid price or ask price is closer to price at the trade time. (after i get this part of the code written correctly, ill need to only take the previous 'bid_price' based on time condition that the 'bid price' must be 5 seconds older than the current 'price').
I have been trying with the lag function and i don't think it can be applied to my case (correct me if i'm wrong). Because there are times that i need to go up 2 observations or 3 or more when the previous observation is empty. From the dataset: looking at bid_price row 6, i only had to lag once and get the previous value. But at row 12, i have to lag five times (lag5) to get the (previous) available bid_price.
So far this is what i've done: http://img191.imageshack.us/img191/2225/sas2r.jpg . The highlighted cell is supposed to have a value which means i need to lag more. However the rest of my data may have more than 20 empty observations in bid price. Therefore what i've done is inefficient, heres my code:
if price^="." then updated_bid=prev_bid_price;
else if price ="." then prev_bid_price=bid_price;
if price ="." and bid_price="." then prev_bid_price=l_bid_price;
if price= "." and bid_price="." and l_bid_price="." then prev_bid_price=l2_bid_price;
if price= "." and bid_price="." and l_bid_price="." and l2_bid_price="."
Is there a way to fix this? i'm gonna keep trying and ill appreciate any help in regards to this. Thanks!
Message was edited by: adrian.waw