BookmarkSubscribeRSS Feed
adrian_waw
Calcite | Level 5
Hi,

I hope someone can help me with this. I'll try my best to explain the problem.

Heres a screenshot of the original data, my current coding's output (which is in red) and what i want it to achieve(in grey). I have color coded the relating cells for better understanding:
http://img689.imageshack.us/img689/9981/sas1y.jpg

Just to explain the columns in the screenshot; every time when "a" is not equal to "." (or in other words "a" has a value), "c" will be the "b"'s previous observation. For example: at observation 9 , there is a value at "a", and at observation 8: b="."
Therefore the goal that i'm failing to achieve is to obtain the value in correct "c". To obtain this, we need to keep moving up to the previous observation where "b" has a value. we will then report this value (from "b" ) into "c" at observation 9. As you can see in the screenshot, the values in c were obtained from the matching colored from b.

Heres my code so far which is providing me the columns in red:
data data_new2;
set data_try2;
retain prev_b;
if a ^="." then c=prev_b;
else do;
prev_b=b;
end;
run;

Will appreciate it if anyone can give me some advice/help. Thank you in advance. Message was edited by: adrian.waw
4 REPLIES 4
Cynthia_sas
SAS Super FREQ
Hi:
No ideas, only more questions.

Does your data ONLY have columns A and B??? Do you have an identifying variable or BY variable?? What about rows 1-4 and row 8, where the values of A and B are both missing???? Is that correct??? Are there other columns that you're not showing or are these just random measurements??? When you get this data into a SAS dataset, do you want to keep rows 1-4 and row 8 and other rows with blanks for both A and B -- what purpose do these rows serve???

You might want to search through the SAS documentation and previous forum postings for information and examples about the LAG function. You might still need to account for your rows with missing values for both A and B, but I believe the LAG function might help you in your requirement.

Another thing to think about...why do only some rows have a value for C??? For example, why do rows 6 and 9 have values for C, but rows 5 and 8 do NOT have values for C?? What is the logic that decides whether a row gets a value for C or not???

Anyway, do some research on the LAG function -- you will have to decide how to create C and decide whether the LAG function will do what you want. This sample program may help get you started, too. SASHELP.CLASS has 19 observations, it should be available to you on your system to experiment with and see how the LAG function works.

cynthia
[pre]
proc sort data=sashelp.class out=class;
by age;
where age in (12, 13, 14, 15);
run;

proc print data=class;
title 'Show Data before using LAG function';
run;

data showlag;
set class; by age;
prev_wt = lag(weight);
prev_ht = lag(height);
prev_name = lag(name);
** On the first age of every group, set PREV_ info variables to missing;
** because do not want to carry info from diff age group into new age group;
if first.age then do;
prev_name = ' ';
prev_wt = .;
prev_ht = .;
end;
run;

proc print data=showlag;
by age;
title 'Simple Example of LAG Function';
var name age height weight prev_ht prev_wt prev_name;
run;
[/pre]
adrian_waw
Calcite | Level 5
Hi Cynthia, Thanks for the response.

Does your data ONLY have columns A and B???
Yes i do have more columns other than A and B, heres the screenshot of what the table looks like in sas.
http://img191.imageshack.us/img191/9155/sas1t.jpg , where A = price, B= bid_price, prev_B=prev_bid_price, C=updated_bid

Do you have an identifying variable or BY variable??
Yes i need to calculate C (or updated_bid) BY date and some time conditions when i get this part of the coding right. I have created a small subset of my data at this date and time between 4 to 4.15.

What about rows 1-4 and row 8, where the values of A and B are both missing???? Is that correct??? Are there other columns that you're not showing or are these just random measurements???
Yes, the values of A (price) and B (bid_price) are both missing and these are not random measurements. The screenshot above are all the columns that ill be working with except that there will be another column for different stock name.

When you get this data into a SAS dataset, do you want to keep rows 1-4 and row 8 and other rows with blanks for both A and B -- what purpose do these rows serve???
Yes, ill need to keep those blanks rows as blanks as it represents at that particular time, there isn't any trade(price) when there is maybe an ask price or bid price.

Why do only some rows have a value for C??? For example, why do rows 6 and 9 have values for C, but rows 5 and 8 do NOT have values for C?? What is the logic that decides whether a row gets a value for C or not???
This is because when there is a trade, there will be a 'price(column a)',and therefore the transaction happened due to either the previous bid_price or ask_price. Therefore ill need to have a updated previous bid_price and ask_price; and then calculate whether bid price or ask price is closer to price at the trade time. (after i get this part of the code written correctly, ill need to only take the previous 'bid_price' based on time condition that the 'bid price' must be 5 seconds older than the current 'price').

I have been trying with the lag function and i don't think it can be applied to my case (correct me if i'm wrong). Because there are times that i need to go up 2 observations or 3 or more when the previous observation is empty. From the dataset: looking at bid_price row 6, i only had to lag once and get the previous value. But at row 12, i have to lag five times (lag5) to get the (previous) available bid_price.

So far this is what i've done:
http://img191.imageshack.us/img191/2225/sas2r.jpg . The highlighted cell is supposed to have a value which means i need to lag more. However the rest of my data may have more than 20 empty observations in bid price. Therefore what i've done is inefficient, heres my code:

data data_new5;
set data_try2;
l_bid_price=lag(bid_price);
l2_bid_price=lag2(bid_price);
l3_bid_price=lag3(bid_price);
retain prev_bid_price;
if price^="." then updated_bid=prev_bid_price;
else if price ="." then prev_bid_price=bid_price;
if price ="." and bid_price="." then prev_bid_price=l_bid_price;
if price= "." and bid_price="." and l_bid_price="." then prev_bid_price=l2_bid_price;
if price= "." and bid_price="." and l_bid_price="." and l2_bid_price="."
then prev_bid_price=l3_bid_price;
run;

Is there a way to fix this? i'm gonna keep trying and ill appreciate any help in regards to this. Thanks! Message was edited by: adrian.waw
Cynthia_sas
SAS Super FREQ
Hi:
This looks like a Time Series problem to me. Have you read the documentation on using Time Series data with SAS/ETS???
http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#/documentation/cdl/e...
This documentation also mentions the use of PROC EXPAND.

This paper describes a series of related programs that this researcher used with time series data, specifically it mentions ask price and bid price and his sample data has an observation for every second.
http://pages.stern.nyu.edu/~jhasbrou/Research/Working%20Papers/CointDocumentation01.pdf

Perhaps these references will get you started on the approach to your analytic task.

cynthia
adrian_waw
Calcite | Level 5
Thanks for the links Cynthia. I'll give it a go.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1243 views
  • 0 likes
  • 2 in conversation