Hello everybody,
I have two tables. First table has name, date, time and intraday price variables. It means there is an intraday price for each name in a specific date and time. Second table has name, date and daily price and the daily price is intraday price aggregation for each name and date.
I try to write a program which performs the procedure below:
1- It can find same observations by name and date in two tables and then;
2- If first and last intraday price be out of 0.962 and 1.0398 times of daily price in last day; then deletes all data related to that specific name and date in table 1.
The statement is:
IF first AND last (intraday price for specific name & date) NOT IN [0.962*(daily price of yesterday), 1.0398*(daily price of yesterday)] THEN DELETE.
For instance, consider two tables which are below:
table 1
name | date | time | intraday price |
A | 1 | 1 | 3 |
A | 1 | 2 | 2 |
A | 1 | 3 | 1 |
B | 1 | 1 | 3.1 |
B | 1 | 2 | 1 |
B | 1 | 3 | 4 |
B | 1 | 4 | 2.9 |
C | 1 | 1 | 3 |
C | 1 | 2 | 2 |
A | 2 | 1 | 3.95 |
A | 2 | 2 | 3 |
A | 2 | 3 | 6 |
A | 2 | 4 | 4.01 |
B | 2 | 1 | 6 |
B | 2 | 2 | 1 |
C | 2 | 1 | 6.1 |
C | 2 | 2 | 3 |
C | 2 | 3 | 2 |
B | 3 | 1 | 5 |
B | 3 | 2 | 7 |
C | 3 | 1 | 8 |
C | 3 | 2 | 2 |
C | 3 | 3 | 3 |
C | 3 | 4 | 2 |
And table 2:
name | date | daily price |
A | 0 | 5 |
B | 0 | 3 |
C | 0 | 7 |
A | 1 | 4 |
B | 1 | 4 |
C | 1 | 6 |
B | 2 | 3 |
C | 2 | 5 |
Then, based on the procedure, The result is:
name | date | time | intraday price |
B | 1 | 1 | 3.1 |
B | 1 | 2 | 1 |
B | 1 | 3 | 4 |
B | 1 | 4 | 2.9 |
A | 2 | 1 | 3.95 |
A | 2 | 2 | 3 |
A | 2 | 3 | 6 |
A | 2 | 4 | 4.01 |
C | 2 | 1 | 6.1 |
C | 2 | 2 | 3 |
C | 2 | 3 | 2 |
Would you please tell me how I can do that?
Thanks in advance.
Thanks @Shmuel and @Kurt_Bremser
The statement is:
IF first AND last (intraday price for specific name & date) NOT IN [0.962*(daily price of yesterday), 1.0398*(daily price of yesterday)] THEN DELETE.
There are still two problems:
1- lines
if last.date then last_price = intraday_price;
and
not (min_price le last_price le max_price)
should be added.
2- There is lack of matching for last data in table 1 because of merging. So the calculation doesn't work well and I add data for 09 MAY 2008 to solve it.
Based on their works, the result is:
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
proc sql;
create table table3 as
select * from table1, table2
where table1.name=table2.name and table1.date=table2.date;
quit;
data table2_new;
set table2;
by name;
/* save price of yesterday */
lag_Price = lag(Price);
if first.name then lag_Price = .;
run;
data to_delete(keep = name date);
merge table3 (in=in1)
table2_new (in=in2);
by name date;
retain start_price last_price;
if in1 and in2; /* deal with obs on both tables only */
if first.date then start_price = intradayprice;
if last.date then last_price = intradayprice;
if last.date then do;
min_price = 0.962 * lag_Price;
max_price = 1.0398 * lag_Price;
if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
then output;
end;
run;
data want;
merge table3 /* table2 */
to_delete (in=indel);
by name date;
if not indel;
run;
Sort your table1 by name, date and time.
Sort table2 by name and date.
Create a new table from merging table1 and table2 (by name date), with a new variable start_price, which is retained. Set start_price at first.date. At last.date, compare start_price and current price with the aggregated price according to your formula, and output if condition is met. Keep only name and date.
Now, merge table1 with the new table (again by name and date), and keep or delete if a match is found/not found.
Here is the code based on @Kurt_Bremser design:
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
data to_delete(keep = name date);
merge table1 (in=in1)
table2 (in=in2);
by name date;
retain start_price;
if in1 and in2; /* deal with obs on both tables only */
if first.date then start_price = intraday_price;
if last.date then do;
min_price = 0.962 * daily_price;
max_price = 1.0398 * daily_price;
if not (min_price le start_price le max_price)
then output;
end;
run;
data want;
merge table 1 /* table2 */
to_delete (in=indel);
by name date;
if not indel;
run;
Thanks @Shmuel
However the code doesn't work correct.
Here is an example of the table 1:
data WORK.TABLE1; infile datalines dsd truncover; input name:$3. date:DATE9. time:TIME8. intraday_price:32.; format date DATE9. time TIME8.; label name="name" date="date" time="time" intraday_price="intraday price"; datalines4; A,07MAY2008,11:32:41,3 A,07MAY2008,12:32:41,2 A,07MAY2008,13:32:41,1 A,08MAY2008,11:32:41,3.95 A,08MAY2008,12:32:41,3 A,08MAY2008,13:32:41,6 A,08MAY2008,14:32:41,4.01 B,07MAY2008,11:32:41,3.1 B,07MAY2008,12:32:41,1 B,07MAY2008,13:32:41,4 B,07MAY2008,14:32:41,2.9 B,08MAY2008,11:32:41,6 B,08MAY2008,12:32:41,1 B,09MAY2008,11:32:41,5 B,09MAY2008,12:32:41,7 C,07MAY2008,11:32:41,3 C,07MAY2008,12:32:41,2 C,08MAY2008,11:32:41,6.1 C,08MAY2008,12:32:41,3 C,08MAY2008,13:32:41,2 C,09MAY2008,11:32:41,8 C,09MAY2008,12:32:41,2 C,09MAY2008,13:32:41,3 C,09MAY2008,14:32:41,2 ;;;;
And the table 2 is:
data WORK.TABLE2; infile datalines dsd truncover; input name:$3. date:DATE9. daily_price:32.; format date DATE9.; label name="name" date="date" daily_price="daily price"; datalines4; A,05MAY2008,3
B,05MAY2008,6
C,05MAY2008,5
A,06MAY2008,5 A,07MAY2008,4 B,06MAY2008,3 B,07MAY2008,4 B,08MAY2008,3 C,06MAY2008,7 C,07MAY2008,6 C,08MAY2008,5 ;;;;
Please consider that the daily price of yesterday should be used in formula.
So the result should be:
name | date | time | intraday price |
B | 7-May-08 | 11:32:41 | 3.1 |
B | 7-May-08 | 12:32:41 | 1 |
B | 7-May-08 | 13:32:41 | 4 |
B | 7-May-08 | 14:32:41 | 2.9 |
A | 8-May-08 | 11:32:41 | 3.95 |
A | 8-May-08 | 12:32:41 | 3 |
A | 8-May-08 | 13:32:41 | 6 |
A | 8-May-08 | 14:32:41 | 4.01 |
C | 8-May-08 | 11:32:41 | 6.1 |
C | 8-May-08 | 12:32:41 | 3 |
C | 8-May-08 | 13:32:41 | 2 |
Please notice that the date variable in table 1 starts from 07MAY2008, however the table 2 begins from 05MAY2008.
Maybe I made a mistake to explain it.
The statement is:
IF first AND last (intraday price for specific name & date) NOT IN [0.962*(daily price of yesterday), 1.0398*(daily price of yesterday)] THEN DELETE.
@aminkarimid, do you understand the coed?
I hoped you'll be able to check it and addapt it, more precisely, to your needs.
Maybe if you change the next lines - defining which data to remove - :
if not (min_price le start_price le max_price)
then output;
into:
if not (min_price le start_price le max_price) and not (min_price le
intraday_price
le max_price)
then output;
it will fit your demnd;
Thanks @Shmuel;
I can almost understand your code.
I think there is a problem:
min_price = 0.962 * daily_price; max_price = 1.0398 * daily_price;
The daily_price which is used in these lines should be the daily price of specific name and date in yesterday.
@aminkarimid wrote:
Thanks @Shmuel;
I can almost understand your code.
I think there is a problem:min_price = 0.962 * daily_price; max_price = 1.0398 * daily_price;The daily_price which is used in these lines should be the daily price of specific name and date in yesterday.
Then you should add 1 to the date in table2 before merging. Note that SAS dates are counts of days.
@aminkarimid, I got your point.
I have added a step to save the previous day daily_price in table2 using lag() fubction;
Pay attention - you cannot use lag function conditionally (under if statement).
Here is the full tested code:
data WORK.TABLE1;
infile datalines dsd truncover;
input name:$3. date:DATE9. time:TIME8. intraday_price:32.;
format date DATE9. time TIME8.;
label name="name" date="date" time="time" intraday_price="intraday price";
datalines4;
A,07MAY2008,11:32:41,3
A,07MAY2008,12:32:41,2
A,07MAY2008,13:32:41,1
A,08MAY2008,11:32:41,3.95
A,08MAY2008,12:32:41,3
A,08MAY2008,13:32:41,6
A,08MAY2008,14:32:41,4.01
B,07MAY2008,11:32:41,3.1
B,07MAY2008,12:32:41,1
B,07MAY2008,13:32:41,4
B,07MAY2008,14:32:41,2.9
B,08MAY2008,11:32:41,6
B,08MAY2008,12:32:41,1
B,09MAY2008,11:32:41,5
B,09MAY2008,12:32:41,7
C,07MAY2008,11:32:41,3
C,07MAY2008,12:32:41,2
C,08MAY2008,11:32:41,6.1
C,08MAY2008,12:32:41,3
C,08MAY2008,13:32:41,2
C,09MAY2008,11:32:41,8
C,09MAY2008,12:32:41,2
C,09MAY2008,13:32:41,3
C,09MAY2008,14:32:41,2
;;;;
run;
data WORK.TABLE2;
infile datalines dsd truncover;
input name:$3. date:DATE9. daily_price:32.;
format date DATE9.;
label name="name" date="date" daily_price="daily price";
datalines4;
A,05MAY2008,3
B,05MAY2008,6
C,05MAY2008,5
A,06MAY2008,5
A,07MAY2008,4
B,06MAY2008,3
B,07MAY2008,4
B,08MAY2008,3
C,06MAY2008,7
C,07MAY2008,6
C,08MAY2008,5
;;;;
run;
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
data table2_new;
set table2;
by name;
/* save price of yesterday */
lag_price = lag(daily_price);
if first.name then lag_price = .;
run;
data to_delete(keep = name date);
merge table1 (in=in1)
table2_new (in=in2);
by name date;
retain start_price;
if in1 and in2; /* deal with obs on both tables only */
if first.date then start_price = intraday_price;
if last.date then do;
min_price = 0.962 * lag_price;
max_price = 1.0398 * lag_price;
if not (min_price le start_price le max_price) and
not (min_price le intraday_price le max_price)
then output;
end;
run;
data want;
merge table1 /* table2 */
to_delete (in=indel);
by name date;
if not indel;
run;
Thanks @Shmuel and @Kurt_Bremser
The statement is:
IF first AND last (intraday price for specific name & date) NOT IN [0.962*(daily price of yesterday), 1.0398*(daily price of yesterday)] THEN DELETE.
There are still two problems:
1- lines
if last.date then last_price = intraday_price;
and
not (min_price le last_price le max_price)
should be added.
2- There is lack of matching for last data in table 1 because of merging. So the calculation doesn't work well and I add data for 09 MAY 2008 to solve it.
Based on their works, the result is:
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
proc sql;
create table table3 as
select * from table1, table2
where table1.name=table2.name and table1.date=table2.date;
quit;
data table2_new;
set table2;
by name;
/* save price of yesterday */
lag_Price = lag(Price);
if first.name then lag_Price = .;
run;
data to_delete(keep = name date);
merge table3 (in=in1)
table2_new (in=in2);
by name date;
retain start_price last_price;
if in1 and in2; /* deal with obs on both tables only */
if first.date then start_price = intradayprice;
if last.date then last_price = intradayprice;
if last.date then do;
min_price = 0.962 * lag_Price;
max_price = 1.0398 * lag_Price;
if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
then output;
end;
run;
data want;
merge table3 /* table2 */
to_delete (in=indel);
by name date;
if not indel;
run;
@aminkarimid wrote:
Thanks @Kurt_Bremser;
Would you please write the code?
because I am new with SAS and I cannot do it on my own.
Then you should start to learn immediately here: https://support.sas.com/edu/schedules.html?ctry=us&crs=PROG1
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.