Hi everyone,
I need some help. I have a massive dataset (tick by tick data). The columns of the table are: Dates(i.e.different dates), Hour, Minutes, Price, Volume. My problem is that some minutes are missing, so I want to insert them in the data. For instance I have data for minute 25 26 28 30 35 and I want to add rows for minute 27, 29, 31, 32,34, 33 etc for each hour and each date. The data for the price and volume in the new rows should be empty. Could you please give any suggestion? Thanks
You have two options. One is to go through the dataset, and do do-loops where gaps exist. The other option is to create a template and merge that back to your data. I will give an example of the second approach:
data have;
id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;
id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;
id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;
id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;
format date date9.;
run;
/* Create a dataset expanded for each ID from minimum to maximum */
proc sql;
create table TEMPLATE as
select ID,
min(dhms(date,hour,minute,0)) as LOW,
max(dhms(date,hour,minute,0)) as HIGH
from HAVE
group by ID;
quit;
data template (drop=low high i);
set template;
do i=low to high by 60;
date=datepart(i);
hour=hour(i);
minute=minute(i);
output;
end;
run;
/* Merge the template onto the data */
proc sql;
create table WANT as
select COALESCE(A.ID,B.ID) as ID,
COALESCE(A.DATE,B.DATE) as DATE format=date9.,
COALESCE(A.HOUR,B.HOUR) as HOUR,
COALESCE(A.MINUTE,B.MINUTE) as MINUTE,
A.PRICE,
A.VOLUME
from WORK.HAVE A
full join WORK.TEMPLATE B
on A.ID=B.ID
and A.DATE=B.DATE
and A.HOUR=B.HOUR
and A.MINUTE=B.MINUTE;
quit;
You have two options. One is to go through the dataset, and do do-loops where gaps exist. The other option is to create a template and merge that back to your data. I will give an example of the second approach:
data have;
id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;
id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;
id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;
id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;
format date date9.;
run;
/* Create a dataset expanded for each ID from minimum to maximum */
proc sql;
create table TEMPLATE as
select ID,
min(dhms(date,hour,minute,0)) as LOW,
max(dhms(date,hour,minute,0)) as HIGH
from HAVE
group by ID;
quit;
data template (drop=low high i);
set template;
do i=low to high by 60;
date=datepart(i);
hour=hour(i);
minute=minute(i);
output;
end;
run;
/* Merge the template onto the data */
proc sql;
create table WANT as
select COALESCE(A.ID,B.ID) as ID,
COALESCE(A.DATE,B.DATE) as DATE format=date9.,
COALESCE(A.HOUR,B.HOUR) as HOUR,
COALESCE(A.MINUTE,B.MINUTE) as MINUTE,
A.PRICE,
A.VOLUME
from WORK.HAVE A
full join WORK.TEMPLATE B
on A.ID=B.ID
and A.DATE=B.DATE
and A.HOUR=B.HOUR
and A.MINUTE=B.MINUTE;
quit;
Thank you so much RW9 for the quick reply. I will try it and come back to you whether it works for my dataset. I run your example and works.
Just a quick question. How can I get the ID for each date?Should I do a loop? Thanks
Not sure I am following you? This section will get you distinct ID's with their first and last dates, its used in the loop over, so if min date is 01jan15 and 05jan15 is max, then it will do each day.
I mean in my set I do not have ID column. So, I want first to generate this column for each date. For example I have 01/02/2007, 01/02/2007 etc, 02/02/2007 etc. I understand I should have an ID 1 for 01/02/2007, ID 2 for 02/02/2007 etc in order to implement the code.
Oh, just drop the id bit then:
proc sql;
create table TEMPLATE as
select min(dhms(date,hour,minute,0)) as LOW,
max(dhms(date,hour,minute,0)) as HIGH
from HAVE;
quit;
Thank you so much RW9. It worked perfectly. I did the necessary adjustments and now I have exactly what I wanted. I am new to SAS and I have tried everything by myself, so your help saved me hours . I have been trying to fix this problem for the past two days. :smileyplain:
data have; id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output; id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output; id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output; id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output; format date date9.; run; data want; merge have have(firstobs=2 rename=(id=_id date=_date hour=_hour minute=_minute)); output; if id=_id then do; do i=dhms(date,hour,minute,0)+60 to dhms(_date,_hour,_minute,0)-60 by 60; date=datepart(i);hour=hour(timepart(i));minute=minute(timepart(i));price=.;volume=.; output; end; end; format i datetime.; drop _:; run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.