BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Themis_84
Calcite | Level 5

Hi everyone,

I need some help. I have a massive dataset (tick by tick data). The columns of the table are: Dates(i.e.different dates), Hour, Minutes, Price, Volume. My problem is that some minutes are missing, so I want to insert them in the data. For instance I have data for minute 25 26 28 30 35 and I want to add rows for minute 27, 29, 31, 32,34, 33 etc for each hour and each date. The data for the price and volume  in the new rows should be empty. Could you please give any suggestion? Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have two options.  One is to go through the dataset, and do do-loops where gaps exist.  The other option is to create a template and merge that back to your data.  I will give an example of the second approach:

data have;

  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;

  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;

  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;

  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;

  format date date9.;

run;

/* Create a dataset expanded for each ID from minimum to maximum */

proc sql;

  create table TEMPLATE as

  select  ID,

          min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE

  group by ID;

quit;

data template (drop=low high i);

  set template;

  do i=low to high by 60;

    date=datepart(i);

    hour=hour(i);

    minute=minute(i);

    output;

  end;

run;

/* Merge the template onto the data */

proc sql;

  create table WANT as

  select  COALESCE(A.ID,B.ID) as ID,

          COALESCE(A.DATE,B.DATE) as DATE format=date9.,

          COALESCE(A.HOUR,B.HOUR) as HOUR,

          COALESCE(A.MINUTE,B.MINUTE) as MINUTE,

          A.PRICE,

          A.VOLUME

  from    WORK.HAVE A

  full join WORK.TEMPLATE B

  on      A.ID=B.ID

  and     A.DATE=B.DATE

  and     A.HOUR=B.HOUR

  and     A.MINUTE=B.MINUTE;

quit;

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have two options.  One is to go through the dataset, and do do-loops where gaps exist.  The other option is to create a template and merge that back to your data.  I will give an example of the second approach:

data have;

  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;

  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;

  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;

  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;

  format date date9.;

run;

/* Create a dataset expanded for each ID from minimum to maximum */

proc sql;

  create table TEMPLATE as

  select  ID,

          min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE

  group by ID;

quit;

data template (drop=low high i);

  set template;

  do i=low to high by 60;

    date=datepart(i);

    hour=hour(i);

    minute=minute(i);

    output;

  end;

run;

/* Merge the template onto the data */

proc sql;

  create table WANT as

  select  COALESCE(A.ID,B.ID) as ID,

          COALESCE(A.DATE,B.DATE) as DATE format=date9.,

          COALESCE(A.HOUR,B.HOUR) as HOUR,

          COALESCE(A.MINUTE,B.MINUTE) as MINUTE,

          A.PRICE,

          A.VOLUME

  from    WORK.HAVE A

  full join WORK.TEMPLATE B

  on      A.ID=B.ID

  and     A.DATE=B.DATE

  and     A.HOUR=B.HOUR

  and     A.MINUTE=B.MINUTE;

quit;

Themis_84
Calcite | Level 5

Thank you so much RW9 for the quick reply. I will try it and come back to you whether it works for my dataset. I run your example and works.

Themis_84
Calcite | Level 5

Just a quick question. How can I get the ID for each date?Should I do a loop? Thanks

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not sure I am following you?  This section will get you distinct ID's with their first and last dates, its used in the loop over, so if min date is 01jan15 and 05jan15 is max, then it will do each day.

Themis_84
Calcite | Level 5

I mean in my set I do not have ID column. So, I want first to generate this column for each date. For example I have 01/02/2007, 01/02/2007 etc, 02/02/2007 etc. I understand I should have an ID 1 for 01/02/2007, ID 2 for 02/02/2007 etc in order to implement the code.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Oh, just drop the id bit then:

proc sql;

  create table TEMPLATE as

  select  min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE;

quit;

Themis_84
Calcite | Level 5

Thank you so much RW9. It worked perfectly. Smiley Happy  I did the necessary adjustments and now I have exactly what I wanted.  I am new to SAS and I have tried everything by myself, so your help saved me hours Smiley Wink. I have been trying to fix this problem for the past two days. :smileyplain:

Ksharp
Super User
data have;
  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;
  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;
  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;
  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;
  format date date9.;
run;
data want;
 merge have have(firstobs=2 rename=(id=_id date=_date hour=_hour minute=_minute));
 output;
 if id=_id then do;
   do i=dhms(date,hour,minute,0)+60 to dhms(_date,_hour,_minute,0)-60 by 60;
    date=datepart(i);hour=hour(timepart(i));minute=minute(timepart(i));price=.;volume=.;
     output;
   end;
 end; format i datetime.;
drop _:;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4048 views
  • 4 likes
  • 3 in conversation