BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Themis_84
Calcite | Level 5

Hi everyone,

I need some help. I have a massive dataset (tick by tick data). The columns of the table are: Dates(i.e.different dates), Hour, Minutes, Price, Volume. My problem is that some minutes are missing, so I want to insert them in the data. For instance I have data for minute 25 26 28 30 35 and I want to add rows for minute 27, 29, 31, 32,34, 33 etc for each hour and each date. The data for the price and volume  in the new rows should be empty. Could you please give any suggestion? Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have two options.  One is to go through the dataset, and do do-loops where gaps exist.  The other option is to create a template and merge that back to your data.  I will give an example of the second approach:

data have;

  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;

  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;

  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;

  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;

  format date date9.;

run;

/* Create a dataset expanded for each ID from minimum to maximum */

proc sql;

  create table TEMPLATE as

  select  ID,

          min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE

  group by ID;

quit;

data template (drop=low high i);

  set template;

  do i=low to high by 60;

    date=datepart(i);

    hour=hour(i);

    minute=minute(i);

    output;

  end;

run;

/* Merge the template onto the data */

proc sql;

  create table WANT as

  select  COALESCE(A.ID,B.ID) as ID,

          COALESCE(A.DATE,B.DATE) as DATE format=date9.,

          COALESCE(A.HOUR,B.HOUR) as HOUR,

          COALESCE(A.MINUTE,B.MINUTE) as MINUTE,

          A.PRICE,

          A.VOLUME

  from    WORK.HAVE A

  full join WORK.TEMPLATE B

  on      A.ID=B.ID

  and     A.DATE=B.DATE

  and     A.HOUR=B.HOUR

  and     A.MINUTE=B.MINUTE;

quit;

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have two options.  One is to go through the dataset, and do do-loops where gaps exist.  The other option is to create a template and merge that back to your data.  I will give an example of the second approach:

data have;

  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;

  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;

  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;

  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;

  format date date9.;

run;

/* Create a dataset expanded for each ID from minimum to maximum */

proc sql;

  create table TEMPLATE as

  select  ID,

          min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE

  group by ID;

quit;

data template (drop=low high i);

  set template;

  do i=low to high by 60;

    date=datepart(i);

    hour=hour(i);

    minute=minute(i);

    output;

  end;

run;

/* Merge the template onto the data */

proc sql;

  create table WANT as

  select  COALESCE(A.ID,B.ID) as ID,

          COALESCE(A.DATE,B.DATE) as DATE format=date9.,

          COALESCE(A.HOUR,B.HOUR) as HOUR,

          COALESCE(A.MINUTE,B.MINUTE) as MINUTE,

          A.PRICE,

          A.VOLUME

  from    WORK.HAVE A

  full join WORK.TEMPLATE B

  on      A.ID=B.ID

  and     A.DATE=B.DATE

  and     A.HOUR=B.HOUR

  and     A.MINUTE=B.MINUTE;

quit;

Themis_84
Calcite | Level 5

Thank you so much RW9 for the quick reply. I will try it and come back to you whether it works for my dataset. I run your example and works.

Themis_84
Calcite | Level 5

Just a quick question. How can I get the ID for each date?Should I do a loop? Thanks

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not sure I am following you?  This section will get you distinct ID's with their first and last dates, its used in the loop over, so if min date is 01jan15 and 05jan15 is max, then it will do each day.

Themis_84
Calcite | Level 5

I mean in my set I do not have ID column. So, I want first to generate this column for each date. For example I have 01/02/2007, 01/02/2007 etc, 02/02/2007 etc. I understand I should have an ID 1 for 01/02/2007, ID 2 for 02/02/2007 etc in order to implement the code.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Oh, just drop the id bit then:

proc sql;

  create table TEMPLATE as

  select  min(dhms(date,hour,minute,0)) as LOW,

          max(dhms(date,hour,minute,0)) as HIGH

  from    HAVE;

quit;

Themis_84
Calcite | Level 5

Thank you so much RW9. It worked perfectly. Smiley Happy  I did the necessary adjustments and now I have exactly what I wanted.  I am new to SAS and I have tried everything by myself, so your help saved me hours Smiley Wink. I have been trying to fix this problem for the past two days. :smileyplain:

Ksharp
Super User
data have;
  id=1; date="01jan15"d; hour=3; minute=4; price=12; volume=16; output;
  id=1; date="01jan15"d; hour=3; minute=5; price=16; volume=20; output;
  id=1; date="01jan15"d; hour=3; minute=10; price=8; volume=4; output;
  id=1; date="01jan15"d; hour=5; minute=10; price=7; volume=25; output;
  format date date9.;
run;
data want;
 merge have have(firstobs=2 rename=(id=_id date=_date hour=_hour minute=_minute));
 output;
 if id=_id then do;
   do i=dhms(date,hour,minute,0)+60 to dhms(_date,_hour,_minute,0)-60 by 60;
    date=datepart(i);hour=hour(timepart(i));minute=minute(timepart(i));price=.;volume=.;
     output;
   end;
 end; format i datetime.;
drop _:;
run;

Xia Keshan

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4126 views
  • 4 likes
  • 3 in conversation