Convert SQL code into Proq SQL - (matching data task)

Reply
Contributor
Posts: 57

Convert SQL code into Proq SQL - (matching data task)

Hello,

I have programmed an SQL code into access and would like to use the same in SAS since sas can take more observations than access (access has a limit of 2million rows)

Could you tell me how my code should be adjusted to run into SAS as it currently produces several error messages?

My sql code matches two datasets, using a date variable (date,time,) as the basis.

Below is my code:

SELECT (select top 1 reutersminute.Date_G_ from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rDate_G_, (select top 1 reutersminute.Time_G_ from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rTime_G_, (select top 1 reutersminute.Ex_Cntrb_ID from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rEx_Cntrb_ID, (select top 1 reutersminute.Bid_Price from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rBid_Price, (select top 1 reutersminute.Ask_Price from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rAsk_Price, (select top 1 reutersminute.Midquote from reutersminute where reutersminute.Date_G_ + reutersminute.Time_G_  >=euro04112013.LastModificationDate order by  reutersminute.Date_G_ + reutersminute.Time_G_) AS rMidquote, euro04112013.*

FROM euro04112013

ORDER BY LastModificationDate;

Any help is appreciated

Thanks

Neo

Attachment
Attachment
Super User
Posts: 19,815

Re: Convert SQL code into Proq SQL - (matching data task)

The TOP function isn't SAS SQL valid.

Contributor
Posts: 57

Re: Convert SQL code into Proq SQL - (matching data task)

what would be an equivalent function in SAS? or no 1 to 1 option exists?

Super User
Super User
Posts: 7,050

Re: Convert SQL code into Proq SQL - (matching data task)

Sounds like you are trying to mimic FIRST. processing in SQL.

Join the two files by your criteria and then use a data step to implement the FIRST. processing.  Probably more efficient to do it as a view.

Also you cannot add a DATE and TIME in SAS and get a meaningful result.  Dates are stored as the integer number of days and time is the integer number of seconds.  You can use DHMS() function to convert a date and time value into a datetime value.

proc sql ;

create view temp as select * from reutersminute, euro04112013

where dhms(reutersminute.Date_G_,0,0,reutersminute.Time_G_)  >=euro04112013.LastModificationDate

order by euro04112013.LastModificationDate, reutersminute.Date_G_ ,reutersminute.Time_G_

;

quit;

data want ;

set temp;

by LastModificationDate;

if first.LastModificationDate;

run;

Contributor
Posts: 57

Re: Convert SQL code into Proq SQL - (matching data task)

Hi Tom

apologies for the late reply.

I tried to run the code a couple of times but i believe there is an issue with the data format. See below the log for details:

1 proc sql ;

2 create view temp as select * from reutersminute, euro04112013

3 where dhms(reutersminute.Date_G_,0,0,reutersminute.Time_G_)

3 ! >=euro04112013.LastModificationDate

4 order by euro04112013.LastModificationDate, reutersminute.Date_G_ ,reutersminute.Time_G_

5 ;

NOTE: SQL view WORK.TEMP has been defined.

6 quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.12 seconds

      cpu time            0.01 seconds

7

8 data want ;

9 set temp;

NOTE: Data file WORK.REUTERSMINUTE.DATA is in a format that is native to another host, or the

      file encoding does not match the session encoding. Cross Environment Data Access will be

      used, which might require additional CPU resources and might reduce performance.

NOTE: The execution of this query involves performing one or more Cartesian product joins that

      can not be optimized.

10 by LastModificationDate;

11 if first.LastModificationDate;

12 run;

NOTE: Invalid argument to function DHMS. Missing values may be generated.

NOTE: There were 127258 observations read from the data set WORK.REUTERSMINUTE.

NOTE: There were 575953 observations read from the data set WORK.EURO04112013.

NOTE: There were 0 observations read from the data set WORK.TEMP.

NOTE: The data set WORK.WANT has 0 observations and 13 variables.

NOTE: DATA statement used (Total process time):

      real time           1:04:04.20

      cpu time            50:00.89


Maybe you have an idea how to update the code to run correctly?

Many thanks in advance

Neo

Super User
Super User
Posts: 7,050

Re: Convert SQL code into Proq SQL - (matching data task)

Look at some of the values for your three date/time/datetime variables and see that values they are actually using.  Then you can adjust your join criteria.

Contributor
Posts: 57

Re: Convert SQL code into Proq SQL - (matching data task)

Tom, is it possible to look into the example excel files i uploaded and let me know what should be updated in the code you proposed? is the date format the problem or something else?

Your support is very much appreciated

Best regards

Neo

Super User
Super User
Posts: 7,050

Re: Convert SQL code into Proq SQL - (matching data task)

So it looks the data is there in the Excel file but it is formatted very strangely.

How did you attempt to get the data into SAS?

Are those Excel files the actual source data or were they generated from a real database system?  It would be better to let SAS pull the data from a real database if you have one so it can tell if the values are dates, times or datetimes.

You could get better results by assigning proper formats to the values in Excel and exporting the sheets to CSV files and then telling SAS to import those.

Contributor
Posts: 57

Re: Convert SQL code into Proq SQL - (matching data task)

I think the data look a bit strange now as I exported them from sas into excel. I can provide the sas files directly. I build the sas database from csv files since i have 8million rows in one of the two files (the reuters one)

Let me share the files in SAS and see if you can give me a bit more advice how to run the code correctly

Neo

Super User
Super User
Posts: 7,050

Re: Convert SQL code into Proq SQL - (matching data task)

So all of your date and time related variables actually have DATETIME values in them.  You could just add them as you were doing since the date part of the time variable is 0 and the time part of the date variable is also 0.  But it make more sense to convert DATE_G_ to an actual date and TIME_G_ to an actual time.

262 proc sql ;

263 create table temp as

264 select *

265 from x.reuterssample a

266 , x.eurosample b

267 where dhms(datepart(A.Date_G_),0,0,timepart(A.Time_G_))  >= b.LastModificationDate

268 order by b.LastModificationDate, a.Date_G_ , a.Time_G_

269  ;

NOTE: Data file X.REUTERSSAMPLE.DATA is in a format that is native to another host, or the

file encoding does not match the session encoding. Cross Environment Data Access will be

used, which might require additional CPU resources and might reduce performance.

NOTE: Data file X.EUROSAMPLE.DATA is in a format that is native to another host, or the file

encoding does not match the session encoding. Cross Environment Data Access will be

used, which might require additional CPU resources and might reduce performance.

NOTE: The execution of this query involves performing one or more Cartesian product joins that

can not be optimized.

NOTE: Table WORK.TEMP created, with 2383188 rows and 13 columns.

270 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time           3.86 seconds

cpu time            4.68 seconds

271

272 data want ;

273 set temp;

274 by LastModificationDate;

275 if first.LastModificationDate;

276 date_g_ = datepart(date_g_);

277 time_g_ = timepart(time_g_);

278 format date_g_ date9. time_g_ time8. lastmodificationdate datetime20.;

279 N+1;

280 if N <= 3 then put (_all_) (=/);

281 run;

minute=02JAN12:00:52

Date_G_=02JAN2012

Time_G_=0:52:01

Ex_Cntrb_ID=ZUERCHER KB  ZUR

Volume=

Bid_Price=1.2946

Ask_Price=1.2947

_ric2=EUR/USD

Midquote=1.29465

SamplingWeight=6

Deal_Id=81373851

SpotRate=1.29459

LastModificationDate=02JAN2012:00:51:59

N=1

minute=02JAN12:00:53

Date_G_=02JAN2012

Time_G_=0:53:08

Ex_Cntrb_ID=CIMB         GFX

Volume=

Bid_Price=1.2945

Ask_Price=1.2948

_ric2=EUR/USD

Midquote=1.29465

SamplingWeight=2

Deal_Id=81373852

SpotRate=1.2946

LastModificationDate=02JAN2012:00:52:05

N=2

minute=02JAN12:06:00

Date_G_=02JAN2012

Time_G_=6:00:55

Ex_Cntrb_ID=DANSKE BANK  COP

Volume=

Bid_Price=1.2929

Ask_Price=1.2934

_ric2=EUR/USD

Midquote=1.29315

SamplingWeight=12

Deal_Id=81374257

SpotRate=1.29311

LastModificationDate=02JAN2012:06:00:30

N=3

NOTE: There were 2383188 observations read from the data set WORK.TEMP.

NOTE: The data set WORK.WANT has 482 observations and 14 variables.

NOTE: DATA statement used (Total process time):

real time           0.29 seconds

      cpu time            0.29 second

Contributor
Posts: 57

Re: Convert SQL code into Proq SQL - (matching data task)

Hi Tom,

the upated code works for  the sample but it looks not to be efficient for the whole dataset. I will give an example, the first table which is created matches too many records from the one dataset into the other. this means that if we want to match 500 000 rows from one dataset using as a source the other one which has 8 million the system is not able to process those. it gives me a sort failure. this is the case since the first table that is created has minimu 70-80million of rows which then need to be sorted. look at the code from below. is there another way to create a smaller table in the first part of the code you provided? to limit for example the number of rows which will be matched to e.g 10. also will it help to sort both datasets by datetime before i start the matching process??

82

83

84   data neo.want ;

85   set temp;

86   by LastModificationDate;

87   if first.LastModificationDate;

88   date_g_ = datepart(date_g_);

89   time_g_ = timepart(time_g_);

90   format date_g_ date9. time_g_ time8. lastmodificationdate datetime20.;

91   N+1;

92   if N <= 3 then put (_all_) (=/);

93   run;

minute=02JAN12:00:52

Date_G_=02JAN2012

Time_G_=0:52:01

Ex_Cntrb_ID=ZUERCHER KB  ZUR

Volume=

Bid_Price=1.2946

Ask_Price=1.2947

_ric2=EUR/USD

Midquote=1.29465

SamplingWeight=6

Deal_Id=81373851

SpotRate=1.29459

LastModificationDate=02JAN2012:00:51:59

N=1

minute=02JAN12:00:53

Date_G_=02JAN2012

Time_G_=0:53:08

Ex_Cntrb_ID=CIMB         GFX

Volume=

Bid_Price=1.2945

Ask_Price=1.2948

_ric2=EUR/USD

Midquote=1.29465

SamplingWeight=2

Deal_Id=81373852

SpotRate=1.2946

LastModificationDate=02JAN2012:00:52:05

N=2

minute=02JAN12:06:00

Date_G_=02JAN2012

Time_G_=6:00:55

Ex_Cntrb_ID=DANSKE BANK  COP

Volume=

Bid_Price=1.2929

Ask_Price=1.2934

_ric2=EUR/USD

Midquote=1.29315

SamplingWeight=12

Deal_Id=81374257

SpotRate=1.29311

LastModificationDate=02JAN2012:06:00:30

N=3

NOTE: There were 33754860 observations read from the data set WORK.TEMP.

NOTE: The data set NEO.WANT has 2857 observations and 14 variables.

NOTE: DATA statement used (Total process time):

      real time           56.96 seconds

      cpu time            5.97 seconds

94   proc sql ;

95   create table neo.temp as select *from neo.reutersminute a, neo.euro04112013 b

96   where dhms(datepart(A.Date_G_),0,0,timepart(A.Time_G_))  >= b.LastModificationDate

97   order by b.LastModificationDate, a.Date_G_ , a.Time_G_;

NOTE: Data file NEO.REUTERSMINUTE.DATA is in a format that is native to another host, or the

      file encoding does not match the session encoding. Cross Environment Data Access will be

      used, which might require additional CPU resources and might reduce performance.

NOTE: The execution of this query involves performing one or more Cartesian product joins that

      can not be optimized.

ERROR: Sort execution failure.

98   quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

      real time           8:17.85

      cpu time            4:00.50

Super User
Super User
Posts: 7,050

Re: Convert SQL code into Proq SQL - (matching data task)

Try creating the first step as a VIEW (might not help).  Better would be if you have SAS/Access to whatever database the data really is in then it might be able to define the view in that database and avoid the SORT step (or at least make the database do the sorting).

Ask a Question
Discussion stats
  • 11 replies
  • 631 views
  • 0 likes
  • 3 in conversation