About machete

machete · ‎10-02-2013

The variable is called 'lastmodificationdate'. Here is an example record: '06JAN2012:11:57:06.00'. It has the following format: DATETIME21.2. The code you provided still does not work. I also tried to amend it, but it still does not work:

    data match2.eurodailynew;
      set match2.eurodaily;
      newdate=input(LastModificationDate,datetime18.);
      time=timepart(newdate);
      format time time8.;
    run;

Neo

machete · ‎10-02-2013

Hi,

I used the code you proposed, but it just replaced the original time value with today's date and time. What was previously e.g. 06JAN2012:11:57:06.00 is now 02OCT2013:17:43:02.95. There must be a mistake somewhere, or?

Best
Neo

machete · ‎10-02-2013

I tried this one:

    data match2.eurodailynew;
      set match2.eurodaily;
      LastModificationDate=datetime();
      time=timepart(LastModificationDate);
    run;

but then the time variable delivers a number instead of hh:mm:ss. How can I reformat this?

Thanks
Neo
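The number that TIMEPART returns is a count of seconds since midnight, and attaching a time format makes it display as hh:mm:ss. Also, DATETIME() returns the current date and time, so assigning it to LastModificationDate overwrites the stored value. Below is a minimal sketch, assuming LastModificationDate is a numeric SAS datetime variable; the data set name work.time_demo and the hard-coded literal are illustrative only and are not from the thread.

```sas
/* Illustrative only: one record mirroring the example value quoted in the thread. */
data work.time_demo;
  /* Assumption: the variable is a numeric datetime, not a character string. */
  LastModificationDate = '06JAN2012:11:57:06'dt;

  /* TIMEPART extracts the time of day as seconds since midnight. */
  time = timepart(LastModificationDate);

  /* The formats only change how the values are displayed, not what is stored. */
  format LastModificationDate datetime21.2 time time8.;
run;
```

For an existing data set, the same TIMEPART and FORMAT statements would go after the SET statement, without reassigning LastModificationDate.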

machete · ‎10-02-2013

Hello Jagadish,

I am trying to amend your code, as I need to execute it for the whole dataset. Could you tell me where the mistake is, in your opinion?

    data match2.eurodailynew;
      set match2.eurodaily;
      input LastModificationDate $18.;
      time=substr(LastModificationDate,11,8);
      cards;

Thanks
Neo

machete · ‎10-02-2013

Dear all,

I am trying to convert a date variable of the format below:

"05/01/201208:45:24" (informat is DATETIME21.)

using this command:

    newtime=substr(Last_Modification_Date,11,9);

as I would like to have only hh:mm:ss. However, the above command returns only a missing value: '.'

Do you have an idea why this code does not work?

Many thanks in advance for the support

Best
Neo

machete · ‎05-06-2013

Dear Art,

I am coming back once again to the open question from this post: https://communities.sas.com/message/154166#154166

I used the code from the post below, which shortened my Reuters dataset substantially, to up to 5 times fewer observations: https://communities.sas.com/message/155689

Then I tried to rerun your solution (this post), but my PC was executing for a couple of hours before I cancelled the statements. Last time the problem was disk space; now the execution time was simply too long.

Any ideas how I could solve this issue? Maybe I should break down both datasets into one currency per dataset? Is there code that will split the data and re-merge the results? Otherwise it will take me ages to do that. Plus, I am not sure whether this will solve my execution problem.

Let me know what you think

Thanks & have a nice day
Neo

machete · ‎05-05-2013

PG,

Many thanks for this one, it worked perfectly. Apologies for the late reply; I had the impression I had already provided feedback.

Cheers
Neo

machete · ‎03-03-2013

Dear all,

This question partly originates from a larger problem currently being addressed. For more info see below:

I have a large dataset with 64 million observations containing high-frequency (up to the second) data for currencies over a period of 68 days. I would like to reduce this dataset to a one-minute interval, that is, to randomly pick 1 observation per minute for each currency. This should give 60x24=1440 observations per day per currency and around 100 000 (1440x68) per currency for the whole period. Since I have around 10 currencies, the dataset will be reduced from 64 million to 10x100 000 = 1 million observations.

Do you have any ideas on how to reduce the dataset based on my suggestion? I will then use this reduced dataset to overcome the computational difficulties that appear in the matching question post (see above link).

Attached is a sample of the dataset for only one currency.

Thank you

Best
Neo
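One way to express "one random observation per minute per currency" is to floor each datetime to the start of its minute, attach a random sort key, and keep the first record in each currency/minute group. The sketch below is a hedged illustration, not the solution given in the thread. The data set work.have and the variables currency and dt are hypothetical names, and dt is assumed to be a numeric SAS datetime.

```sas
/* Hypothetical input: work.have with currency (character) and dt (numeric datetime). */
data work.keyed;
  set work.have;
  /* Floor the datetime to the start of its minute (datetimes are stored in seconds). */
  minute_start = dt - mod(dt, 60);
  /* Uniform random key; the fixed seed makes the selection repeatable. */
  u = ranuni(20130303);
  format minute_start datetime19.;
run;

/* Within each currency and minute, order the records by the random key. */
proc sort data=work.keyed;
  by currency minute_start u;
run;

/* Keep the record with the smallest random key in each currency/minute group. */
data work.reduced (drop=u);
  set work.keyed;
  by currency minute_start;
  if first.minute_start;
run;
```

Minutes with no ticks produce no observation, so the result can be smaller than 1440 rows per day per currency. The sort also needs temporary disk space, which may matter for a 64 million row input.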

machete · ‎02-27-2013

Hello you both,

Thanks for your comments. To avoid confusion: the reported errors about missing values and the duplicates procedure above are not related to the problem we are currently discussing. When I copied and pasted the errors from SAS related to the problem we are discussing (matching), I took the above errors as well by mistake; they were part of other operations I executed in SAS.

That is why I confirm once again that the final Reuters dataset has no missing values (I cross-checked), and the only reason the code provided as a solution did not execute was limited disk space. The operation stopped when the file was 50 GB, although the datasets used to solve the problem are many times smaller.

Neo

machete · ‎02-26-2013

Hi,

I checked the Reuters source file again; there are no dates missing. They were missing from the test file. The errors shown came from two subsequent estimations in SAS, so please ignore them. I think the issue here is disk space and how to go about it; see my last post.

Neo

machete · ‎02-25-2013

Hi Art,

When I tried the code on the sample dataset it worked. I tried once again on a bigger set. It looks like disk space is the problem: the procedure stopped when the file was 52 GB. Something must be wrong in the code (??), as both datasets used for solving this problem are around 1 GB.

    NOTE: Libref NEO was successfully assigned as follows:
          Engine:        V9
          Physical Name: d:\phd thesis\sas files
    ERROR: Insufficient space in file NEO.WANTTEST.DATA.
    NOTE: The DATA step has been abnormally terminated.
    NOTE: There were 479 observations read from the data set WORK.CHFDAILY.
    WARNING: The data set NEO.WANTTEST may be incomplete. When this step was stopped there were 6919415 observations and 62 variables.
    NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
    NOTE: DATA statement used (Total process time):
          real time           9:43:56.14
          cpu time            2:46.42

Any suggestions?

Neo

machete · ‎02-24-2013

Hi Art,

The Reuters data are arranged by currency and then by date. However, I had already created a new file containing only one currency when running the code, so the dates are in order. The date format is standard. The whole Reuters dataset is based on GMT, and on any given day the records are ordered by time, beginning with 00:01 and ending at 23:59.

Below are the error messages I received when executing the code (I think also when executing one additional command). Do you have an idea what the problem might be?

    WARNING: Multiple lengths were specified for the variable Cur1Cur2 by input data set(s). This may cause truncation of data.
    NOTE: There were 16529 observations read from the data set NEO.CHFDAILY.
    NOTE: The data set WORK.CHFDAILY has 16529 observations and 42 variables.
    NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
    NOTE: DATA statement used (Total process time):
          real time           0.55 seconds
          cpu time            0.59 seconds

    389
    390  proc sort data=chfdaily;
    391  by dt1;
    392  run;

    NOTE: There were 16529 observations read from the data set WORK.CHFDAILY.
    NOTE: The data set WORK.CHFDAILY has 16529 observations and 42 variables.
    NOTE: PROCEDURE SORT used (Total process time):
          real time           0.40 seconds
          cpu time            0.45 seconds

    393
    394  data chfreuters (drop=date_G_ time_G_);
    395  set neo.chfreuters;
    396  format dt2 datetime19.;
    397  dt2=dhms(datepart(date_G_),hour(time_G_),minute(time_G_),second(time_G_));
    398  run;

    NOTE: Missing values were generated as a result of performing an operation on missing values.
          Each place is given by: (Number of times) at (Line):(Column).
          4126234 at 397:7   4126234 at 397:12
    NOTE: There were 4126234 observations read from the data set NEO.CHFREUTERS.
    NOTE: The data set WORK.CHFREUTERS has 4126234 observations and 21 variables.
    NOTE: DATA statement used (Total process time):
          real time           35.60 seconds
          cpu time            6.52 seconds

    399
    400  data want;
    401  retain start_rpointer;
    402  set chfdaily;
    403  if _n_ eq 1 then start_rpointer=1;
    404  current_rpointer=start_rpointer;
    405  do while (dt1 ge dt2 and not have2end);
    406  set chfreuters point=current_rpointer end=have2end;
    407  if dt2 ge dt1 and Cur1Cur2 eq _ric2 then do;
    408  output;
    409  start_rpointer=current_rpointer;
    410  end;
    411  else do;
    412  current_rpointer+1;
    413  end;
    414  end;
    415  run;

    NOTE: The DATA step has been abnormally terminated.
    NOTE: There were 1 observations read from the data set WORK.CHFDAILY.
    WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 63 variables.
    WARNING: Data set WORK.WANT was not replaced because this step was stopped.
    NOTE: DATA statement used (Total process time):
          real time           1:45:49.23
          cpu time            1:44:46.60

    416
    417
    418  PROC SORT
    419  data=neo.cobasetneo NODUP;
    420  BY _all_;
    421  RUN;

    ERROR: No disk space is available for the write operation. Filename = C:\Users\Ali\AppData\Local\Temp\SAS Temporary Files\SAS_util00010000218C_Ali-PC\ut218C000002.utl.
    ERROR: Failure while attempting to write page 910 of sorted run 120.
    ERROR: Failure while attempting to write page 268660 to utility file 1.
    ERROR: Failure encountered while creating initial set of sorted runs.
    ERROR: Failure encountered during external sort.
    ERROR: Sort execution failure.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: There were 1890001 observations read from the data set NEO.COBASETNEO.
    WARNING: The data set NEO.COBASETNEO may be incomplete. When this step was stopped there were 0 observations and 43 variables.
    WARNING: Data set NEO.COBASETNEO was not replaced because this step was stopped.
    NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
    NOTE: PROCEDURE SORT used (Total process time):
          real time           28:12.22
          cpu time            3:02.64

Do you think that making the datasets smaller (fewer variables) for the matching procedure and then re-merging, e.g. based on the deal_Id, would also be an option?

Let me know what you think

Thanks
Neo

machete · ‎02-24-2013

Dear Art,

Thanks for the reply. This code worked for doing the first match. However, I am facing two problems:

1. When I try to run this on a larger sample, one currency (14000 observations in the first dataset to be matched against 1.5 million from Reuters) for the whole period (68 days), the system crashes after two hours of processing. I tried this a couple of times with other combinations of currencies, but it still crashes. Is there a way we can adjust the code to execute stepwise, day by day (which would mean breaking down the dataset into 68 days automatically and then re-merging it)? Breaking down the dataset manually into days and currencies will probably take me days. Do you have any other possible solutions?

2. Can you provide an update solving my second problem as well, i.e. matching the Reuters records which are 30, 60, 720 seconds etc. further in time than the record in dataset A? Or shall we first find a stable solution for point 1 above and then deal with this one?

Thanks
Neo

machete · ‎02-17-2013

Hi Art,

I think this will be easier to solve if we treat it as two sub-problems. The first problem is datetime (in dataset A) <= the closest datetime (in dataset B). The second problem is then matching datetime (in dataset A) <= database B_datetime "+30 seconds / 1min / 30mins / 60mins / 24hrs as discussed in earlier posts", which will give us 5 more variables in the output.

If you already have code in mind, I would suggest posting it here so that I can test it. I think it will then be easier to finalize the solution in small iterations.

Neo

machete · ‎02-16-2013

Hi Art,

What you mention regarding datetime makes sense, so this should not be a problem.

The solution will actually be easier than what you propose. The base time is fixed. That means for all the matches T0 (the starting time) is constant: it is the datetime of a transaction in dataset A. We then simply match this one with those in dataset B which are +30 seconds, +1 minute ... +24 hrs further in time.

As you have written it above in equations (1) to (6), the left-hand side should always be datetime (in dataset A). In summary:

(1) datetime (in dataset A) <= the closest datetime (in dataset B)
(2) datetime (in dataset A) <= database B_datetime+30 seconds
(3) datetime (in dataset A) <= database B_datetime+1 minute
(4) datetime (in dataset A) <= database B_datetime+30 minutes
(5) datetime (in dataset A) <= database B_datetime+60 minutes
(6) datetime (in dataset A) <= database B_datetime+24 hours

Does it make sense?

All of the above is conditioned on currency in database A = currency in database B (this is a text variable; I am not sure if this makes a difference for the code).

I am not sure if we need this part you provided in all the equations: <= next datetime(s) in dataset B?

I have the feeling we are close to having the solution described in a logical order.

Thanks
Neo
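To pin down the logic of conditions (1) to (6), here is a hedged PROC SQL sketch. It reads each condition as "the earliest record in dataset B, for the same currency, whose datetime is at or after the dataset A datetime plus the offset", which is an interpretation of the posts rather than something they state exactly. The table names work.a and work.b are hypothetical. The variable names dt1, dt2, Cur1Cur2, _ric2 and deal_id are borrowed from other posts in this feed, and deal_id is assumed to identify each record in dataset A uniquely.

```sas
/* Assumed inputs: work.a (deal_id, dt1, Cur1Cur2) and work.b (dt2, _ric2). */
proc sql;
  create table work.matched as
  select a.deal_id,
         a.dt1,
         /* (1) closest B datetime at or after the A datetime */
         min(b.dt2) as dt2_next format=datetime19.,
         /* (2) first B datetime at least 30 seconds later */
         min(case when b.dt2 >= a.dt1 + 30 then b.dt2 else . end)
           as dt2_30s format=datetime19.,
         /* (3) first B datetime at least 1 minute later */
         min(case when b.dt2 >= a.dt1 + 60 then b.dt2 else . end)
           as dt2_1min format=datetime19.
         /* (4) to (6) follow the same pattern with offsets 1800, 3600 and 86400 seconds */
  from work.a as a
       left join work.b as b
         on a.Cur1Cur2 = b._ric2
        and b.dt2 >= a.dt1
  group by a.deal_id, a.dt1;
quit;
```

This join compares every dataset A record with all later dataset B records of the same currency, so it is only practical on small samples. On the full data, a sorted single-pass approach would likely be needed.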


Re: Question on Proc SQL procedure with a case function

Question on Proc SQL procedure with a case function

Re: How to locate a record in a time series that is 'x' minutes ahead ...

Re: How to convert a date&time variable into a time variable

How to convert a date&time variable into a time variable

Re: How to perform a complex merge/matching procedure from two dataset...

Re: How to reduce a large time series dataset

How to reduce a large time series dataset