DATA Step, Macro, Functions and more

SUBSTR to change dates as text to actual dates

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

SUBSTR to change dates as text to actual dates

[ Edited ]

I have a CSV-file containing dossier numbers, dossier titles, dossier creators and dates (deadline, start date, close date, creation date). When a dossier is created, it will contain 5 sub dossiers. The first dossier will become active and the other four will become waiting. Cancelled and waiting dossiers do not have any dates, except for a creation date. Instead of showing dates, these cells will contain the value "nulldate". To make the import (date / infile / length / format / informat / input / keep) easier, I decided to import every column as a textstring. The idea was to change the format of the dates once required, using the SUBSTR command. (Date format is dd/mm/yyyy hh:mm.)

 

Example:

 

deadlinestart_dateclose_dater_creation_date
11/04/2016 14:5329/02/2016 14:53nulldate29/02/2016 14:53
11/04/2016 15:0429/02/2016 15:0430/03/2016 14:5729/02/2016 15:04
nulldatenulldatenulldate1/03/2016 11:04
8/03/2016 11:241/03/2016 11:247/03/2016 15:521/03/2016 11:24
nulldatenulldatenulldate1/03/2016 11:24
25/04/2016 12:014/04/2016 12:01nulldate1/03/2016 11:24
15/03/2016 11:361/03/2016 11:351/03/2016 12:571/03/2016 11:35
15/03/2016 10:411/03/2016 10:411/03/2016 10:411/03/2016 10:40
19/04/2016 12:278/03/2016 12:27nulldate1/03/2016 10:55
8/03/2016 10:581/03/2016 10:587/03/2016 14:151/03/2016 10:58
nulldatenulldatenulldate1/03/2016 11:11
nulldatenulldatenulldate1/03/2016 11:14
15/04/2016 11:461/04/2016 11:46nulldate1/03/2016 11:15
8/03/2016 11:151/03/2016 11:151/04/2016 11:441/03/2016 11:15
nulldatenulldatenulldate1/03/2016 10:36
nulldatenulldatenulldate1/03/2016 11:35
nulldatenulldatenulldate1/03/2016 10:41
28/03/2016 9:5614/03/2016 9:56nulldate1/03/2016 11:18
21/03/2016 14:2129/02/2016 14:2116/03/2016 10:5229/02/2016 14:21
21/03/2016 14:3429/02/2016 14:3417/03/2016 12:0429/02/2016 14:34

 

How would I go about using SUBSTR to change the format from $char16. to the actual format used through a datastep in my query (not the query I used for importing, but the query I used to analyse my data).

 

Additionally, later on I will need to compare some of these dates to a fixed date. This is because I am working with exports from a database and so the fixed date becomes the date of the export day. (%LET date=1APR2016;)

 

It doesn't matter which format it ends up using, as long as I can make the analysis. Any suggestions how I can start this?


Accepted Solutions
Solution
‎04-14-2016 10:18 AM
Super User
Posts: 7,781

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

Since r_creation_date is a timestamp, don't include it in the length statement at all. Timestamps are best left at the default length of 8 (numeric).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

To clarify:

 

I have to check if the dossiers were handled before the deadline. For dossiers that are completed, a close date is available. The calculation will become like this:

 

  when ('DONE') do;
    	done+1;
    	if close_date > deadline then done_overtime+1;
  end;

 

However, for dossiers that are still active, there will be no close date available. In that case I will need to compare it to another date instead. Since the export I am currently using, was created on the 1st of April, this fixed date has to be April 1st, 2016. The calculation will then become like this:

 

when ('ACTIVE') do;
    	active+1;
    	if date() > deadline then active_overtime+1;
end;

 

Naturally this calculation won't work, since SAS thinks the dates are a textstring and not dates. Importing the dates as textstrings is a necessity because not only does it actually contain a textstring ("nulldate"), I need to import the data from the CSV-file untouched. A colelague pointed out that the best approach would be to use the import query for importing everything as a textstring and then later on use the SUBSTR command to change the format in the datastep from your actual analysis.

Super User
Posts: 7,781

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

Hands down, the best method to deal with these problems is to store dates and timestamps as intended, and use formats for display or output as necessary; all comparisons and mathematical operations become a breeze instead of a massive PITA.

 

So the way to go in your situation is to

- read the column as character

- check if it contains "nulldate"

- if yes, set (numerical) date/timestamp variable to missing (.)

- else convert with proper format

- drop temporary character variable

 

If you want to preserve the original unconverted data, add a sufficiently large character variable in the data set and fill it with the contents of _infile_.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

Posted in reply to KurtBremser

What you are suggesting makes sense. If I understand correctly, I would have to do this during the import, don't I? Just in case I will show you the import query I used.

 

data unload.task;
infile '\\networklocation\unloads\JNX\JNX_unload_tasks_ 201603_.csv' dlm=';' truncover firstobs=2 lrecl=500;

length
  r_object_id $16
  object_name $70
  deadline $16
  start_date $16
  close_date $16
  task_handler $25
  task_state $9
  dossier_nr $6
  sequence $1
  r_creation_date $16
;

format
  r_object_id $char16.
  object_name $char70.
  deadline $char16.
  start_date $char16.
  close_date $char16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date $char16.
;

informat
  r_object_id $char16.
  object_name $char70.
  deadline $char16.
  start_date $char16.
  close_date $char16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date $char16.
;

input
  r_object_id
  object_name
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;

keep
  r_object_id
  object_name 
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;
run;

 

This brings us back to the format of the dates. The format itself is should be date10., but that would be without the "hh:mm". I thought the SUBSTR could manage that by saying something like: "new_deadline=SUBSTR(deadline,1,10)", but I can't get that to work at all.

Super User
Super User
Posts: 7,950

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

You would use inputs and puts to arrange your data after the fact - not sure why you don't just update this on the read in?  Here is an example of creating date, time, and datetime variables:

data have;
  length deadline  start_date 	close_date r_creation_date $50; 
  infile datalines dlm=",";
  input deadline $ start_date $	close_date $ r_creation_date $;
datalines;
11/04/2016 14:53,29/02/2016 14:53,nulldate,	29/02/2016 14:53
11/04/2016 15:04,29/02/2016 15:04,30/03/2016 14:57,	29/02/2016 15:04
nulldate,nulldate,nulldate,1/03/2016 11:04
;
run;

data want;
  set have;
  if deadline ne "nulldate" then do;
    deadlined=input(scan(deadline,1," "),ddmmyy10.);
    deadlinet=input(scan(deadline,2," "),time5.);
    deadlinedt=input(put(deadlined,yymmdd10.)||"T"||put(deadlinet,tod5.)||":00",e8601dt.);
  end;
  format deadlinedt datetime. deadlined date9. deadlinet time5.;
run;
Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

[ Edited ]

I think I see your point. If I were to do this in the analysis query, the query itself would become lengthy, the query would become very unorderly and perhaps even slow to execute?

 

If I were to do this from the import, could I still do format $char16. and then use the proper informat?  But is there a format that sticks to the used format? Datetime16. perhaps? But how will the import handle the nulldate values if I use that format?

Super User
Super User
Posts: 7,950

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

You should be able to use:

informat <your_variable> anydtdtm.;

In your import code, this should read it in as a date time.  If the import value is missing, then it will be missing in the output variable.  If you actually have the word "nulldate" present in your import file then you have problems, and that needs to be removed before importing - personally I wouldn't accept a file which had that kind of thing in it.

 

See SAS help:

http://support.sas.com/documentation/cdl/en/leforinforref/63324/HTML/default/viewer.htm#p1hsn1ji141r...

 

On your other point, yes, doing conversions in a query would take longer, the functions get called each row.  Better to have data in a good strcuture before starting to process it.

Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

What if I were to do the unthinkable and alter my unload? Only alter it by changing "nulldate" with "-". If I am not mistaken, then the format you mentioned above should work, right?

Super User
Super User
Posts: 7,950

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

I would leave it as "", i.e. toally empty.  

 

But then I would also have a data import agreement in place signed off by both parties, which detailed, as receiver of the data, the exact structure, file format, and contents of the data file as I want it to be (or near as).  To not have any of this makes me a very sad importer Smiley Surprised)

Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

[ Edited ]

If I do that in an import, all my cells get shifted. For example, a CSV-file:

 

(First row contains variable names.)

 

A;B;C;D;E

X;X;;Y;Z

X;X;Y;Y;Z

 

Will give me a table like:

 

ABCDE
XXYZ 
XXYYZ

 

Even though you can clearly see that the it says "X;X;;Y;Z", it somehow ignores the ";;" and makes all my cells move one to the left. The unloads I have been using, have quite a few empty cells and so this became a problem. The quick and easy fix was to fill every empty cell with "-". So the CSV-file would look like:

 

A;B;C;D;E

X;X;-;Y;Z

X;X;Y;Y;Z

 

Will give me a table like this:

 

ABCDE
XX-YZ
XXYYZ

 

That makes me wonder, is that normal? If so, would it be wise to replace nulldate with an empty cell?

Super User
Super User
Posts: 7,950

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

Change your infile "..." statement to include (either) DSD or missover option - can't think which it is, think its DSD:

infile "...\textfile.txt" dlm=";" dsd;

Super User
Posts: 7,781

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

When this kind of shifting happens, it can usually be corrected by using the dsd option in the infile statement. This causes two delimiters in sequence to be interpreted as a missing value. Without dsd, the two delimiters are taken as one.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

Posted in reply to KurtBremser

That works wonders! Thanks!

 

However, I still have an error with my anydate.

 

Imagine just having a CSV-file with one column "dates" and in it a bunch of dates in the format "dd/mm/yy hh:mm". The code I have right now, looks like:

 

data dates;
infile 'C:\temp\dates.csv'  dlm=';' truncover firstobs=2 lrecl=500;

length
  date $16
;

format
  date $char16.
;

informat
  date anydtdtm16.
;

input
  date
;

keep
  date
;

run;

 

It results with an error in the log (from the original query and table):

 

 

1542    r_creation_date anydtdtm16.
                        -----------
                        48
ERROR 48-59: The informat $ANYDTDTM was not found or could not be loaded.

 

 

I guess I must be missing something somewhere? Do I need to adjust something in the format or even the input?

Super User
Posts: 7,781

Re: SUBSTR to change dates as text to actual dates

Posted in reply to Yves_Boonen

You try to assign a numerical format to a variable that was already defined as character. SAS then tries to correct your "obvious" mistake by searching for a proper character format (that would be named $anydtdtm), which does not exist.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: SUBSTR to change dates as text to actual dates

Posted in reply to KurtBremser

Makes sense. My idea to solve this issue would be to change the format in the format section. When I try that, it results in the same error. I will just post the full query and the full log.

 

data unload.task;
infile 'C:\temp\JNX_unload_tasks_ 201603_.csv' dlm=';' truncover firstobs=2 lrecl=500 dsd;

length
  r_object_id $16
  object_name $70
  deadline $16
  start_date $16
  close_date $16
  task_handler $25
  task_state $9
  dossier_nr $6
  sequence $1
  r_creation_date $16
;

format
  r_object_id $char16.
  object_name $char70.
  deadline anydtdtm16.
  start_date anydtdtm16.
  close_date anydtdtm16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date anydtdtm16.
;

informat
  r_object_id $char16.
  object_name $char70.
  deadline anydtdtm16.
  start_date anydtdtm16.
  close_date anydtdtm16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date anydtdtm16.
;

input
  r_object_id
  object_name
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;

keep
  r_object_id
  object_name 
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;

run;

 

2543  data unload.task;
2544  infile
2544! '\\GLOW001.AD.SYS\0005_data\diensten\Bvh_h000\Dbe_h001\Rpb_0396\Prc_07ng\Plo_0425\Acn_02pp\Z
2544! cn_04io\1_Data\Interne_Werking\2_Operationeel\5_procescontrole\Gegevens\JNX\MAAND\CIJFERS\At
2544! ropos\unloads\JNX\JNX_unload_tasks_ 201603_.csv' dlm=';' truncover firstobs=2 lrecl=500 dsd;
2545
2546  /* De lengte werd exact bepaald m.b.v. de functie =LEN(celnaam) in Excel. */
2547
2548  length
2549    r_object_id $16
2550    object_name $70
2551    deadline $16
2552    start_date $16
2553    close_date $16
2554    task_handler $25
2555    task_state $9
2556    dossier_nr $6
2557    sequence $1
2558    r_creation_date $16
2559  ;
2560
2561  /* Alle variabelen als tekst inlezen, behalve de datums. */
2562
2563  format
2564    r_object_id $char16.
2565    object_name $char70.
2566    deadline anydtdtm16.
                 -----------
                 48
2567    start_date anydtdtm16.
                   -----------
                   48
2568    close_date anydtdtm16.
                   -----------
                   48
ERROR 48-59: The format $ANYDTDTM was not found or could not be loaded.

2569    task_handler $char25.
2570    task_state $char9.
2571    dossier_nr $char6.
2572    sequence $char1.
2573    r_creation_date anydtdtm16.
                        -----------
                        48
ERROR 48-59: The format $ANYDTDTM was not found or could not be loaded.

2574  ;
2575
2576  /* Alle variabelen als tekst uitsturen, behalve de datums. */
2577
2578  informat
2579    r_object_id $char16.
2580    object_name $char70.
2581    deadline anydtdtm16.
                 -----------
                 48
2582    start_date anydtdtm16.
                   -----------
                   48
2583    close_date anydtdtm16.
                   -----------
                   48
ERROR 48-59: The informat $ANYDTDTM was not found or could not be loaded.

2584    task_handler $char25.
2585    task_state $char9.
2586    dossier_nr $char6.
2587    sequence $char1.
2588    r_creation_date anydtdtm16.
                        -----------
                        48
ERROR 48-59: The informat $ANYDTDTM was not found or could not be loaded.

2589  ;
2590
2591  input
2592    r_object_id
2593    object_name
2594    deadline
2595    start_date
2596    close_date
2597    task_handler
2598    task_state
2599    dossier_nr
2600    sequence
2601    r_creation_date
2602  ;
2603
2604  /* Volledige selectie maken. */
2605
2606  keep
2607    r_object_id
2608    object_name
2609    deadline
2610    start_date
2611    close_date
2612    task_handler
2613    task_state
2614    dossier_nr
2615    sequence
2616    r_creation_date
2617  ;
2618
2619  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set UNLOAD.TASK may be incomplete.  When this step was stopped there were 0
         observations and 10 variables.
WARNING: Data set UNLOAD.TASK was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.46 seconds
      cpu time            0.09 seconds


 

It seems as though SAS doesn't "know" this format? Could that be the case?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 25 replies
  • 454 views
  • 12 likes
  • 5 in conversation