BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yves_Boonen
Quartz | Level 8

I have a CSV-file containing dossier numbers, dossier titles, dossier creators and dates (deadline, start date, close date, creation date). When a dossier is created, it will contain 5 sub dossiers. The first dossier will become active and the other four will become waiting. Cancelled and waiting dossiers do not have any dates, except for a creation date. Instead of showing dates, these cells will contain the value "nulldate". To make the import (date / infile / length / format / informat / input / keep) easier, I decided to import every column as a textstring. The idea was to change the format of the dates once required, using the SUBSTR command. (Date format is dd/mm/yyyy hh:mm.)

 

Example:

 

deadlinestart_dateclose_dater_creation_date
11/04/2016 14:5329/02/2016 14:53nulldate29/02/2016 14:53
11/04/2016 15:0429/02/2016 15:0430/03/2016 14:5729/02/2016 15:04
nulldatenulldatenulldate1/03/2016 11:04
8/03/2016 11:241/03/2016 11:247/03/2016 15:521/03/2016 11:24
nulldatenulldatenulldate1/03/2016 11:24
25/04/2016 12:014/04/2016 12:01nulldate1/03/2016 11:24
15/03/2016 11:361/03/2016 11:351/03/2016 12:571/03/2016 11:35
15/03/2016 10:411/03/2016 10:411/03/2016 10:411/03/2016 10:40
19/04/2016 12:278/03/2016 12:27nulldate1/03/2016 10:55
8/03/2016 10:581/03/2016 10:587/03/2016 14:151/03/2016 10:58
nulldatenulldatenulldate1/03/2016 11:11
nulldatenulldatenulldate1/03/2016 11:14
15/04/2016 11:461/04/2016 11:46nulldate1/03/2016 11:15
8/03/2016 11:151/03/2016 11:151/04/2016 11:441/03/2016 11:15
nulldatenulldatenulldate1/03/2016 10:36
nulldatenulldatenulldate1/03/2016 11:35
nulldatenulldatenulldate1/03/2016 10:41
28/03/2016 9:5614/03/2016 9:56nulldate1/03/2016 11:18
21/03/2016 14:2129/02/2016 14:2116/03/2016 10:5229/02/2016 14:21
21/03/2016 14:3429/02/2016 14:3417/03/2016 12:0429/02/2016 14:34

 

How would I go about using SUBSTR to change the format from $char16. to the actual format used through a datastep in my query (not the query I used for importing, but the query I used to analyse my data).

 

Additionally, later on I will need to compare some of these dates to a fixed date. This is because I am working with exports from a database and so the fixed date becomes the date of the export day. (%LET date=1APR2016;)

 

It doesn't matter which format it ends up using, as long as I can make the analysis. Any suggestions how I can start this?

1 ACCEPTED SOLUTION
24 REPLIES 24
Yves_Boonen
Quartz | Level 8

To clarify:

 

I have to check if the dossiers were handled before the deadline. For dossiers that are completed, a close date is available. The calculation will become like this:

 

  when ('DONE') do;
    	done+1;
    	if close_date > deadline then done_overtime+1;
  end;

 

However, for dossiers that are still active, there will be no close date available. In that case I will need to compare it to another date instead. Since the export I am currently using, was created on the 1st of April, this fixed date has to be April 1st, 2016. The calculation will then become like this:

 

when ('ACTIVE') do;
    	active+1;
    	if date() > deadline then active_overtime+1;
end;

 

Naturally this calculation won't work, since SAS thinks the dates are a textstring and not dates. Importing the dates as textstrings is a necessity because not only does it actually contain a textstring ("nulldate"), I need to import the data from the CSV-file untouched. A colelague pointed out that the best approach would be to use the import query for importing everything as a textstring and then later on use the SUBSTR command to change the format in the datastep from your actual analysis.

Kurt_Bremser
Super User

Hands down, the best method to deal with these problems is to store dates and timestamps as intended, and use formats for display or output as necessary; all comparisons and mathematical operations become a breeze instead of a massive PITA.

 

So the way to go in your situation is to

- read the column as character

- check if it contains "nulldate"

- if yes, set (numerical) date/timestamp variable to missing (.)

- else convert with proper format

- drop temporary character variable

 

If you want to preserve the original unconverted data, add a sufficiently large character variable in the data set and fill it with the contents of _infile_.

Yves_Boonen
Quartz | Level 8

What you are suggesting makes sense. If I understand correctly, I would have to do this during the import, don't I? Just in case I will show you the import query I used.

 

data unload.task;
infile '\\networklocation\unloads\JNX\JNX_unload_tasks_ 201603_.csv' dlm=';' truncover firstobs=2 lrecl=500;

length
  r_object_id $16
  object_name $70
  deadline $16
  start_date $16
  close_date $16
  task_handler $25
  task_state $9
  dossier_nr $6
  sequence $1
  r_creation_date $16
;

format
  r_object_id $char16.
  object_name $char70.
  deadline $char16.
  start_date $char16.
  close_date $char16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date $char16.
;

informat
  r_object_id $char16.
  object_name $char70.
  deadline $char16.
  start_date $char16.
  close_date $char16.
  task_handler $char25.
  task_state $char9.
  dossier_nr $char6.
  sequence $char1.
  r_creation_date $char16.
;

input
  r_object_id
  object_name
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;

keep
  r_object_id
  object_name 
  deadline
  start_date
  close_date
  task_handler
  task_state
  dossier_nr
  sequence
  r_creation_date
;
run;

 

This brings us back to the format of the dates. The format itself is should be date10., but that would be without the "hh:mm". I thought the SUBSTR could manage that by saying something like: "new_deadline=SUBSTR(deadline,1,10)", but I can't get that to work at all.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You would use inputs and puts to arrange your data after the fact - not sure why you don't just update this on the read in?  Here is an example of creating date, time, and datetime variables:

data have;
  length deadline  start_date 	close_date r_creation_date $50; 
  infile datalines dlm=",";
  input deadline $ start_date $	close_date $ r_creation_date $;
datalines;
11/04/2016 14:53,29/02/2016 14:53,nulldate,	29/02/2016 14:53
11/04/2016 15:04,29/02/2016 15:04,30/03/2016 14:57,	29/02/2016 15:04
nulldate,nulldate,nulldate,1/03/2016 11:04
;
run;

data want;
  set have;
  if deadline ne "nulldate" then do;
    deadlined=input(scan(deadline,1," "),ddmmyy10.);
    deadlinet=input(scan(deadline,2," "),time5.);
    deadlinedt=input(put(deadlined,yymmdd10.)||"T"||put(deadlinet,tod5.)||":00",e8601dt.);
  end;
  format deadlinedt datetime. deadlined date9. deadlinet time5.;
run;
Yves_Boonen
Quartz | Level 8

I think I see your point. If I were to do this in the analysis query, the query itself would become lengthy, the query would become very unorderly and perhaps even slow to execute?

 

If I were to do this from the import, could I still do format $char16. and then use the proper informat?  But is there a format that sticks to the used format? Datetime16. perhaps? But how will the import handle the nulldate values if I use that format?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You should be able to use:

informat <your_variable> anydtdtm.;

In your import code, this should read it in as a date time.  If the import value is missing, then it will be missing in the output variable.  If you actually have the word "nulldate" present in your import file then you have problems, and that needs to be removed before importing - personally I wouldn't accept a file which had that kind of thing in it.

 

See SAS help:

http://support.sas.com/documentation/cdl/en/leforinforref/63324/HTML/default/viewer.htm#p1hsn1ji141r...

 

On your other point, yes, doing conversions in a query would take longer, the functions get called each row.  Better to have data in a good strcuture before starting to process it.

Yves_Boonen
Quartz | Level 8

What if I were to do the unthinkable and alter my unload? Only alter it by changing "nulldate" with "-". If I am not mistaken, then the format you mentioned above should work, right?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would leave it as "", i.e. toally empty.  

 

But then I would also have a data import agreement in place signed off by both parties, which detailed, as receiver of the data, the exact structure, file format, and contents of the data file as I want it to be (or near as).  To not have any of this makes me a very sad importer 🐵

Yves_Boonen
Quartz | Level 8

If I do that in an import, all my cells get shifted. For example, a CSV-file:

 

(First row contains variable names.)

 

A;B;C;D;E

X;X;;Y;Z

X;X;Y;Y;Z

 

Will give me a table like:

 

ABCDE
XXYZ 
XXYYZ

 

Even though you can clearly see that the it says "X;X;;Y;Z", it somehow ignores the ";;" and makes all my cells move one to the left. The unloads I have been using, have quite a few empty cells and so this became a problem. The quick and easy fix was to fill every empty cell with "-". So the CSV-file would look like:

 

A;B;C;D;E

X;X;-;Y;Z

X;X;Y;Y;Z

 

Will give me a table like this:

 

ABCDE
XX-YZ
XXYYZ

 

That makes me wonder, is that normal? If so, would it be wise to replace nulldate with an empty cell?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Change your infile "..." statement to include (either) DSD or missover option - can't think which it is, think its DSD:

infile "...\textfile.txt" dlm=";" dsd;

Kurt_Bremser
Super User

When this kind of shifting happens, it can usually be corrected by using the dsd option in the infile statement. This causes two delimiters in sequence to be interpreted as a missing value. Without dsd, the two delimiters are taken as one.

Yves_Boonen
Quartz | Level 8

That works wonders! Thanks!

 

However, I still have an error with my anydate.

 

Imagine just having a CSV-file with one column "dates" and in it a bunch of dates in the format "dd/mm/yy hh:mm". The code I have right now, looks like:

 

data dates;
infile 'C:\temp\dates.csv'  dlm=';' truncover firstobs=2 lrecl=500;

length
  date $16
;

format
  date $char16.
;

informat
  date anydtdtm16.
;

input
  date
;

keep
  date
;

run;

 

It results with an error in the log (from the original query and table):

 

 

1542    r_creation_date anydtdtm16.
                        -----------
                        48
ERROR 48-59: The informat $ANYDTDTM was not found or could not be loaded.

 

 

I guess I must be missing something somewhere? Do I need to adjust something in the format or even the input?

Kurt_Bremser
Super User

You try to assign a numerical format to a variable that was already defined as character. SAS then tries to correct your "obvious" mistake by searching for a proper character format (that would be named $anydtdtm), which does not exist.

Yves_Boonen
Quartz | Level 8

Oh, stupid me... I should remove the "$" in the length, shouldn't I? Smiley Surprised

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 2335 views
  • 12 likes
  • 5 in conversation