BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
garag
Calcite | Level 5


Hi,

I have the following problem reading date from csv file.

I've a file where the first row is something like this:

AAA BBB CCC 12/2/2014

I'm running the following code

data _null_;

   infile "my.csv"

          dlm='09'x

          missover

          dsd

          lrecl=4096

          firstobs = 1

          ;

   input a $ b $ c $ d mmddyy10.;

   call symput ('mydate', put(d, yymmddn8.));

   stop;

run;

but it does not working expecially when in the date the month or the day have only one char. (es Februray is 2, not 02)

Do there in SAS a format to read date where month or day are written with the minumum number of char and not always with the 0 ahead?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So your problem is what data_null_ diagnosed. Because you switched from list mode to formatted input style in the middle of your INPUT statement SAS is confused by the TAB that is following the date field. Because you told it to read 10 characters it read the TAB also and this made for an invalid value for the informat.  If you are reading a delimited file stick with list input.

data want ;

   infile 'myfile.csv' dlm='09'x dsd truncover obs=1;

   length a b c $8 d 8 ;

   informat d mmddyy10.;

    format d yymmddn8. ;

   input a b c d ;

   call symputx('mydate',put(d,yymmddn8.));

run;

View solution in original post

12 REPLIES 12
AncaTilea
Pyrite | Level 9

Hi.

When I run this (a modified version of what you wrote):

data _null_;

   infile datalines dlm=" " missover  dsd;

   input a $ b $ c $ d mmddyy10.;

   format d yymmddn8.;

   call symput ('mydate', put(d, yymmddn8.));

   stop;

   datalines;

AAA BBB CCC 2/2/2014

run;

%put &mydate.;

I get the date right.

When I run this (your code):

data _null_;

   infile "my.csv"

/*   infile datalines*/

   dlm=" " missover  dsd firstobs = 1 lrecl = 4096;

   input a $ b $ c $ d mmddyy10.;

   format d yymmddn8.;

   call symput ('mydate', put(d, yymmddn8.));

   stop;

/*   datalines;*/

/*AAA BBB CCC 2/2/2014*/

run;

%put &mydate.;

It doesn't work....because it reads all sorts of gibberish from the file, as you can see by modifying your data _NULL_ to a data step:

data step;

   infile "my.csv"

   dlm=" " missover  dsd firstobs = 1 lrecl = 4096;

   input a $ b $ c $ d mmddyy10.;

   format d yymmddn8.;

run;

proc print;run;

I am curious what's happening.

Do you get weird characters as well? or it's just me.

Anca.

Tom
Super User Tom
Super User

Do you just want to take the last field in the input file and convert it to a string of digits in yyyymmdd format?

data _null_;

  infile 'myfile' obs=1 ;

  input;

  call symputx('mydate',put(input(scan(_infile_,-1,'090D0A20'x),mmddyy10.),yymmddn8.));

run;

garag
Calcite | Level 5

Here you are what my file contains:

I tahke it from the log running

data _null_;

   infile "my.csv" ;

   input;

   list;

run;

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----

1   CHAR  AAAAA:.BBBBBB.CCCCCCC.2/2/2014.. 32
    ZONE  44444304444440444444403232333300
    NUMR  11111A92222229333333392F2F20149D

the code you before provided me works, I mean I can get the date but I lost the other fields contained in the line.

The initial gol was easly (how happen in the other languageSmiley Happy) read the complete line.

But I will split my goal in two steps, read the first three fields and than the last one using your code...

Many thanks!

Tom
Super User Tom
Super User

So your problem is what data_null_ diagnosed. Because you switched from list mode to formatted input style in the middle of your INPUT statement SAS is confused by the TAB that is following the date field. Because you told it to read 10 characters it read the TAB also and this made for an invalid value for the informat.  If you are reading a delimited file stick with list input.

data want ;

   infile 'myfile.csv' dlm='09'x dsd truncover obs=1;

   length a b c $8 d 8 ;

   informat d mmddyy10.;

    format d yymmddn8. ;

   input a b c d ;

   call symputx('mydate',put(d,yymmddn8.));

run;

garag
Calcite | Level 5

On my side your code works also if I write

data _null_...

but it is not a problem on the reason...I think my brain works in a way more different than SAS, anyway I couldn't understand the reason...Smiley Happy

many thanks

Tom
Super User Tom
Super User

The special keyword _NULL_ when used as the dataset name for a DATA step just means that no real dataset is created.  So you can use all of the programming features of the DATA step but without actually creating a dataset.

Other trick I included was the OBS=1 option on the INFILE since you appeared to only want to read the first line.  You had added a STOP statement to halt the data step.

Normally with SAS you do not need to worry about the number of lines in a file as it will automatically loop over the data step code and stop when it runs out of data.  So if your input file really only has one line neither the STOP or the OBS=1 is needed. And if it doesn't have only one line then perhaps there is something strange going on with your data.

LinusH
Tourmaline | Level 20

ddmmyy10. can read dates with 8 positions. I thin rather you are running to some issue due to end of record.

One work-around is to read as char (specify length $10 first), then use the input function to create your SAS date variable.

The other is to lift out the 10 part of the informat, but then you need to rely on the YEARCUTOFF= option.

Data never sleeps
Tom
Super User Tom
Super User

See what happens when you replace MISSOVER with the more appropriate TRUNCOVER. It is almost never right to use MISSOVER as you usually do not want to throw away short fields.

Also you might have end-of-line issues.  If you are reading a file created on a PC on a UNIX system by default SAS will treat the extra Carriage Return ('0D'X) as data and this will mess up the data for the last column.  You can try adding the TERMSTR=CRLF option to your INFILE statement.

garag
Calcite | Level 5

HI Tom,

Thanks for you almost  useful reply.

I replaced MISSOVER with TRUNCOVER but never changed.

I also confirm that a Carriage Return is present at the end of the line but adding the option TERMSTR=CRLF to infile statement haven't changed so much: it still doesn't work.

To be honest I'm quite new in SAS, but some things that with some other language we can do in few minutes with SAS we need a lot of time and a lot of tries to make it working....also if you ask to very SAS expert...it seems that there isn't logical way to get the things working, only try and try and try...


Tom
Super User Tom
Super User

Of course I find the opposite, things that are simple to do with SAS take forever to figure out how to do with less powerful languages with their strange syntax rules.

Another thing to check is does your file really have tabs as the delimiter?  The example you posted just has spaces between the values.

Also why not let SAS to show you what is actually in your file.  You can use the LIST statement to get SAS to dump the lines from your input file to the log.

data _null_;

   infile "my.csv" ;

   input;

   list;

run;

You can then see if there are tabs ('09'X) or CR ('0D'x) or other strange things like 'A0'X that can get into files, especially if they have been corrupted by Microsoft products.

data_null__
Jade | Level 19

I think the problem is exacerbated by mixing input "types".  The OPs input statement changes from LIST input to formatted input at the last variable.  This "switch" is very subtle and not well understood it would seem.  Formatted input is rarely needed when reading delimited fields.

Peter_C
Rhodochrosite | Level 12

running the following code

data _null_;

   infile "my.csv"

          dlm='09'x

          missover

          dsd

          lrecl=4096

          firstobs = 1

          ;

   input a $ b $ c $ d mmddyy10.;

   call symput ('mydate', put(d, yymmddn8.));

   stop;

run;

but it does not working expecially when in the date the month or the day have only one char. (es Februray is 2, not 02)

as @Tom pointed out the explicit (or implied) width for the informat resulted in the TAB being treated as data rather than delimiter. As usual the INPUT statement has a feature to support what you want to do. It is the "informat modifier" :

Just adapt your input statement to

input a $ b $ c $ d   :  mmddyy10.;


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 12212 views
  • 0 likes
  • 6 in conversation