BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xyxu
Quartz | Level 8

I have a CSV file in which date variables look weird. The following picture is what I see when if I open the CSV in Excel. Also, many rows refused to change when I attempted to format the column into a single date format in Excel.

xyxu_0-1596473846835.png

Please see the attached file as an example. When I import it into SAS, I use

data date_format;
	infile "&in_path\date_format_issue.csv" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2;
	informat id 20. date anydtdte.;
	format date yymmdd10.;
	input id date;
run;

When I check the number of unique IDs per month using

proc sql;
	create table temp as
	select distinct intnx('month', date, 0, 'e') as month format yymmdd10., count(unique id) as num
	from date_format
	group by month
	order by month;
quit;

I found that the date variable is causing problems. The numbers are incorrect because dates are corrupt. For example, there should be 1300+ unique IDs in Jan, but I got only 588. When my friend uses his computer to open the CSV file, he was able to format the date in Excel correctly. Unfortunately I couldn't do this in my computer. How can I import the dates accurately into SAS? 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@xyxu wrote:

Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement. 


And when you look at the file without letting Excel auto-convert the contents you can tell that ALL of the values are compatible with using DMY order.  So just read it using DMMYY informat and you should be good.

997   data test;
998     infile "&path\&fname" dsd truncover firstobs=2;
999     input id $ date_str :$20. ;
1000    date = input(date_str,ddmmyy10.);
1001    format date yymmdd10.;
1002  run;

NOTE: The infile "C:\Downloads\date_format_issue.csv" is:
      Filename=C:\Downloads\date_format_issue.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=1305230,
      Last Modified=03Aug2020:16:02:41,
      Create Time=03Aug2020:16:02:40

NOTE: 82844 records were read from the infile "C:\Downloads\date_format_issue.csv".
      The minimum record length was 11.
      The maximum record length was 15.
NOTE: The data set WORK.TEST has 82844 observations and 3 variables.

View solution in original post

13 REPLIES 13
AMSAS
SAS Super FREQ
When I see an excel file and the columns are left-aligned (as in your 3rd row and others) the first thing I think is it's Excel Text field and probably has a ' at the start forcing it to be text. e.g. '22/02/2019
AMSAS
SAS Super FREQ
data read ;
	infile "<path>\date_format_issue.csv" dlm="," firstobs=2 obs=20 ;
	input 
		id $
		date : ddmmyy10. ;
	put id= date= date7. ;
run ;

Try the above code, this should work

Kurt_Bremser
Super User

Do not use Excel to inspect CSV files. Excel will show you what it thinks is in the file, which is very often not what's actually in there. Always use text editors to inspect CSV or other text files you want to import into SAS.

For this particular file, use:

data want;
infile "path to your file" dlm=',' truncover;
input id $ date :ddmmyy10.;
format date yymmdd10.;
run;

(your Excel seems to be set to only accept MDY order in dates).

xyxu
Quartz | Level 8

If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment). 

 

I am feeling insecure about getting this into SAS. How do we know we are getting the date right?

Tom
Super User Tom
Super User

Excel will mess up the dates if you let it guess how to convert the text.

Read the field as text from the CSV file.  Then you can write your own code to look at the values and decide which is the best informat to use for that value.   What if the string looks like 10/12/2019?  is that the tenth of December or October twelfth?  You could use information from other fields or other observations for the same subject/patient to make an informed decision.  Excel is just going to pick one.

 

ballardw
Super User

@xyxu wrote:

If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment). 

 

I am feeling insecure about getting this into SAS. How do we know we are getting the date right?


So you trust a program to guess better that you who should know the data?

This little exercise may help reconsider.

Open an Excel spreadsheet.

Type 1-5. That is a one, the dash character, then a 5.

Hit enter to go to the next cell.

What value does Excel show?

Go back to that cell and set the cell to numeric; right click in the cell, select Format Cells from the menu and then click on the number tab and then Number in the menu.

Does the result look anything like what you typed?

 

Excel does similar to cells when opening a CSV. Which is why you should not use Excel to examine CSV, or at least not trust what it shows. And if you save the CSV after opening in Excel then the values may well have been changed, sometimes drastically.

 

 

xyxu
Quartz | Level 8

Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement. 

Tom
Super User Tom
Super User

@xyxu wrote:

Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement. 


And when you look at the file without letting Excel auto-convert the contents you can tell that ALL of the values are compatible with using DMY order.  So just read it using DMMYY informat and you should be good.

997   data test;
998     infile "&path\&fname" dsd truncover firstobs=2;
999     input id $ date_str :$20. ;
1000    date = input(date_str,ddmmyy10.);
1001    format date yymmdd10.;
1002  run;

NOTE: The infile "C:\Downloads\date_format_issue.csv" is:
      Filename=C:\Downloads\date_format_issue.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=1305230,
      Last Modified=03Aug2020:16:02:41,
      Create Time=03Aug2020:16:02:40

NOTE: 82844 records were read from the infile "C:\Downloads\date_format_issue.csv".
      The minimum record length was 11.
      The maximum record length was 15.
NOTE: The data set WORK.TEST has 82844 observations and 3 variables.
xyxu
Quartz | Level 8
Yeah this works. I asked a similar question: what does the "colon" do in the input?
Tom
Super User Tom
Super User

@xyxu wrote:
Yeah this works. I asked a similar question: what does the "colon" do in the input?

That is a modifier on the INPUT statement

input id $ date :mmddyy. ;

that allows you to include an informat specification in the INPUT statement without actually using fixed format input mode.  It basically makes your INPUT of this variable only read the next field on the line instead of a fixed number of columns.  It is critical when reading a delimited file (like a CSV file) as otherwise you might not read all of the field or read past the end of the field into the next field.

https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=n0lrz3gb7m9e4rn137op544ddg0v.htm&doc...

 

You can also get the same result by assigning an INFORMAT to the variable using an INFORMAT statement.  Then the INPUT statement just needs to list the name and not mention what informat to use.

input id $ date ;
informat date ddmmyy.;

Note that when using list mode to read you do NOT need to give it a width on the informat since the width will be ignored anyway.  SAS will read all of the next field whether it is longer or shorter than the width on the informat.

xyxu
Quartz | Level 8
I see. I will just use the informat approach to avoid making a mistake. Thank you very much!
Kurt_Bremser
Super User

@xyxu wrote:

If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment). 

 

I am feeling insecure about getting this into SAS. How do we know we are getting the date right?


Looking at the file, it was obvious to me that the dates were in DMY order; the second group of numbers never exceeds 12. Using the data step I proposed, any date not in DMY order would cause a log message, so you have an automated check.

 

PS all this would be moot if people had the intelligence to use the ISO 8601 standardized YYYY-MM-DD date format when moving data.

xyxu
Quartz | Level 8

Result from this code looks right to me, but I am not fully understanding it. What does

 

date :ddmmyy10.;

 

in particular, the colon, tell SAS to do? It seems to be crucial here.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 20420 views
  • 4 likes
  • 5 in conversation