I have a CSV file in which date variables look weird. The following picture is what I see when if I open the CSV in Excel. Also, many rows refused to change when I attempted to format the column into a single date format in Excel.
Please see the attached file as an example. When I import it into SAS, I use
data date_format;
	infile "&in_path\date_format_issue.csv" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2;
	informat id 20. date anydtdte.;
	format date yymmdd10.;
	input id date;
run;When I check the number of unique IDs per month using
proc sql;
	create table temp as
	select distinct intnx('month', date, 0, 'e') as month format yymmdd10., count(unique id) as num
	from date_format
	group by month
	order by month;
quit;I found that the date variable is causing problems. The numbers are incorrect because dates are corrupt. For example, there should be 1300+ unique IDs in Jan, but I got only 588. When my friend uses his computer to open the CSV file, he was able to format the date in Excel correctly. Unfortunately I couldn't do this in my computer. How can I import the dates accurately into SAS?
@xyxu wrote:
Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement.
And when you look at the file without letting Excel auto-convert the contents you can tell that ALL of the values are compatible with using DMY order. So just read it using DMMYY informat and you should be good.
997   data test;
998     infile "&path\&fname" dsd truncover firstobs=2;
999     input id $ date_str :$20. ;
1000    date = input(date_str,ddmmyy10.);
1001    format date yymmdd10.;
1002  run;
NOTE: The infile "C:\Downloads\date_format_issue.csv" is:
      Filename=C:\Downloads\date_format_issue.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=1305230,
      Last Modified=03Aug2020:16:02:41,
      Create Time=03Aug2020:16:02:40
NOTE: 82844 records were read from the infile "C:\Downloads\date_format_issue.csv".
      The minimum record length was 11.
      The maximum record length was 15.
NOTE: The data set WORK.TEST has 82844 observations and 3 variables.
					
				
			
			
				
			
			
			
				
			
			
			
			
			
		data read ;
	infile "<path>\date_format_issue.csv" dlm="," firstobs=2 obs=20 ;
	input 
		id $
		date : ddmmyy10. ;
	put id= date= date7. ;
run ;Try the above code, this should work
Do not use Excel to inspect CSV files. Excel will show you what it thinks is in the file, which is very often not what's actually in there. Always use text editors to inspect CSV or other text files you want to import into SAS.
For this particular file, use:
data want;
infile "path to your file" dlm=',' truncover;
input id $ date :ddmmyy10.;
format date yymmdd10.;
run;(your Excel seems to be set to only accept MDY order in dates).
If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment).
I am feeling insecure about getting this into SAS. How do we know we are getting the date right?
Excel will mess up the dates if you let it guess how to convert the text.
Read the field as text from the CSV file. Then you can write your own code to look at the values and decide which is the best informat to use for that value. What if the string looks like 10/12/2019? is that the tenth of December or October twelfth? You could use information from other fields or other observations for the same subject/patient to make an informed decision. Excel is just going to pick one.
@xyxu wrote:
If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment).
I am feeling insecure about getting this into SAS. How do we know we are getting the date right?
So you trust a program to guess better that you who should know the data?
This little exercise may help reconsider.
Open an Excel spreadsheet.
Type 1-5. That is a one, the dash character, then a 5.
Hit enter to go to the next cell.
What value does Excel show?
Go back to that cell and set the cell to numeric; right click in the cell, select Format Cells from the menu and then click on the number tab and then Number in the menu.
Does the result look anything like what you typed?
Excel does similar to cells when opening a CSV. Which is why you should not use Excel to examine CSV, or at least not trust what it shows. And if you save the CSV after opening in Excel then the values may well have been changed, sometimes drastically.
Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement.
@xyxu wrote:
Yeah this explanation makes perfect sense. I should definitely determine the date with my own judgement.
And when you look at the file without letting Excel auto-convert the contents you can tell that ALL of the values are compatible with using DMY order. So just read it using DMMYY informat and you should be good.
997   data test;
998     infile "&path\&fname" dsd truncover firstobs=2;
999     input id $ date_str :$20. ;
1000    date = input(date_str,ddmmyy10.);
1001    format date yymmdd10.;
1002  run;
NOTE: The infile "C:\Downloads\date_format_issue.csv" is:
      Filename=C:\Downloads\date_format_issue.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=1305230,
      Last Modified=03Aug2020:16:02:41,
      Create Time=03Aug2020:16:02:40
NOTE: 82844 records were read from the infile "C:\Downloads\date_format_issue.csv".
      The minimum record length was 11.
      The maximum record length was 15.
NOTE: The data set WORK.TEST has 82844 observations and 3 variables.
					
				
			
			
				
			
			
			
			
			
			
			
		
@xyxu wrote:
Yeah this works. I asked a similar question: what does the "colon" do in the input?
That is a modifier on the INPUT statement
input id $ date :mmddyy. ;that allows you to include an informat specification in the INPUT statement without actually using fixed format input mode. It basically makes your INPUT of this variable only read the next field on the line instead of a fixed number of columns. It is critical when reading a delimited file (like a CSV file) as otherwise you might not read all of the field or read past the end of the field into the next field.
You can also get the same result by assigning an INFORMAT to the variable using an INFORMAT statement. Then the INPUT statement just needs to list the name and not mention what informat to use.
input id $ date ;
informat date ddmmyy.;Note that when using list mode to read you do NOT need to give it a width on the informat since the width will be ignored anyway. SAS will read all of the next field whether it is longer or shorter than the width on the informat.
@xyxu wrote:
If I use the text editor to inspect, it's really hard to distinguish dates with different formats, e.g., "1/3/2019" could be either "Jan 03, 2019" or "Mar 01, 2019". When I open the file in Excel, the date is either left-aligned or right-aligned, allowing me to guess which is the month and which is the day (by comparing with dates like "22/3/2019" that has the same alignment).
I am feeling insecure about getting this into SAS. How do we know we are getting the date right?
Looking at the file, it was obvious to me that the dates were in DMY order; the second group of numbers never exceeds 12. Using the data step I proposed, any date not in DMY order would cause a log message, so you have an automated check.
PS all this would be moot if people had the intelligence to use the ISO 8601 standardized YYYY-MM-DD date format when moving data.
Result from this code looks right to me, but I am not fully understanding it. What does
date :ddmmyy10.;
in particular, the colon, tell SAS to do? It seems to be crucial here.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
