Hello, guys
I have a question about how to convert data in a .csv file. I use sas university edition.
The file has one column, which has both numbers that look like (date type) (ex.10/18/1991) and date (ex.10/10/1997) in the same column.
My goal is combined the month, day, and year into one variable representing the date. However, I m still struggling in converting all of them.
Need guidelines and help.
Data
Date | Team | City | State | OSU | OPP |
10/10/1997 | Toronto | Columbus | Ohio | 5 | 0 |
10/18/1997 | Miami | Oxford | Ohio | 0 | 3 |
10/24/1997 | Merrimack | Columbus | Ohio | 2 | 7 |
10/26/1997 | Merrimack | Columbus | Ohio | 5 | 3 |
10/31/1997 | Clarkson | Potsdam | New York | 1 | 1 |
11/01/1997 | Clarkson | Potsdam | New York | 6 | 2 |
11/07/1997 | Western Michigan | Columbus | Ohio | 1 | 3 |
11/08/1997 | Notre Dame | Columbus | Ohio | 3 | 2 |
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs = 2 delimiter="/" dsd
;
input
Date DDMMYY8. Team $ City $ State $ OSU $ OPP $
;
proc
print data = hockey;
Thank you
You image looks like the result of loading the data into an Excel version of that is using a different convention for date format than the data. So the values with day of month <= 12 were converted into dates and the other values were left as strings. Note if you want to insert an image into your post here use the "Insert/edit image" (looks like the moon over some mountains) rather than just attaching it as a file. That way readers can see the image without downloading it.
So you need to use the proper delimiter on your INFILE statement. The default is a comma when you use the DSD option. But you will need to look at your actual CSV file with a text editor to see whether someone has sent you a file using a different delimiter and just used the .csv extension for convenience.
You need to use the right INFORMAT for your date values. They look to be in MDY order. You should also attach a FORMAT to your date variable so that SAS will display the value in a human readable form. I would recommend using YYMMDD10 to avoid the confusion between MDY and DMY order that seems to impacting you now. You should also define the lengths of your character variables, otherwise SAS will default to using $8 for the length.
Note if you want to include the informats in the INPUT statement when reading from a delimited file you should always use the colon modifier in front of the informat specification to prevent SAS from "eating" the delimiter when the value is shorter than the width used in the informat specification. And add the TRUNCOVER option to the INFILE statement to properly handle lines that do not every column.
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs=2 dlm="," dsd truncover
;
input
date :mmyydd10.
team :$10.
city :$20.
state :$2.
OSU :$10.
OPP :$10.
;
format date yymmdd10.;
run;
You're reading in the data incorrectly. For one you assume the date is DDMMYY, when it appears to be MMDDYY.
Use PROC IMPORT to import the data. Look at the log. Take the code from the log and customize it as needed.
But first, review how SAS stores dates and how format/informats work. A SAS date is a number, which represents the number of days from Jan 1, 1960. A format allows it to be shown as anything from the Month alone to the full date, as in Saturday July 8, 2017.
@Koke wrote:
Hello, guys
I have a question about how to convert data in a .csv file. I use sas university edition.
The file has one column, which has both numbers that look like (date type) (ex.10/18/1991) and date (ex.10/10/1997) in the same column.
My goal is combined the month, day, and year into one variable representing the date. However, I m still struggling in converting all of them.
Need guidelines and help.
Data
Date Team City State OSU OPP 10/10/1997 Toronto Columbus Ohio 5 0 10/18/1997 Miami Oxford Ohio 0 3 10/24/1997 Merrimack Columbus Ohio 2 7 10/26/1997 Merrimack Columbus Ohio 5 3 10/31/1997 Clarkson Potsdam New York 1 1 11/01/1997 Clarkson Potsdam New York 6 2 11/07/1997 Western Michigan Columbus Ohio 1 3 11/08/1997 Notre Dame Columbus Ohio 3 2 data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs = 2 delimiter="/" dsd
;
input
Date DDMMYY8. Team $ City $ State $ OSU $ OPP $
;
proc
print data = hockey;Thank you
You have the wrong informat for your date variable. Try it with mmddyy10.
You probably will want to include a format for it, as well, so that you can see that it worked. I usually like to use date9. for such formats, but any date format would work.
Art, CEO, AnalystFinder.com
Are you sure your input is a CSV (Comma Separated) file?
YES - means your delimiter is ',' - but your code contains delimiter = '/'
which is wrong.
Your input date seems to be in MM/DD/YYYY format.
If it is a csv file try next code:
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs = 2 delimiter="," dsd;
input date mmddyy10.
Team $ City $ State $ OSU $ OPP $ ;
run;
proc print=hockey; run;
Thank you, Shmuel
I fix mistakes follow from your code. In the output, some observation in date column shows the number that I want, but some show (.).
So I look back to the data file.
Date |
10/10/1997 (date type in csv.) |
10/18/1997 (general type in csv.) |
10/24/1997 (general type in csv.) |
10/26/1997 (general type in csv.) |
10/31/1997 (general type in csv.) |
11/01/1997(date type in csv.) |
11/07/1997(date type in csv.) |
11/08/1997(date type in csv.) |
This is an output
Obs date Team City State OSU OPP 1 2 3 4 5 6 7 8
13797 | Toronto | Columbus | Ohio | 5 | |
13805 | Miami | Oxford | Ohio | 0 | |
13811 | Merrimac | Columbus | Ohio | 2 | |
13813 | Merrimac | Columbus | Ohio | 5 | |
13818 | Clarkson | Potsdam | New York | 1 | |
. | Clarkson | Potsdam | New York | 6 | 2 |
. | Western | Columbus | Ohio | 1 | 3 |
. | Notre Da | Columbus | Ohio | 3 | 2 |
I assume that the problem is I have both number type and date type in the same column. As a result, the (.) show in some of the column. The log shows this statement WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
Here all of the log
According to the log the date, in some rows, is not in mmddyy10. format.
On line 7 the date is: 11/1/1997
On line 19 the date is 1/3/1998
that means that day/month don't include leading zero.
I didn't find an informat to convert those to dates, therefore, new suggestion:
data hockey; infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv" firstobs = 2 delimiter="," dsd; input datex $ Team $ City $ State $ OSU $ OPP $ ; mm = input(scan(datex,1,'/'),best2.); dd = input(scan(datex,2
,'/'
),best2.);
yy = input(scan(datex,3,'/'),best4.);
date = mdy(mm,dd,yy);
drop datex;
run;
@Shmuel: mmddyy10. informat doesn't require leading zeros. The errors, I'm fairly certain, came from not having using the : operator for the text fields.
Art, CEO, AnalystFinder.com
@art297, thank you. The : informats are new to me and I'm still not used to use them.
I have checked the :dmmddyy10. informat and found it works fine.
Thanks, again.
The data getting read into the wrong columns is the result of using formatted input mode instead of list input mode when reading a delimited file. If you want to list the informat in the INPUT statement then you need to add the : modifier to keep your INPUT statement as one that is only using list mode input. Otherwise when the actual date uses fewer than 10 characters the delimiter will become part of the value read. This will both mess up the value and get SAS confused about where it should be on the line to read the next field.
input date :mmddyy10. Team :$8. City $ State $ OSU $ OPP $ ;
I find if instead use a LENGTH statement to define your variables and an INFORMAT statement to tell SAS how to read the value then writing the INPUT statement is easier and these types or mistakes can be avoided. If you define all of the variables in advance then you do not even need the $ on the input statement since SAS already knows the variables are character.
length date 8 Team $8 City $8 State $8 OSU $8 OPP $8 ;
informat date mmddyy10. ;
format date mmddyy10. ;
input date Team City State OSU OPP ;
You image looks like the result of loading the data into an Excel version of that is using a different convention for date format than the data. So the values with day of month <= 12 were converted into dates and the other values were left as strings. Note if you want to insert an image into your post here use the "Insert/edit image" (looks like the moon over some mountains) rather than just attaching it as a file. That way readers can see the image without downloading it.
So you need to use the proper delimiter on your INFILE statement. The default is a comma when you use the DSD option. But you will need to look at your actual CSV file with a text editor to see whether someone has sent you a file using a different delimiter and just used the .csv extension for convenience.
You need to use the right INFORMAT for your date values. They look to be in MDY order. You should also attach a FORMAT to your date variable so that SAS will display the value in a human readable form. I would recommend using YYMMDD10 to avoid the confusion between MDY and DMY order that seems to impacting you now. You should also define the lengths of your character variables, otherwise SAS will default to using $8 for the length.
Note if you want to include the informats in the INPUT statement when reading from a delimited file you should always use the colon modifier in front of the informat specification to prevent SAS from "eating" the delimiter when the value is shorter than the width used in the informat specification. And add the TRUNCOVER option to the INFILE statement to properly handle lines that do not every column.
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs=2 dlm="," dsd truncover
;
input
date :mmyydd10.
team :$10.
city :$20.
state :$2.
OSU :$10.
OPP :$10.
;
format date yymmdd10.;
run;
A few slight changes to @Tom's suggested code will give you what you want:
data hockey; infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv" firstobs=2 dlm="," dsd truncover ; input date :mmddyy10. team :$20. city :$20. state :$15. OSU OPP ; format date mmddyy10.; run;
Art, CEO, AnalystFinder.com
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.