BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Koke
Fluorite | Level 6

Hello, guys

I have a question about how to convert data in a .csv file. I use sas university edition.

The file has one column, which has both numbers that look like (date type) (ex.10/18/1991) and date (ex.10/10/1997) in the same column.

My goal is combined the month, day, and year into one variable representing the date. However, I m still struggling in converting all of them. 

Need guidelines and help. 

Data

DateTeamCityStateOSUOPP
10/10/1997 TorontoColumbus  Ohio 50
10/18/1997Miami Oxford  Ohio 03
10/24/1997Merrimack   Columbus  Ohio 27
10/26/1997Merrimack   Columbus  Ohio  53
10/31/1997ClarksonPotsdam  New York11
11/01/1997ClarksonPotsdam  New York62
11/07/1997Western MichiganColumbus  Ohio13
11/08/1997Notre DameColumbus  Ohio32
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs = 2 delimiter="/" dsd
;
input
Date DDMMYY8. Team $ City $ State $ OSU $ OPP $
;
proc
print data = hockey;

Thank you

 


3.3.png
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You image looks like the result of loading the data into an Excel version of that is using a different convention for date format than the data. So the values with day of month <= 12 were converted into dates and the other values were left as strings.  Note if you want to insert an image into your post here use the "Insert/edit image" (looks like the moon over some mountains) rather than just attaching it as a file.  That way readers can see the image without downloading it.

 

So you need to use the proper delimiter on your INFILE statement. The default is a comma when you use the DSD option.  But you will need to look at your actual CSV file with a text editor to see whether someone has sent you a file using a different delimiter and just used the .csv extension for convenience.

 

You need to use the right INFORMAT for your date values. They look to be in MDY order. You should also attach a FORMAT to your date variable so that SAS will display the value in a human readable form.  I would recommend using YYMMDD10 to avoid the confusion between MDY and DMY order that seems to impacting you now.   You should also define the lengths of your character variables, otherwise SAS will default to using $8 for the length.

 

Note if you want to include the informats in the INPUT statement when reading from a delimited file you should always use the colon modifier in front of the informat specification to prevent SAS from "eating" the delimiter when the value is shorter than the width used in the informat specification.  And add the TRUNCOVER option to the INFILE statement to properly handle lines that do not every column.

 

data hockey;
  infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
    firstobs=2 dlm="," dsd truncover
  ;
  input
    date :mmyydd10.
    team :$10.
    city :$20.
    state :$2.
    OSU :$10.
    OPP :$10.
  ;
  format date yymmdd10.;
run;

View solution in original post

10 REPLIES 10
Reeza
Super User

You're reading in the data incorrectly. For one you assume the date is DDMMYY, when it appears to be MMDDYY.

 

Use PROC IMPORT to import the data. Look at the log. Take the code from the log and customize it as needed.

 

But first, review how SAS stores dates and how format/informats work. A SAS date is a number, which represents the number of days from Jan 1, 1960. A format allows it to be shown as anything from the Month alone to the full date, as in Saturday July 8, 2017. 

 


@Koke wrote:

Hello, guys

I have a question about how to convert data in a .csv file. I use sas university edition.

The file has one column, which has both numbers that look like (date type) (ex.10/18/1991) and date (ex.10/10/1997) in the same column.

My goal is combined the month, day, and year into one variable representing the date. However, I m still struggling in converting all of them. 

Need guidelines and help. 

Data

Date Team City State OSU OPP
10/10/1997  Toronto Columbus   Ohio  5 0
10/18/1997 Miami  Oxford   Ohio  0 3
10/24/1997 Merrimack    Columbus   Ohio  2 7
10/26/1997 Merrimack    Columbus   Ohio   5 3
10/31/1997 Clarkson Potsdam   New York 1 1
11/01/1997 Clarkson Potsdam   New York 6 2
11/07/1997 Western Michigan Columbus   Ohio 1 3
11/08/1997 Notre Dame Columbus   Ohio 3 2
data hockey;
infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
firstobs = 2 delimiter="/" dsd
;
input
Date DDMMYY8. Team $ City $ State $ OSU $ OPP $
;
proc
print data = hockey;

Thank you

 


 

 

art297
Opal | Level 21

You have the wrong informat for your date variable. Try it with mmddyy10. 

 

You probably will want to include a format for it, as well, so that you can see that it worked. I usually like to use date9. for such formats, but any date format would work.

 

Art, CEO, AnalystFinder.com

 

Shmuel
Garnet | Level 18

Are you sure your input is a CSV (Comma Separated) file?

YES - means your delimiter is ',' - but your code contains delimiter = '/' 

which is wrong.

 

Your input date seems to be in MM/DD/YYYY format.

 

If it is a csv file try next code:

data hockey;
  infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
           firstobs = 2 delimiter="," dsd;

  input date mmddyy10.  
          Team $ City $ State $ OSU $ OPP $ ;
run;

proc print=hockey; run;
Koke
Fluorite | Level 6

Thank you, Shmuel

I fix mistakes follow from your code. In the output, some observation in date column shows the number that I want, but some show (.).

So I look back to the data file. 

Date
10/10/1997 (date type in csv.)
10/18/1997 (general type in csv.)
10/24/1997 (general type in csv.)
10/26/1997 (general type in csv.)
10/31/1997 (general type in csv.)
11/01/1997(date type in csv.)
11/07/1997(date type in csv.)
11/08/1997(date type in csv.)

 

This is an output

Obs date Team City State OSU OPP 1 2 3 4 5 6 7 8

13797 TorontoColumbusOhio5
13805 MiamiOxfordOhio0
13811 MerrimacColumbusOhio2
13813 MerrimacColumbusOhio5
13818 ClarksonPotsdamNew York1
.ClarksonPotsdamNew York62
.WesternColumbusOhio13
.Notre DaColumbusOhio32

 

I assume that the problem is I have both number type and date type in the same column. As a result, the (.) show in some of the column. The log shows this statement WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.

 

Here all of the log

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 data hockey;
63 infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
64 firstobs = 2 delimiter="," dsd;
65
66 input date mmddyy10. Team $ 8. City $ State $ OSU $ OPP $ ;
67 run;
 
NOTE: The infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv" is:
(no system-specific pathname available),
(no system-specific file attributes available)
 
NOTE: Invalid data for date in line 7 1-10.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
7 11/1/1997,Clarkson,Potsdam , New York,6,2 41
date=. Team=Clarkson City= State=Potsdam OSU=New York OPP=6 _ERROR_=1 _N_=6
NOTE: Invalid data for date in line 8 1-10.
8 11/7/1997,Western Michigan,Columbus , Ohio,1,3 46
date=. Team=Western City=Michigan State=Columbus OSU=Ohio OPP=1 _ERROR_=1 _N_=7
NOTE: Invalid data for date in line 9 1-10.
9 11/8/1997,Notre Dame,Columbus , Ohio,3,2 40
date=. Team=Notre Da City=me State=Columbus OSU=Ohio OPP=3 _ERROR_=1 _N_=8
NOTE: Invalid data for date in line 14 1-10.
14 12/5/1997,Alaska-Fairbanks,Columbus , Ohio,8,3 46
date=. Team=Alaska-F City=airbanks State=Columbus OSU=Ohio OPP=8 _ERROR_=1 _N_=13
NOTE: Invalid data for date in line 15 1-10.
15 12/6/1997,Alaska-Fairbanks,Columbus , Ohio,4,0 46
date=. Team=Alaska-F City=airbanks State=Columbus OSU=Ohio OPP=4 _ERROR_=1 _N_=14
NOTE: Invalid data for date in line 18 1-10.
18 1/2/1998,Michigan,Ann Arbor, Michigan,4,2 41
date=. Team=ichigan, City=Ann Arbo State=Michigan OSU=4 OPP=2 _ERROR_=1 _N_=17
NOTE: Invalid data for date in line 19 1-10.
19 1/3/1998,Michigan,Ann Arbor, Michigan,6,0 41
date=. Team=ichigan, City=Ann Arbo State=Michigan OSU=6 OPP=0 _ERROR_=1 _N_=18
NOTE: Invalid data for date in line 20 1-10.
20 1/9/1998,Lake Superior,Columbus , Ohio,7,0 42
date=. Team=ake Supe City=rior State=Columbus OSU=Ohio OPP=7 _ERROR_=1 _N_=19
NOTE: Invalid data for date in line 21 1-10.
21 1/10/1998,Ferris State,Columbus , Ohio,5,3 42
date=. Team=Ferris S City=tate State=Columbus OSU=Ohio OPP=5 _ERROR_=1 _N_=20
NOTE: Invalid data for date in line 22 1-10.
22 1/18/1998,Bowling Green,Columbus , Ohio,4,2 43
date=. Team=Bowling City=Green State=Columbus OSU=Ohio OPP=4 _ERROR_=1 _N_=21
NOTE: Invalid data for date in line 23 1-10.
23 1/24/1998,Northern Michigan,Columbus , Ohio,2,0 47
date=. Team=Northern City=Michigan State=Columbus OSU=Ohio OPP=2 _ERROR_=1 _N_=22
NOTE: Invalid data for date in line 24 1-10.
24 1/25/1998,Notre Dame,Columbus , Ohio,5,3 40
date=. Team=Notre Da City=me State=Columbus OSU=Ohio OPP=5 _ERROR_=1 _N_=23
NOTE: Invalid data for date in line 25 1-10.
25 1/30/1998,Western Michigan,Kalamazoo, Michigan,4,2 50
date=. Team=Western City=Michigan State=Kalamazo OSU=Michigan OPP=4 _ERROR_=1 _N_=24
NOTE: Invalid data for date in line 26 1-10.
26 2/6/1998,Michigan State,Columbus , Ohio,4,2 43
date=. Team=ichigan City=State State=Columbus OSU=Ohio OPP=4 _ERROR_=1 _N_=25
NOTE: Invalid data for date in line 27 1-10.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
27 2/7/1998,Alaska-Fairbanks,Columbus , Ohio,4,4 45
date=. Team=laska-Fa City=irbanks State=Columbus OSU=Ohio OPP=4 _ERROR_=1 _N_=26
NOTE: Invalid data for date in line 28 1-10.
28 2/13/1998,Notre Dame,South Bend,Indiana,5,3 43
date=. Team=Notre Da City=me State=South Be OSU=Indiana OPP=5 _ERROR_=1 _N_=27
NOTE: Invalid data for date in line 29 1-10.
29 2/14/1998,Michigan State,East Lansing,Michigan,4,1 50
date=. Team=Michigan City=State State=East Lan OSU=Michigan OPP=4 _ERROR_=1 _N_=28
NOTE: Invalid data for date in line 30 1-10.
30 2/26/1998,Miami,Columbus , Ohio,5,2 35
date=. Team=Miami,Co City=lumbus State=Ohio OSU=5 OPP=2 _ERROR_=1 _N_=29
NOTE: Invalid data for date in line 31 1-10.
31 3/13/1998,Lake Superior,Columbus , Ohio,2,1 43
date=. Team=Lake Sup City=erior State=Columbus OSU=Ohio OPP=2 _ERROR_=1 _N_=30
NOTE: Invalid data for date in line 32 1-10.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
32 3/14/1998,Lake Superior,Columbus , Ohio,6,0 43
date=. Team=Lake Sup City=erior State=Columbus OSU=Ohio OPP=6 _ERROR_=1 _N_=31
NOTE: 36 records were read from the infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv".
The minimum record length was 35.
The maximum record length was 56.
NOTE: The data set WORK.HOCKEY has 36 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.02 seconds
 
 
68
69 proc print data = hockey; run;
 
NOTE: There were 36 observations read from the data set WORK.HOCKEY.
NOTE: PROCEDURE PRINT used (Total process time):
real time 1.56 seconds
cpu time 1.50 seconds
 
 
70
71 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
84
 
Thank you for your time and guiding

 

Shmuel
Garnet | Level 18

According to the log the date, in some rows, is not in mmddyy10. format.

On line 7 the date is: 11/1/1997

On line 19 the date is 1/3/1998

that means that day/month don't include leading zero.

 

I didn't find an informat to convert those to dates, therefore, new suggestion:

data hockey;
  infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
           firstobs = 2 delimiter="," dsd;

  input datex  $  
          Team $ City $ State $ OSU $ OPP $ ;

  mm = input(scan(datex,1,'/'),best2.);
  dd = input(scan(datex,2,'/'),best2.); 
yy = input(scan(datex,3,'/'),best4.);
date = mdy(mm,dd,yy);
drop datex;
run;

 

art297
Opal | Level 21

@Shmuel: mmddyy10. informat doesn't require leading zeros. The errors, I'm fairly certain, came from not having using the : operator for the text fields.

 

Art, CEO, AnalystFinder.com

 

Shmuel
Garnet | Level 18

@art297, thank you. The : informats are new to me and I'm still not used to use them.

I have checked the :dmmddyy10. informat and found it works fine.

Thanks, again.

Tom
Super User Tom
Super User

The data getting read into the wrong columns is the result of using formatted input mode instead of list input mode when reading a delimited file.  If you want to list the informat in the INPUT statement then you need to add the : modifier to keep your INPUT statement as one that is only using list mode input.  Otherwise when the actual date uses fewer than 10 characters the delimiter will become part of the value read. This will both mess up the value and get SAS confused about where it should be on the line to read the next field.

input date :mmddyy10. Team :$8. City $ State $ OSU $ OPP $ ;

I find if instead use a LENGTH statement to define your variables and an INFORMAT statement to tell SAS how to read the value then writing the INPUT statement is easier and these types or mistakes can be avoided.  If you define all of the variables in advance then you do not even need the $ on the input statement since SAS already knows the variables are character.

length date 8 Team $8 City $8 State $8 OSU $8 OPP $8 ;
informat date mmddyy10. ;
format date mmddyy10. ;
input date Team City State OSU OPP ;
Tom
Super User Tom
Super User

You image looks like the result of loading the data into an Excel version of that is using a different convention for date format than the data. So the values with day of month <= 12 were converted into dates and the other values were left as strings.  Note if you want to insert an image into your post here use the "Insert/edit image" (looks like the moon over some mountains) rather than just attaching it as a file.  That way readers can see the image without downloading it.

 

So you need to use the proper delimiter on your INFILE statement. The default is a comma when you use the DSD option.  But you will need to look at your actual CSV file with a text editor to see whether someone has sent you a file using a different delimiter and just used the .csv extension for convenience.

 

You need to use the right INFORMAT for your date values. They look to be in MDY order. You should also attach a FORMAT to your date variable so that SAS will display the value in a human readable form.  I would recommend using YYMMDD10 to avoid the confusion between MDY and DMY order that seems to impacting you now.   You should also define the lengths of your character variables, otherwise SAS will default to using $8 for the length.

 

Note if you want to include the informats in the INPUT statement when reading from a delimited file you should always use the colon modifier in front of the informat specification to prevent SAS from "eating" the delimiter when the value is shorter than the width used in the informat specification.  And add the TRUNCOVER option to the INFILE statement to properly handle lines that do not every column.

 

data hockey;
  infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
    firstobs=2 dlm="," dsd truncover
  ;
  input
    date :mmyydd10.
    team :$10.
    city :$20.
    state :$2.
    OSU :$10.
    OPP :$10.
  ;
  format date yymmdd10.;
run;
art297
Opal | Level 21

A few slight changes to @Tom's suggested code will give you what you want:

data hockey;
  infile "/folders/myshortcuts/SASUniversityEdition/module 3/hockey.csv"
    firstobs=2 dlm="," dsd truncover
  ;
  input
    date :mmddyy10.
    team :$20.
    city :$20.
    state :$15.
    OSU
    OPP
  ;
  format date mmddyy10.;
run;

Art, CEO, AnalystFinder.com

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2049 views
  • 4 likes
  • 5 in conversation