See my comments:
options LOCALE=en_AU;
data fatalities;
infile "/group/FATALITIES.csv" delimiter="," firstobs=2 dsd missover;
informat
FATALITY_ID
EVENT_ID
FATALITY_TYPE
$10. /* if this is not meant for the previous three variables, give each its own informat */
FATALITY_DATE ddmmyy10.
FATALITY_AGE /* if this is numeric, add a humeric informat here */
FATALITY_SEX $xxx. /* insert proper length here */
FATALITY_LOCATION $80.
;
format
FATALITY_DATE date9.
/* no formats needed for pure numbers or strings */
;
input
FATALITY_ID
EVENT_ID
FATALITY_TYPE
FATALITY_DATE
FATALITY_AGE
FATALITY_SEX
FATALITY_LOCATION
;
run;
In case of problems, post the log by copy/pasting it into a window opened with this button:
For future postings, when you share data with your question, please consider providing the data in machine-readable format (like CSV, in this case) instead of PDF. Here's how I identified the problem:
1. I pasted your code into SAS Studio and used the FORMAT CODE feature to make it more readable.
2. Reviewing the formatted code, the issue is easier to spot. The INFORMAT and FORMAT statements contain the entry FATALITY_SEX$
which is causing your error.
data fatalities;
infile "/group/FATALITIES.csv" delimiter="," firstobs=2 dsd missover;
input FATALITY_ID EVENT_ID FATALITY_TYPE$ FATALITY_DATE FATALITY_AGE
FATALITY_SEX$ FATALITY_LOCATION$;
informat FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE ddmmyy10.
FATALITY_AGE FATALITY_SEX$ FATALITY_LOCATION $80.;
format FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE date9.
FATALITY_AGE FATALITY_SEX$ FATALITY_LOCATION $80.;
run;
A $ is not acceptable to specify the format/informat here. A valid format specification is required for the FORMAT statement, and a valid informat specification on the INFORMAT statement. Something like this would work:
data fatalities;
infile "/group/FATALITIES.csv" delimiter="," firstobs=2 dsd missover;
input FATALITY_ID EVENT_ID FATALITY_TYPE$ FATALITY_DATE FATALITY_AGE
FATALITY_SEX$ FATALITY_LOCATION$;
informat FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE ddmmyy10.
FATALITY_AGE FATALITY_SEX $1. FATALITY_LOCATION $80.;
format FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE date9.
FATALITY_AGE FATALITY_SEX $1. FATALITY_LOCATION $80.;
run;
With an INFORMAT statement provided, the $ characters on the INPUT statement are no longer required to ensure values are read in as character, so they could be removed. And finally, I highly recommend that you don't apply formats to character values - it can make it difficult to view the actual values during downstream processing. So my final recommendation would be something like this:
data fatalities;
infile "/group/FATALITIES.csv" delimiter="," firstobs=2 dsd missover;
informat FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE ddmmyy10.
FATALITY_AGE FATALITY_SEX $1. FATALITY_LOCATION $80.;
format FATALITY_DATE date9.;
input FATALITY_ID EVENT_ID FATALITY_TYPE FATALITY_DATE FATALITY_AGE
FATALITY_SEX FATALITY_LOCATION$;
run;
May the SAS be with you!
Mark
Hi Jedi,
Thanks for your reply!
I tried your code but it still didnt work in SAS...
I have trouble reading two different formats of date and FATALITY_LOCATION in SAS
I intended to read the dates in AU format and FATALITY_LOCATION properly..
Have been trying to work it out for a few hours and still couldnt work it out.. sigh
I downloaded your csv file to my University Edition and ran this with success:
data fatalities;
infile "/folders/myfolders/F1.csv" delimiter="," firstobs=2 dsd truncover;
informat
FATALITY_ID $5.
EVENT_ID $6.
FATALITY_TYPE $1.
FATALITY_DATE mmddyy10.
FATALITY_AGE 3.
FATALITY_SEX $1.
FATALITY_LOCATION $80.
;
format
FATALITY_DATE yymmdd10.
;
input
FATALITY_ID
EVENT_ID
FATALITY_TYPE
FATALITY_DATE
FATALITY_AGE
FATALITY_SEX
FATALITY_LOCATION
;
run;
Note that the dates are in MMDDYY format in your CSV.
Having the actual data to work with makes a big difference 😁. So, here are a few lines from the CSV:
FATALITY_ID,EVENT_ID,FATALITY_TYPE,FATALITY_DATE,FATALITY_AGE,FATALITY_SEX,FATALITY_LOCATION
38341,817354,D,06/06/2019,97,M,Outside/Open Areas
38480,819069,D,06/09/2019,45,M,Vehicle/Towed Trailer
38532,820152,D,06/20/2019,,M,In Water
It's clear that these dates are written in the typical US format, month/day/year - or MMDDYY in SAS-speak. The dates are 10 characters wide, so we must use the INFORMAT MMDDYY10. to properly read the values. Once the proper INFORMAT for the date is supplied on the INFORMAT statement, it should work just fine:
data fatalities;
infile "/group/FATALITIES.csv" delimiter="," firstobs=2 dsd missover;
informat FATALITY_ID EVENT_ID FATALITY_TYPE $10. FATALITY_DATE mmddyy10.
FATALITY_AGE FATALITY_SEX $1. FATALITY_LOCATION $80.;
format FATALITY_DATE date9.;
input FATALITY_ID EVENT_ID FATALITY_TYPE FATALITY_DATE FATALITY_AGE
FATALITY_SEX FATALITY_LOCATION$;
run;
May the SAS be with you!
Mark
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.