Hi,
Wondering if someone can help? (Using SAS OnDemand for Academics)
I'm trying to create a dataset in SAS. This is the code I have:
DATA BI.Portcalls2018;
infile '/home/u57411980/sas/Portcalls2018.csv' dlm=',';
input Country :$10. Ship Type :$32. Median time in port (days) :$4. Average age of vessels :$2. Average size (GT) of vessels :$5. Maximum size (GT) of vessels :$5. Average cargo carrying capacity :$5. Maximum cargo carrying capacity :$5. Average container carrying capac :$4. Maximum container carrying capac :$4. ;
run;
proc print DATA=BI.Portcalls2018;
run;
I then hit run and check the Log and I get these errors, which I don't understand:
How many columns are in the CSV file?
Your current code is attempting to read 38 variables.
input
Country :$10.
Ship
Type :$32.
Median
time
in
port
(days) :$4.
Average
age
of
vessels :$2.
Average
size
(GT)
of
vessels :$5.
Maximum
size
(GT)
of
vessels :$5.
Average
cargo
carrying
capacity :$5.
Maximum
cargo
carrying
capacity :$5.
Average
container
carrying
capac :$4.
Maximum
container
carrying
capac :$4.
;
You seem to be trying to use LABELS as NAMES for the variables. Use the NAMES of the variables in the INPUT statement. You can then use a LABEL statement to attach those long descriptive strings (which will be cumbersome at best to use as names) to the variables as the LABEL of the variable.
The specific error SAS is flagging in your INPUT statement is a missing informat list after a variable list: because some words, like days and GT, are in parentheses, SAS treats them as a grouped variable list and expects a parenthesized informat list to follow.
Other mistakes include not using the DSD and TRUNCOVER options on the INFILE statement. The first makes sure that quoted and empty values are processed properly. The latter prevents SAS from moving to a new line if a line does not have values for all of the variables being read by the INPUT statement.
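As a rough sketch of what those INFILE options look like in practice (the FIRSTOBS=2 option is my assumption that the first line of the CSV is a header row, so drop it if there is none; the dataset name work.portcalls_check is made up, and only the first two columns are read to keep it short):

```sas
data work.portcalls_check;   /* hypothetical dataset name */
  /* DSD: consecutive commas are read as missing values, quotes around values are removed */
  /* TRUNCOVER: do not move to the next line when a record runs out of values */
  /* FIRSTOBS=2: assumption, skips a header row if the file has one */
  infile '/home/u57411980/sas/Portcalls2018.csv' dsd dlm=',' truncover firstobs=2;
  input Country :$10. Ship_Type :$32.;
run;
```

With DSD the default delimiter is already a comma, so dlm=',' is redundant here but harmless.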
This is one time I would suggest running Proc Import on your CSV file. The LOG will show the data step code to read the file.
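A minimal PROC IMPORT call might look like the following. The output name work.portcalls_import is hypothetical, and GETNAMES=YES assumes the first row of the file holds the column headers:

```sas
proc import datafile='/home/u57411980/sas/Portcalls2018.csv'
    out=work.portcalls_import   /* hypothetical output dataset name */
    dbms=csv
    replace;
  getnames=yes;       /* assumption: first row contains the column headers */
  guessingrows=max;   /* scan all rows before deciding types and lengths */
run;
```

After it runs, the generated data step in the log can be copied and edited, for example to rename the variables.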
Note that unless you have set the option validvarname=any; variable names must start with a letter or an underscore and can contain only letters, digits, and underscores. No spaces, parentheses, or other characters.
The INPUT statement does use ( and ), but that relates to instructions on how to read a group of variables, and your posted code does not meet those rules.
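For illustration only, this is roughly how parentheses are meant to be used in INPUT: a parenthesized variable list followed directly by a parenthesized informat list. The dataset, variable names, and data lines are all made up for the example:

```sas
data work.group_demo;   /* hypothetical dataset and variables */
  infile datalines dsd truncover;
  /* each (variable list) is followed by its own (informat list) */
  input (id name) (:$8.) (x y) (:8.);
  datalines;
A1,Alpha,1.5,20
B2,Beta,2.5,30
;
run;
```

In your code, (days) and (GT) open a variable list, but nothing that follows is a parenthesized informat list.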
Start with these two documentation pages for the input statement:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/lestmtsref/n0oaql83drile0n141pdacojq97s.htm
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/lestmtsref/n0lrz3gb7m9e4rn137op544ddg0v.htm
Best practice on the forum is to copy the text from the log, open a text box on the forum with the </> icon, and paste the text there. The diagnostic characters that appear with this error, and with many others, then stay in the correct position relative to the code, whereas the main message window reformats the text.
Question: Why would you want to read median or average values as character? That doesn't make sense for modeling or most other forms of further analysis.
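A sketch of reading those columns as numeric instead, assuming the columns come in the order shown in your INPUT statement and that the file has a header row (hence FIRSTOBS=2). The names work.portcalls_num, Days_in_port, and Avg_age are only suggestions, and just the first four columns are read:

```sas
data work.portcalls_num;   /* hypothetical dataset name */
  infile '/home/u57411980/sas/Portcalls2018.csv' dsd truncover firstobs=2;
  /* no informat after a name means the value is read as a standard number */
  input Country :$10. Ship_Type :$32. Days_in_port Avg_age;
run;

/* numeric variables can then be summarized directly */
proc means data=work.portcalls_num mean median;
  var Days_in_port Avg_age;
run;
```

If the file stores numbers with thousands separators, the plain numeric read will not work and something like the COMMA. informat would be needed instead.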
A variable that doesn't fully comply with SAS naming standards must be treated as a SAS name literal and can only get referenced using syntax:
'<name literal>'n
The first non-compliant variable in your list is Ship Type. Because there is a blank you can only use such a name as literal via syntax: 'Ship Type'n
Right now the SAS compiler will interpret these two terms as two variables: Ship and Type
The next non-compliant variable is Median time in port (days)
Up to the opening parenthesis, the SAS compiler interprets each word as a separate variable. That is why it only throws an error when it reaches the opening parenthesis, which is invalid syntax there in any case.
As long as your names don't exceed 32 characters you could use SAS name literals. It's not really recommended though as using such variable names in code is just harder and cumbersome.
Normally, SAS does not support variable names that have blanks and special characters like parentheses in them. If you need to do that, you will have to set the option VALIDVARNAME=ANY, and use a special syntax to refer to the names, like this:
options validvarname=any;
DATA BI.Portcalls2018;
infile '/home/u57411980/sas/Portcalls2018.csv' dlm=',';
input Country :$10. 'Ship Type'n :$32. 'Median time in port (days)'n :$4. 'Average age of vessels'n :$2. 'Average size (GT) of vessels'n :$5. 'Maximum size (GT) of vessels'n :$5. 'Average cargo carrying capacity'n :$5. 'Maximum cargo carrying capacity'n :$5. 'Average container carrying capac'n :$4. 'Maximum container carrying capac'n :$4. ;
run;
And you will then have to refer to them that way all the time.
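For example, a later step that uses the name-literal variables would look something like this (a sketch, assuming the dataset was created with the name literals shown above):

```sas
options validvarname=any;

proc print data=BI.Portcalls2018;
  /* every reference needs the quotes and the trailing n */
  var Country 'Ship Type'n 'Median time in port (days)'n;
run;
```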
Another possibility is to use the long names as variable labels, and use shorter names for the actual variables, e.g.:
DATA BI.Portcalls2018;
infile '/home/u57411980/sas/Portcalls2018.csv' dlm=',';
input Country :$10. Ship_Type :$32. Days_in_port :$4. ....
label
Ship_type='Ship Type'
Days_in_port='Median time in port (days)'
;
run;
Then you can use the shorter names in your programs, and get the long names in PROC PRINT etc.
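One detail: PROC PRINT shows variable names by default, so the LABEL option is needed to get the labels as column headings. A short sketch, assuming the dataset was created with the short names and labels as above:

```sas
proc print data=BI.Portcalls2018 label;
  /* LABEL option: column headings show 'Ship Type' etc. instead of Ship_Type */
  var Country Ship_Type Days_in_port;
run;
```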