The semicolon seems to be optional for the data after the DATALINES statement. Honestly, this is the first time I have encountered this in SAS programming, and I wonder if there are other situations where the semicolon is optional. I thought the semicolon was compulsory in SAS statements.
data setA;
input Num VarA $;
datalines;
1 A1
2 A2
3 A3
;
run;
proc print data=setA;run;

data setB;
input Num VarB $;
datalines;
1 B1
2 B2
3 B3
run;
proc print data=setB;run;
And if you put the semicolon not on a new line but at the end of the last data line, like
3 A3;
that observation will not be read.
The RUN statement is more or less optional if there is a terminating ;
The behavior you see for
data setB;
input Num VarB $;
datalines;
1 B1
2 B2
3 B3
run;
is basically the same as for
data setB;
input Num VarB $;
datalines;
1 B1
2 B2
3 B3
;
However, if you do not use one of those forms (or the ;;;; line when using DATALINES4 or CARDS4), you are likely to generate invalid data messages unless the next thing encountered starts a new step, such as a PROC statement. Consider:
data setB;
input Num VarB $;
datalines;
1 B1
2 B2
3 B3

/* some comment */
proc print;
run;
No semicolon, invalid data message:
673  data setB;
674  input Num VarB $;
675  datalines;

NOTE: Invalid data for Num in line 680 1-2.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
680        /* some comment */
Num=. VarB=some _ERROR_=1 _N_=4
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.SETB has 4 observations and 2 variables.
and a result with a fourth observation in which Num is missing and VarB is "some".
I would not characterize the semicolon as optional here, even though you observe that things work in your test case with or without it.
DATALINES is a declarative statement, which means it is processed during DATA step compilation. The compiler here might seem to be forgiving, but I think any of us would advise you to include the semicolon because it makes clear that you're ending the data there, and it's always good to provide clear indicators to the compiler.
DATALINES4 (you might have read) is an alternative to use when your input data might contain semicolons. In that case you end the data block with a line of 4 semicolons (;;;;).
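As a minimal sketch of the DATALINES4 form (the data set name setC, the variable Txt, and the data values are made up for illustration), the data lines below contain semicolons, and only the line of four semicolons ends the block:

```sas
/* Hypothetical example: data values that contain semicolons */
data setC;
   input Num Txt $;
   datalines4;
1 a;b
2 c;d
;;;;
run;

proc print data=setC;
run;
```

With plain DATALINES, the first data line here would already contain a semicolon and would end the data block before any observation was read.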
DATALINES is an alias for the CARDS statement, which harkens back to the time when data was fed into SAS programs using punch cards. And that system had very little room for ambiguity!
I would like to see consistent behavior of the DATALINES statement and its termination.
run;
should be treated as data followed by a terminating semicolon, resulting in an ERROR or a LOST CARD in most cases.
There is a consistency here, buried in the syntax. After DATALINES, the first line that contains a semicolon is a programming statement. Everything before that is data. This program works perfectly fine (or at least it did the last time I checked):
data setB;
input Num VarB $;
datalines;
1 B1
2 B2
3 B3
proc means;
run;
It treats PROC MEANS as a programming statement, and computes statistics based on NUM.
PROC and other step boundaries work only IF the line of code with the PROC (or other step keyword) also contains the semicolon.
If not, as in this example, PROC PRINT (or another step keyword such as DATA) is read as data until the line with the ; is encountered:
data junk;
input a b $;
datalines;
1 a
2 b
3 c
proc print
data=junk;
run;
Which treats "proc print" as data.
733  data junk;
734  input a b $;
735  datalines;

NOTE: Invalid data for a in line 739 1-4.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
739        proc print
a=. b=print _ERROR_=1 _N_=4
NOTE: The data set WORK.JUNK has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

740  data=junk;
     ----
     180
ERROR 180-322: Statement is not valid or it is used out of proper order.
741  run;
Or try:
data junk;
input a b $;
datalines;
1 a
2 b
3 c
data junk2
junk3;
set junk;
if a=2 then output junk2;
else output junk3;
run;
"Normally" a DATA statement would end the previous step, but if the DATA statement continues onto another line, then the lines before the one containing the semicolon are read as data.
So just simplify your life and use a separate semicolon (or 4 of them) to end your data lines block. Don't rely on the special treatment of statements whose semicolon happens to be on the same line of code as the start of the statement.
@Kurt_Bremser wrote:
I would like to see consistent behavior of the DATALINES statement and its termination.
run;
should be treated as data followed by a terminating semicolon, resulting in an ERROR or a LOST CARD in most cases.
But that would be inconsistent with how it handles this code.
data want;
input a b c ;
datalines;
1 2 3
4 5 6
7 8 9;
proc print;
run;
How many observations does WANT have?