☑ This topic is solved.
Geoghegan
Obsidian | Level 7

I'm trying to follow the code from the article "Test for the equality of two proportions in SAS" on The DO Loop, specifically the section called "A chi-square test for association in SAS." I need to compare the proportion in one area that was tested for something with the proportion in another area that was tested, and see whether the two proportions are significantly different, but I can't get the code to work right. I get this error:

 

NOTE: Invalid data for N in line 79 1-6.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
79 CountyB Yes 71
Group=CountyA Seq=No N=. _ERROR_=1 _N_=2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
 
My full code is: 
 
data underfive;
length Group $15 Test $3;
input Group Test N;
datalines;
CountyA Yes 55
CountyA No 45027
CountyB Yes 71
CountyB No 311726;
 
Once I had that in I figured I would run this:
 
proc freq data=underfive order=data;
weight N;
tables Group*Test/chisq;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Hmm, my guess (and it's only a guess) is that there might be an unprintable character in your datalines. This can happen if you copied and pasted the data from an MS Word file. Word and other "rich text" formats might use a CR/LF line ending, which could make SAS think there is a blank line between the 2nd and 3rd lines of data.

 

My suggestion: type the program verbatim into the SAS editor. Don't copy and paste it. Do you get the same error, or does the DATA step now run correctly?


13 REPLIES
ballardw
Super User

Can't replicate with the posted example.

 

The main message windows on this forum will reformat text so it is possible that the code you posted has been modified in such a way that the error won't appear.

The code you show cannot generate the error shown.

Your shown INPUT statement does not create a variable named SEQ as shown with this:

Group=CountyA Seq=No N=. _ERROR_=1 _N_=2

Your code has variables Group, Test and N.

 

I suggest that you copy your code and paste into a text box opened on the forum using the </> icon above the message window and we can see if that will behave the same.

Geoghegan
Obsidian | Level 7

</>

data underfive;
length Group $9 Test $3;
input Group Test N;
datalines;
Worcester Yes 55
Worcester No 45027
NonWor Yes 71
NonWor No 311726
;

</>

Sorry, I had copied in part of an old version. This is my current code. It gives this error:

NOTE: Invalid data for N in line 79 1-6.

It does create a dataset, but not all the numbers are included.

ballardw
Super User

As posted, that does not generate any error or invalid data:

3468  data underfive;
3469  length Group $9 Test $3;
3470  input Group Test N;
3471  datalines;

NOTE: The data set USER.UNDERFIVE has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


3476  ;

So you have to be running something different to generate such an invalid data message.

 




 

Rick_SAS
SAS Super FREQ

Be sure to put the semicolon that ends the data lines on a line by itself, after the last line of data:

 

data underfive;
length Group $15 Test $3;
input Group Test N;
datalines;
CountyA Yes 55
CountyA No 45027
CountyB Yes 71
CountyB No 311726
;
 
proc freq data=underfive order=data;
   weight N;
   tables Group*Test/chisq;
run;
Geoghegan
Obsidian | Level 7

Thank you! That helped it create a dataset, though now the variables don't have the values they should (it has only 3 obs, and N is blank for one of them).

Rick_SAS
SAS Super FREQ

I don't know how you are running the code, but I assure you that the DATA step generates four observations:

 

data underfive;
length Group $15 Test $3;
input Group Test N;
datalines;
CountyA Yes 55
CountyA No 45027
CountyB Yes 71
CountyB No 311726
;
 
proc print data=underfive;
run;

[Image: PROC PRINT output for the underfive data set]

 

Geoghegan
Obsidian | Level 7

Does the number of spaces on the lines where it has CountyA Yes etc. matter? I'm trying to figure out why it's telling me it only has three obs:

 

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.UNDERFIVE has 3 observations and 3 variables.
Rick_SAS
SAS Super FREQ

Post your code by doing the following:

1. Click the "Insert SAS Code" icon (looks like a running man). A dialog box will pop up. 

2. Paste the EXACT code that generates the error into the dialog box.

3. Click OK to display the code in the thread.

4. Click Post so we can see the code.

Geoghegan
Obsidian | Level 7
data underfive;
length Group $15 Test $3;
input Group Test N;
datalines;
Worcester      Yes  55
Worcester      No	45027
NonWor         Yes  71
NonWor         No   311726
;
Rick_SAS
SAS Super FREQ

OK, here is my log for the code you posted. Show us yours.

7033  data underfive;
7034  length Group $15 Test $3;
7035  input Group Test N;
7036  datalines;

NOTE: The data set WORK.UNDERFIVE has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds


7041  ;

 

Geoghegan
Obsidian | Level 7
Submission: May 14, 2024 4:29:12 PM
 
 
 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data underfive;
74 length Group $15 Test $3;
75 input Group Test N;
76 datalines;
 
NOTE: Invalid data for N in line 79 1-6.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
79 NonWor Yes 71
Group=Worcester Test=No N= _ERROR_=1 _N_=2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.UNDERFIVE has 3 observations and 3 variables.
NOTE: Compressing data set WORK.UNDERFIVE increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 567.62k
OS Memory 31904.00k
Timestamp 05/14/2024 08:29:13 PM
Step Count 201 Switch Count 2
Page Faults 0
Page Reclaims 208
Page Swaps 0
Voluntary Context Switches 24
Involuntary Context Switches 0
Block Input Operations 832
Block Output Operations 264
 
 
81 ;
82
83 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
95

 
Rick_SAS
SAS Super FREQ

Hmm, my guess (and it's only a guess) is that there might be an unprintable character in your datalines. This can happen if you copied and pasted the data from an MS Word file. Word and other "rich text" formats might use a CR/LF line ending, which could make SAS think there is a blank line between the 2nd and 3rd lines of data.
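
One way to check a guess like this is to read each data line without parsing it and print it in hexadecimal. The step below is only a sketch: it assumes an ASCII system (where 20 is a space, 09 is a tab, and 0D is a carriage return), and the variable names line and badpos are made up for illustration. Paste the suspect data lines in place of the ones shown.

```sas
/* Diagnostic sketch: show each raw data line as text and as hex. */
data _null_;
   length line $80;
   infile datalines;
   input;                            /* load the line into _INFILE_ without parsing it */
   line   = _infile_;
   badpos = notprint(trim(line));    /* position of the first nonprintable character, 0 if none */
   put _n_= badpos= line=;
   put line $hex60.;                 /* first 30 bytes in hex: 20=space, 09=tab, 0D=CR (ASCII) */
   datalines;
Worcester Yes 55
Worcester No 45027
NonWor Yes 71
NonWor No 311726
;
```

If badpos is nonzero for a line, the hex dump at that position shows which character is hiding there. Some editors convert tabs to spaces on paste or on submit, so the result can differ between SAS interfaces.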

 

My suggestion: type the program verbatim into the SAS editor. Don't copy and paste it. Do you get the same error, or does the DATA step now run correctly?
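
If retyping is impractical and the hidden character turns out to be a tab, a possible alternative is to have SAS expand tabs to spaces before the INPUT statement parses the line. The second data line of the code posted earlier in this thread appears to contain a tab between No and 45027. With list input a tab is not a delimiter by default, so that whole piece would be read as the value of Test and SAS would go to the next line to find N, which is consistent with the "Invalid data for N" note. This is a sketch under that assumption, not a confirmed diagnosis; the EXPANDTABS option on the INFILE statement is the only change from the original program.

```sas
/* Sketch: tolerate tab characters in the data lines. */
data underfive;
   length Group $15 Test $3;
   infile datalines expandtabs;   /* expand tabs to spaces so list input sees normal delimiters */
   input Group Test N;
   datalines;
Worcester      Yes  55
Worcester      No   45027
NonWor         Yes  71
NonWor         No   311726
;

proc print data=underfive;        /* expect 4 observations with N populated on each */
run;
```

Extra spaces between values do not matter for list input; only characters that are not spaces (such as a tab) change how the line is split.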

Geoghegan
Obsidian | Level 7

Ahh thank you! I opened a new program and typed it out and it worked just fine, thanks so much for your help!
