BookmarkSubscribeRSS Feed
France
Quartz | Level 8

hello all,

 

there are some errors in a .sas7bdat file. which shows like below:

 

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

 

As there is not much error, I'd like to manually modify them. But I cannot revise them in the Viewtable.

therefore, could you please give me some suggestions about how to locate a specific line and revise observations in .sas7bdat file?

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

That isn't a sas7bdat dataset file.  You are reading data in from an external file , a comma separated variable (CSV) file it seems to me.  And in this import program you are specifying an informat, length, or format which is incorrect for the given data:

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944

This tells you everything you need to know.  At line 68262946, position 33, contains invalid data for the given variable appln_id.

Much the same as for the other note.  It even prints what it has read in for that line. 

Either fix the import file, or your import program (neither of which we can see).   

France
Quartz | Level 8

thanks for your reply. but what should I do to fix the import file if it is too large to open ?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its a text file, you should be able to open it in Notepad or something like that.

Alternatively fix the import program, which I can't see so I can't be specific, but maybe if appln is numeric, set it to be character for example.  If your just using proc import or something, then take the code out of the log window which is generated after you run, and paste that into your code window, you can then edit it.

I assume you got a data specification with the datafile?  Otherwise how are you importing it correctly?

As I say, without some specifics its hard to say,

France
Quartz | Level 8

it is too large for Notepad, Could you recommend other software to open it please?

besides, I use following code to read the file in,

data S.Publications ;
infile 'R:/Li/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767 firstobs = 3 ;
input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind:$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims:29. ;
format publn_date :YYMMDDd10. ;
run ;

could you please explain more clearly about how to locate and edit a specific observation? as I cannot revise the variable in the Viewtable.

 

Reeza
Super User

Haven't we been through this already a few weeks ago? This is the Patent Stat data correct?
The answers haven't changed.

 

And the log actually tells you exactly which record is problematic, see the _N_ in the log. 

 


@France wrote:

thanks for your reply. but what should I do to fix the import file if it is too large to open ?


 

France
Quartz | Level 8

yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?

Reeza
Super User

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 

 

Or review it manually.


Reviewing your errors, note the _N_ as the last item. That tells you what line is wrong in the file. I can't remember if the header is counted, so that's something you'll want to verify. 

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944

 


@France wrote:

yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?


 

France
Quartz | Level 8

thanks for your reply,  Could I fix the error if I have found the error line manually in the Viewtable? The variable can be selected but cannot be changed. Could I fix it in the Viewtable ?

Reeza
Super User
Even if you could, I wouldn't advise that, to the point where I'll say its possible.
Instead, program it so you have a history and if you ever need to re-run your code (ie you find a different data error) you'll have to manually repeat the steps and have no idea.

Your code would end up like:

if _n_ = 3245345 then do;
<insert code to reassign variables>
end;
else if _n_ = XXXXX then do;
<insert code>
end;
France
Quartz | Level 8

Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?

 

Besides, is it one step of your previous suggestion or a different method?

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 


the following is the code and results which I have provided, so you don't need to check previous questions.  

1    data S.Publications ;
2    infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 3 ;
3    input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind
3  ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims
3  ! :29. ;
4    format publn_date :YYMMDDd10. ;
5    run ;

NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/Publications.csv' is:
      Filename=R:\L\P\Publications.csv,
      RECFM=V,LRECL=32767,
      File Size (bytes)=5635168276,
      Last Modified=11 September 2017 20:15:18 o'clock,
      Create Time=15 March 2018 15:41:54 o'clock

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

really thanks for your help.

 

Reeza
Super User

This tells you the records you need to fix. You also want to check if your log has a message towards the end about supressing errors, because the maximum has been reached. If that's the case you need to modify your code.

 

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 

 

I have no idea how you want to fix your data set, I'm assuming you know the records you want to change. Say I found from the steps above that the records 3, 5,6 needed to have age=13 instead of whatever value they have, in the SASHELP.CLASS data set.

 

data want;
set sashelp.class;


if _n_  = 3 then age=13;
if _n_ = 5 then age=13;
if _n_ = 6 then age=13;

*if _n_ in (3, 5, 6) then age=13; *another way of writing that;

run;

@France wrote:

Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?

 

Besides, is it one step of your previous suggestion or a different method?

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 


the following is the code and results which I have provided, so you don't need to check previous questions.  

1    data S.Publications ;
2    infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 3 ;
3    input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind
3  ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims
3  ! :29. ;
4    format publn_date :YYMMDDd10. ;
5    run ;

NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/Publications.csv' is:
      Filename=R:\L\P\Publications.csv,
      RECFM=V,LRECL=32767,
      File Size (bytes)=5635168276,
      Last Modified=11 September 2017 20:15:18 o'clock,
      Create Time=15 March 2018 15:41:54 o'clock

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

really thanks for your help.

 


 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst its possible as @Reeza mentions to code this in, or possibly do it manually, hardcoding in this way is quite a bad idea.  A re-run may not do this, or need something different for example.  Go back and fix the sourse, or alter the data specification so the problem fields can contain all the data, so for the first note you have variable app_ln which expects a numeric (and I don't think 29 length is valid for a numeric, so you may have issues there.

appln_id :29.

 Now if you change that to be a character variable - which can handle most data, then the data will read in correctly.  You could then have code which creates a numeric variant of that variable, and thus be able to check quite easily.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1160 views
  • 0 likes
  • 3 in conversation