hello all,
there are some errors in a .sas7bdat file. which shows like below:
NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355 390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248 184,,A ,930063756,9999-12-31, ,0,0 35
pat_publn_id=184 publn_auth= publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935 148740,,A ,930101529,9999-12-31, ,0,0 38
pat_publn_id=148740 publn_auth= publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964 165714,,A ,930111753,9999-12-31, ,0,0 38
pat_publn_id=165714 publn_auth= publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026 790,,A1,930090062,9999-12-31, ,0,0 35
pat_publn_id=790 publn_auth= publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024
As there is not much error, I'd like to manually modify them. But I cannot revise them in the Viewtable.
therefore, could you please give me some suggestions about how to locate a specific line and revise observations in .sas7bdat file?
That isn't a sas7bdat dataset file. You are reading data in from an external file , a comma separated variable (CSV) file it seems to me. And in this import program you are specifying an informat, length, or format which is incorrect for the given data:
NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
This tells you everything you need to know. At line 68262946, position 33, contains invalid data for the given variable appln_id.
Much the same as for the other note. It even prints what it has read in for that line.
Either fix the import file, or your import program (neither of which we can see).
thanks for your reply. but what should I do to fix the import file if it is too large to open ?
Its a text file, you should be able to open it in Notepad or something like that.
Alternatively fix the import program, which I can't see so I can't be specific, but maybe if appln is numeric, set it to be character for example. If your just using proc import or something, then take the code out of the log window which is generated after you run, and paste that into your code window, you can then edit it.
I assume you got a data specification with the datafile? Otherwise how are you importing it correctly?
As I say, without some specifics its hard to say,
it is too large for Notepad, Could you recommend other software to open it please?
besides, I use following code to read the file in,
data S.Publications ;
infile 'R:/Li/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767 firstobs = 3 ;
input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind:$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims:29. ;
format publn_date :YYMMDDd10. ;
run ;
could you please explain more clearly about how to locate and edit a specific observation? as I cannot revise the variable in the Viewtable.
Haven't we been through this already a few weeks ago? This is the Patent Stat data correct?
The answers haven't changed.
And the log actually tells you exactly which record is problematic, see the _N_ in the log.
@France wrote:
thanks for your reply. but what should I do to fix the import file if it is too large to open ?
yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?
1. Use PROC PRINTTO to direct your log to a file
2. Read the file (it's a text file) to extract the _N_
3. That's the records you need to fix.
Or review it manually.
Reviewing your errors, note the _N_ as the last item. That tells you what line is wrong in the file. I can't remember if the header is counted, so that's something you'll want to verify.
NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
@France wrote:
yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?
thanks for your reply, Could I fix the error if I have found the error line manually in the Viewtable? The variable can be selected but cannot be changed. Could I fix it in the Viewtable ?
Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?
Besides, is it one step of your previous suggestion or a different method?
1. Use PROC PRINTTO to direct your log to a file
2. Read the file (it's a text file) to extract the _N_
3. That's the records you need to fix.
the following is the code and results which I have provided, so you don't need to check previous questions.
1 data S.Publications ;
2 infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767
2 ! firstobs = 3 ;
3 input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind
3 ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims
3 ! :29. ;
4 format publn_date :YYMMDDd10. ;
5 run ;
NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the
file encoding does not match the session encoding. Cross Environment Data Access will be
used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/Publications.csv' is:
Filename=R:\L\P\Publications.csv,
RECFM=V,LRECL=32767,
File Size (bytes)=5635168276,
Last Modified=11 September 2017 20:15:18 o'clock,
Create Time=15 March 2018 15:41:54 o'clock
NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355 390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248 184,,A ,930063756,9999-12-31, ,0,0 35
pat_publn_id=184 publn_auth= publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935 148740,,A ,930101529,9999-12-31, ,0,0 38
pat_publn_id=148740 publn_auth= publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964 165714,,A ,930111753,9999-12-31, ,0,0 38
pat_publn_id=165714 publn_auth= publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026 790,,A1,930090062,9999-12-31, ,0,0 35
pat_publn_id=790 publn_auth= publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024
really thanks for your help.
This tells you the records you need to fix. You also want to check if your log has a message towards the end about supressing errors, because the maximum has been reached. If that's the case you need to modify your code.
1. Use PROC PRINTTO to direct your log to a file
2. Read the file (it's a text file) to extract the _N_
3. That's the records you need to fix.
I have no idea how you want to fix your data set, I'm assuming you know the records you want to change. Say I found from the steps above that the records 3, 5,6 needed to have age=13 instead of whatever value they have, in the SASHELP.CLASS data set.
data want;
set sashelp.class;
if _n_ = 3 then age=13;
if _n_ = 5 then age=13;
if _n_ = 6 then age=13;
*if _n_ in (3, 5, 6) then age=13; *another way of writing that;
run;
@France wrote:
Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?
Besides, is it one step of your previous suggestion or a different method?
1. Use PROC PRINTTO to direct your log to a file
2. Read the file (it's a text file) to extract the _N_
3. That's the records you need to fix.
the following is the code and results which I have provided, so you don't need to check previous questions.1 data S.Publications ; 2 infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767 2 ! firstobs = 3 ; 3 input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind 3 ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims 3 ! :29. ; 4 format publn_date :YYMMDDd10. ; 5 run ; NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance. NOTE: The infile 'R:/L/P/Publications.csv' is: Filename=R:\L\P\Publications.csv, RECFM=V,LRECL=32767, File Size (bytes)=5635168276, Last Modified=11 September 2017 20:15:18 o'clock, Create Time=15 March 2018 15:41:54 o'clock NOTE: Invalid data for appln_id in line 68262946 33-34. NOTE: Invalid data for publn_date in line 68262946 36-44. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- 68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62 pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054 appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1 _N_=68262944 NOTE: Invalid data for appln_id in line 68280355 33-34. NOTE: Invalid data for publn_date in line 68280355 36-44. 68280355 390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0 62 pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466 appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1 _N_=68280353 NOTE: Invalid data for publn_date in line 98594248 33-33. 98594248 184,,A ,930063756,9999-12-31, ,0,0 35 pat_publn_id=184 publn_auth= publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31 appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246 NOTE: Invalid data for publn_date in line 98614935 36-36. 98614935 148740,,A ,930101529,9999-12-31, ,0,0 38 pat_publn_id=148740 publn_auth= publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31 appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933 NOTE: Invalid data for publn_date in line 98619964 36-36. 98619964 165714,,A ,930111753,9999-12-31, ,0,0 38 pat_publn_id=165714 publn_auth= publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31 appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962 NOTE: Invalid data for publn_date in line 98673026 33-33. 98673026 790,,A1,930090062,9999-12-31, ,0,0 35 pat_publn_id=790 publn_auth= publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31 appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024
really thanks for your help.
Whilst its possible as @Reeza mentions to code this in, or possibly do it manually, hardcoding in this way is quite a bad idea. A re-run may not do this, or need something different for example. Go back and fix the sourse, or alter the data specification so the problem fields can contain all the data, so for the first note you have variable app_ln which expects a numeric (and I don't think 29 length is valid for a numeric, so you may have issues there.
appln_id :29.
Now if you change that to be a character variable - which can handle most data, then the data will read in correctly. You could then have code which creates a numeric variant of that variable, and thus be able to check quite easily.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.