DATA Step, Macro, Functions and more

how to locate a specific row and revise observations in '.sas7bdat' file?

Reply
Contributor
Posts: 45

how to locate a specific row and revise observations in '.sas7bdat' file?

hello all,

 

there are some errors in a .sas7bdat file. which shows like below:

 

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

 

As there is not much error, I'd like to manually modify them. But I cannot revise them in the Viewtable.

therefore, could you please give me some suggestions about how to locate a specific line and revise observations in .sas7bdat file?

Super User
Super User
Posts: 9,441

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

That isn't a sas7bdat dataset file.  You are reading data in from an external file , a comma separated variable (CSV) file it seems to me.  And in this import program you are specifying an informat, length, or format which is incorrect for the given data:

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944

This tells you everything you need to know.  At line 68262946, position 33, contains invalid data for the given variable appln_id.

Much the same as for the other note.  It even prints what it has read in for that line. 

Either fix the import file, or your import program (neither of which we can see).   

Contributor
Posts: 45

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

thanks for your reply. but what should I do to fix the import file if it is too large to open ?

Super User
Super User
Posts: 9,441

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

Its a text file, you should be able to open it in Notepad or something like that.

Alternatively fix the import program, which I can't see so I can't be specific, but maybe if appln is numeric, set it to be character for example.  If your just using proc import or something, then take the code out of the log window which is generated after you run, and paste that into your code window, you can then edit it.

I assume you got a data specification with the datafile?  Otherwise how are you importing it correctly?

As I say, without some specifics its hard to say,

Contributor
Posts: 45

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

it is too large for Notepad, Could you recommend other software to open it please?

besides, I use following code to read the file in,

data S.Publications ;
infile 'R:/Li/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767 firstobs = 3 ;
input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind:$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims:29. ;
format publn_date :YYMMDDd10. ;
run ;

could you please explain more clearly about how to locate and edit a specific observation? as I cannot revise the variable in the Viewtable.

 

Super User
Posts: 23,357

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

Haven't we been through this already a few weeks ago? This is the Patent Stat data correct?
The answers haven't changed.

 

And the log actually tells you exactly which record is problematic, see the _N_ in the log. 

 


@France wrote:

thanks for your reply. but what should I do to fix the import file if it is too large to open ?


 

Contributor
Posts: 45

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?

Super User
Posts: 23,357

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 

 

Or review it manually.


Reviewing your errors, note the _N_ as the last item. That tells you what line is wrong in the file. I can't remember if the header is counted, so that's something you'll want to verify. 

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944

 


@France wrote:

yes I think there are some mistakes in file that I obtained, but not the original database. But currently it is the only file I can use. As you see, there is not too much errors in the file, so I want to revise it. Could you give me some suggestions about it please?


 

Contributor
Posts: 45

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

thanks for your reply,  Could I fix the error if I have found the error line manually in the Viewtable? The variable can be selected but cannot be changed. Could I fix it in the Viewtable ?

Super User
Posts: 23,357

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

Even if you could, I wouldn't advise that, to the point where I'll say its possible.
Instead, program it so you have a history and if you ever need to re-run your code (ie you find a different data error) you'll have to manually repeat the steps and have no idea.

Your code would end up like:

if _n_ = 3245345 then do;
<insert code to reassign variables>
end;
else if _n_ = XXXXX then do;
<insert code>
end;
Contributor
Posts: 45

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?

 

Besides, is it one step of your previous suggestion or a different method?

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 


the following is the code and results which I have provided, so you don't need to check previous questions.  

1    data S.Publications ;
2    infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 3 ;
3    input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind
3  ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims
3  ! :29. ;
4    format publn_date :YYMMDDd10. ;
5    run ;

NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/Publications.csv' is:
      Filename=R:\L\P\Publications.csv,
      RECFM=V,LRECL=32767,
      File Size (bytes)=5635168276,
      Last Modified=11 September 2017 20:15:18 o'clock,
      Create Time=15 March 2018 15:41:54 o'clock

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

really thanks for your help.

 

Super User
Posts: 23,357

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

This tells you the records you need to fix. You also want to check if your log has a message towards the end about supressing errors, because the maximum has been reached. If that's the case you need to modify your code.

 

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 

 

I have no idea how you want to fix your data set, I'm assuming you know the records you want to change. Say I found from the steps above that the records 3, 5,6 needed to have age=13 instead of whatever value they have, in the SASHELP.CLASS data set.

 

data want;
set sashelp.class;


if _n_  = 3 then age=13;
if _n_ = 5 then age=13;
if _n_ = 6 then age=13;

*if _n_ in (3, 5, 6) then age=13; *another way of writing that;

run;

@France wrote:

Thanks for your suggestion and I'd like to program it, but could you please explain the method more clearly? I have understood the meaning of IF-THEN/ELSE, but I don't know which codes should be used in <insert code to reassign variables> and <insert code>? and where I should put these codes?

 

Besides, is it one step of your previous suggestion or a different method?

1. Use PROC PRINTTO to direct your log to a file

2. Read the file (it's a text file) to extract the _N_

3. That's the records you need to fix. 


the following is the code and results which I have provided, so you don't need to check previous questions.  

1    data S.Publications ;
2    infile 'R:/L/P/Publications.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 3 ;
3    input pat_publn_id :29. publn_auth :$29. publn_nr :$29. publn_nr_original :$29. publn_kind
3  ! :$29. appln_id :29. publn_date :YYMMDD10. publn_lg :$29. publn_first_grant:29. publn_claims
3  ! :29. ;
4    format publn_date :YYMMDDd10. ;
5    run ;

NOTE: Data file S.PUBLICATIONS.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/Publications.csv' is:
      Filename=R:\L\P\Publications.csv,
      RECFM=V,LRECL=32767,
      File Size (bytes)=5635168276,
      Last Modified=11 September 2017 20:15:18 o'clock,
      Create Time=15 March 2018 15:41:54 o'clock

NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
68262946  390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26,  ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355  390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26,  ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
NOTE: Invalid data for publn_date in line 98594248 33-33.
98594248  184,,A ,930063756,9999-12-31,  ,0,0 35
pat_publn_id=184 publn_auth=  publn_nr=A publn_nr_original=930063756 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98594246
NOTE: Invalid data for publn_date in line 98614935 36-36.
98614935  148740,,A ,930101529,9999-12-31,  ,0,0 38
pat_publn_id=148740 publn_auth=  publn_nr=A publn_nr_original=930101529 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98614933
NOTE: Invalid data for publn_date in line 98619964 36-36.
98619964  165714,,A ,930111753,9999-12-31,  ,0,0 38
pat_publn_id=165714 publn_auth=  publn_nr=A publn_nr_original=930111753 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98619962
NOTE: Invalid data for publn_date in line 98673026 33-33.
98673026  790,,A1,930090062,9999-12-31,  ,0,0 35
pat_publn_id=790 publn_auth=  publn_nr=A1 publn_nr_original=930090062 publn_kind=9999-12-31
appln_id=. publn_date=. publn_lg=0 publn_first_grant=. publn_claims=. _ERROR_=1 _N_=98673024

really thanks for your help.

 


 

Super User
Super User
Posts: 9,441

Re: how to locate a specific row and revise observations in '.sas7bdat' file?

Whilst its possible as @Reeza mentions to code this in, or possibly do it manually, hardcoding in this way is quite a bad idea.  A re-run may not do this, or need something different for example.  Go back and fix the sourse, or alter the data specification so the problem fields can contain all the data, so for the first note you have variable app_ln which expects a numeric (and I don't think 29 length is valid for a numeric, so you may have issues there.

appln_id :29.

 Now if you change that to be a character variable - which can handle most data, then the data will read in correctly.  You could then have code which creates a numeric variant of that variable, and thus be able to check quite easily.

Ask a Question
Discussion stats
  • 12 replies
  • 122 views
  • 0 likes
  • 3 in conversation