BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
France
Quartz | Level 8

hello all,

 

I am reading a simple .csv file in SAS separately by following two codes, but both fail. I don't why I can't do it. could

the first log is:

 

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3 MBCS3170)
      Licensed to UNIVERSITY OF NEWCASTLE UPON TYNE, Site 70117723.
NOTE: This session is executing on the X64_7PRO  platform.



NOTE: Updated analytical products:

      SAS/STAT 14.1
      SAS/ETS 14.1
      SAS/OR 14.1
      SAS/IML 14.1
      SAS/QC 14.1

NOTE: Additional host information:

 X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

NOTE: SAS initialization used:
      real time           8.97 seconds
      cpu time            0.82 seconds


NOTE: AUTOEXEC processing beginning; file is C:\Users\B\autoexec.sas.

NOTE: Libref SASDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: R:\L\SASDATA

NOTE: AUTOEXEC processing completed.

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


the second is:

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


could you please give me some suggestion about how to read pat_publn_id in SAS? thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The reason both of these fail is the same. You are actually attempting to read the data as fixed column input not delimited. And the data shown indicates that the data is semicolon delimited, not CSV. So your delimiter is wrong.

And why the -bleep- do you have all of the informats attempting to read 29 columns???

 

See if this works any better.

data P.Publications2 ;
   infile 'R/L/P/tls211_pat_publn.csv' DLM = ';' DSD missover lrecl = 32767
   firstobs = 2 ;
   informat 
      pat_publn_id 29.
      publn_auth $29.
      publn_nr $29.
      publn_nr_original $29.
      publn_kind $29.
      appln_id 29.
      publn_date YYMMDD10.
      publn_lg $29.
      publn_first_grant 29.
      publn_claims 29. 
   ;
   format publn_date YYMMDDd10. ;
   input 
      pat_publn_id 
      publn_auth $
      publn_nr $
      publn_nr_original $
      publn_kind $
      appln_id 
      publn_date 
      publn_lg $
      publn_first_grant 
      publn_claims  
  ;
run ;

 

View solution in original post

4 REPLIES 4
data_null__
Jade | Level 19

Looks like your TXT file is delimited with simi-colon not comma.

 


@France wrote:

hello all,

 

I am reading a simple .csv file in SAS separately by following two codes, but both fail. I don't why I can't do it. could

the first log is:

 

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3 MBCS3170)
      Licensed to UNIVERSITY OF NEWCASTLE UPON TYNE, Site 70117723.
NOTE: This session is executing on the X64_7PRO  platform.



NOTE: Updated analytical products:

      SAS/STAT 14.1
      SAS/ETS 14.1
      SAS/OR 14.1
      SAS/IML 14.1
      SAS/QC 14.1

NOTE: Additional host information:

 X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

NOTE: SAS initialization used:
      real time           8.97 seconds
      cpu time            0.82 seconds


NOTE: AUTOEXEC processing beginning; file is C:\Users\B\autoexec.sas.

NOTE: Libref SASDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: R:\L\SASDATA

NOTE: AUTOEXEC processing completed.

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


the second is:

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


could you please give me some suggestion about how to read pat_publn_id in SAS? thanks

 


 

ballardw
Super User

The reason both of these fail is the same. You are actually attempting to read the data as fixed column input not delimited. And the data shown indicates that the data is semicolon delimited, not CSV. So your delimiter is wrong.

And why the -bleep- do you have all of the informats attempting to read 29 columns???

 

See if this works any better.

data P.Publications2 ;
   infile 'R/L/P/tls211_pat_publn.csv' DLM = ';' DSD missover lrecl = 32767
   firstobs = 2 ;
   informat 
      pat_publn_id 29.
      publn_auth $29.
      publn_nr $29.
      publn_nr_original $29.
      publn_kind $29.
      appln_id 29.
      publn_date YYMMDD10.
      publn_lg $29.
      publn_first_grant 29.
      publn_claims 29. 
   ;
   format publn_date YYMMDDd10. ;
   input 
      pat_publn_id 
      publn_auth $
      publn_nr $
      publn_nr_original $
      publn_kind $
      appln_id 
      publn_date 
      publn_lg $
      publn_first_grant 
      publn_claims  
  ;
run ;

 

France
Quartz | Level 8
many thanks for your help.
data_null__
Jade | Level 19

The only change you really needed to make was to DLM

 

DLM = ';'

Your original input statement with colon INFORMAT modifiers is a correct alternative to using an INFORMAT statement.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1173 views
  • 3 likes
  • 3 in conversation