DATA Step, Macro, Functions and more

read a .csv file in SAS

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

read a .csv file in SAS

hello all,

 

I am reading a simple .csv file in SAS separately by following two codes, but both fail. I don't why I can't do it. could

the first log is:

 

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3 MBCS3170)
      Licensed to UNIVERSITY OF NEWCASTLE UPON TYNE, Site 70117723.
NOTE: This session is executing on the X64_7PRO  platform.



NOTE: Updated analytical products:

      SAS/STAT 14.1
      SAS/ETS 14.1
      SAS/OR 14.1
      SAS/IML 14.1
      SAS/QC 14.1

NOTE: Additional host information:

 X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

NOTE: SAS initialization used:
      real time           8.97 seconds
      cpu time            0.82 seconds


NOTE: AUTOEXEC processing beginning; file is C:\Users\B\autoexec.sas.

NOTE: Libref SASDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: R:\L\SASDATA

NOTE: AUTOEXEC processing completed.

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


the second is:

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


could you please give me some suggestion about how to read pat_publn_id in SAS? thanks

 


Accepted Solutions
Solution
‎05-11-2018 12:55 PM
Super User
Posts: 13,889

Re: read a .csv file in SAS

[ Edited ]

The reason both of these fail is the same. You are actually attempting to read the data as fixed column input not delimited. And the data shown indicates that the data is semicolon delimited, not CSV. So your delimiter is wrong.

And why the -bleep- do you have all of the informats attempting to read 29 columns???

 

See if this works any better.

data P.Publications2 ;
   infile 'R/L/P/tls211_pat_publn.csv' DLM = ';' DSD missover lrecl = 32767
   firstobs = 2 ;
   informat 
      pat_publn_id 29.
      publn_auth $29.
      publn_nr $29.
      publn_nr_original $29.
      publn_kind $29.
      appln_id 29.
      publn_date YYMMDD10.
      publn_lg $29.
      publn_first_grant 29.
      publn_claims 29. 
   ;
   format publn_date YYMMDDd10. ;
   input 
      pat_publn_id 
      publn_auth $
      publn_nr $
      publn_nr_original $
      publn_kind $
      appln_id 
      publn_date 
      publn_lg $
      publn_first_grant 
      publn_claims  
  ;
run ;

 

View solution in original post


All Replies
Respected Advisor
Posts: 3,866

Re: read a .csv file in SAS

Looks like your TXT file is delimited with simi-colon not comma.

 


@France wrote:

hello all,

 

I am reading a simple .csv file in SAS separately by following two codes, but both fail. I don't why I can't do it. could

the first log is:

 

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3 MBCS3170)
      Licensed to UNIVERSITY OF NEWCASTLE UPON TYNE, Site 70117723.
NOTE: This session is executing on the X64_7PRO  platform.



NOTE: Updated analytical products:

      SAS/STAT 14.1
      SAS/ETS 14.1
      SAS/OR 14.1
      SAS/IML 14.1
      SAS/QC 14.1

NOTE: Additional host information:

 X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

NOTE: SAS initialization used:
      real time           8.97 seconds
      cpu time            0.82 seconds


NOTE: AUTOEXEC processing beginning; file is C:\Users\B\autoexec.sas.

NOTE: Libref SASDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: R:\L\SASDATA

NOTE: AUTOEXEC processing completed.

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


the second is:

1    data P.Publications2 ;
2    infile 'R:/L/P/tls211_pat_publn.csv' DLM = ',' DSD missover lrecl = 32767
2  ! firstobs = 2 ;
3    input pat_publn_id :29.
4    publn_auth :$29.
5    publn_nr :$29.
6    publn_nr_original :$29.
7    publn_kind :$29.
8    appln_id :29.
9    publn_date :YYMMDD10.
10   publn_lg :$29.
11   publn_first_grant:29.
12   publn_claims :29. ;
13   format publn_date :YYMMDDd10. ;
14   run ;

NOTE: Data file P.PUBLICATIONS2.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: The infile 'R:/L/P/tls211_pat_publn.csv' is:
      Filename=R:\L\P\tls211_pat_publn.csv,
      RECFM=V,LRECL=131068,File Size (bytes)=532,
      Last Modified=11 May 2018 16:47:59 o'clock,
      Create Time=11 May 2018 16:47:58 o'clock

NOTE: Invalid data for pat_publn_id in line 2 1-32.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
2         390735978;"HK";"1053433";"09/465,054";"A1";275562685;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=054";"A1";275562685;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=1
NOTE: Invalid data for pat_publn_id in line 3 1-32.
3         390753387;"HK";"1092990";"60/523,466";"A1";275562719;2010-03-26;"  ";0;0 72
pat_publn_id=. publn_auth=466";"A1";275562719;2010-03-2 publn_nr=  publn_nr_original=
publn_kind=  appln_id=. publn_date=. publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=2
NOTE: Invalid data for pat_publn_id in line 4 1-62.
4         900093015;"JP";"1174460";"";"A ";930063756;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=3
NOTE: Invalid data for pat_publn_id in line 5 1-63.
5         900063480;"JP";"03291571";"";"A ";930090062;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=4
NOTE: Invalid data for pat_publn_id in line 6 1-63.
6         900091165;"JP";"61193139";"";"A ";930101529;9999-12-31;"  ";0;0 63
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=5
NOTE: Invalid data for pat_publn_id in line 7 1-62.
7         900122268;"JP";"5861053";"";"A ";930111753;9999-12-31;"  ";0;0 62
pat_publn_id=. publn_auth=  publn_nr=  publn_nr_original=  publn_kind=  appln_id=. publn_date=.
publn_lg=  publn_first_grant=. publn_claims=. _ERROR_=1 _N_=6
NOTE: 6 records were read from the infile 'R:/Lipeng_Wang/PATSTAT/tls211_pat_publn.csv'.
      The minimum record length was 62.
      The maximum record length was 72.
NOTE: The data set P.PUBLICATIONS2 has 6 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.54 seconds
      cpu time            0.07 seconds


could you please give me some suggestion about how to read pat_publn_id in SAS? thanks

 


 

Solution
‎05-11-2018 12:55 PM
Super User
Posts: 13,889

Re: read a .csv file in SAS

[ Edited ]

The reason both of these fail is the same. You are actually attempting to read the data as fixed column input not delimited. And the data shown indicates that the data is semicolon delimited, not CSV. So your delimiter is wrong.

And why the -bleep- do you have all of the informats attempting to read 29 columns???

 

See if this works any better.

data P.Publications2 ;
   infile 'R/L/P/tls211_pat_publn.csv' DLM = ';' DSD missover lrecl = 32767
   firstobs = 2 ;
   informat 
      pat_publn_id 29.
      publn_auth $29.
      publn_nr $29.
      publn_nr_original $29.
      publn_kind $29.
      appln_id 29.
      publn_date YYMMDD10.
      publn_lg $29.
      publn_first_grant 29.
      publn_claims 29. 
   ;
   format publn_date YYMMDDd10. ;
   input 
      pat_publn_id 
      publn_auth $
      publn_nr $
      publn_nr_original $
      publn_kind $
      appln_id 
      publn_date 
      publn_lg $
      publn_first_grant 
      publn_claims  
  ;
run ;

 

Contributor
Posts: 52

Re: read a .csv file in SAS

many thanks for your help.
Respected Advisor
Posts: 3,866

Re: read a .csv file in SAS

The only change you really needed to make was to DLM

 

DLM = ';'

Your original input statement with colon INFORMAT modifiers is a correct alternative to using an INFORMAT statement.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 152 views
  • 3 likes
  • 3 in conversation