<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to read huge csv file using proc import in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759813#M240185</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/294544"&gt;@RAVI2000&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;And once that error has occurred I have to close that program and create a new program.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;?????&lt;/P&gt;
&lt;P&gt;Did you save the program before running it?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm also a tad confused about this:&lt;/P&gt;
&lt;PRE&gt;4061 proc import out = sj.diagnosis
4062 datafile = "/Projects/
4062 ! data/diagnosis.csv"
4063 dbms = csv replace;
4064 delimiter=",";
4065 GUESSINGROWS = MAX;
4066 run;&lt;/PRE&gt;
&lt;P&gt;The ! on the second line numbered 4062 marks a continuation: the submitted line was longer than the log allows, which normally means it ran past 80 or so characters.&lt;/P&gt;
&lt;P&gt;This sounds like you are running a batch job and the previous program is still running.&lt;/P&gt;</description>
    <pubDate>Thu, 05 Aug 2021 21:20:06 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2021-08-05T21:20:06Z</dc:date>
    <item>
      <title>how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759792#M240177</link>
      <description>&lt;P&gt;Hello Everyone,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was trying to import a .csv file which had almost 37000 records.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I tried to read it using PROC IMPORT, I got the note below.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;CODE class=" language-sas"&gt;NOTE: Invalid data for provider_id in line 5641 36-43.&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;4061       proc import out = diagnosis
4062       	datafile = "/Projects/data/diagnosis.csv"
4063       	dbms = csv replace;
4064           delimiter=",";
4065       
4066       run;

4067        /*******************************************************************
4067     ! ***
4068        *   PRODUCT:   SAS
4069        *   VERSION:   9.4
4070        *   CREATOR:   External File Interface
4071        *   DATE:      05AUG21
4072        *   DESC:      Generated SAS Datastep Code
4073        *   TEMPLATE SOURCE:  (None Specified.)
4074        ********************************************************************
4074     ! ***/
4075           data DIAGNOSIS    ;
4076           %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
4077           infile Projects/
4077     ! data/diagnosis.csv' delimiter = ',' MISSOVER DSD
4077     ! lrecl=32767 firstobs=2 ;
4078              informat patient_mrn best32. ;
4079              informat encounter_id best32. ;
4080              informat enc_type $2. ;
4081              informat dx_date $21. ;
4082              informat provider_id best32. ;
4083              informat provider_name $30. ;
4084              informat provider_title $4. ;
4085              informat dx_name $101. ;
4086              informat dx_code $8. ;
4087              informat dx_type best32. ;
4088              informat dx_source $2. ;
4089              informat dx_origin $2. ;
4090              informat pdx $1. ;
4091              informat raw_pdx $24. ;
4092              informat sourcesystem_cd $11. ;
4093              format patient_mrn best12. ;
4094              format encounter_id best12. ;
4095              format enc_type $2. ;
4096              format dx_date $21. ;
4097              format provider_id best12. ;
4098              format provider_name $30. ;
4099              format provider_title $4. ;
4100              format dx_name $101. ;
4101              format dx_code $8. ;
4102              format dx_type best12. ;
4103              format dx_source $2. ;
4104              format dx_origin $2. ;
4105              format pdx $1. ;
4106              format raw_pdx $24. ;
4107              format sourcesystem_cd $11. ;
4108           input
4109                       patient_mrn
4110                       encounter_id
4111                       enc_type  $
4112                       dx_date  $
4113                       provider_id
4114                       provider_name  $
4115                       provider_title  $
4116                       dx_name  $
4117                       dx_code  $
4118                       dx_type
4119                       dx_source  $
4120                       dx_origin  $
4121                       pdx  $
4122                       raw_pdx  $
4123                       sourcesystem_cd  $
4124           ;
4125           if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR
4125     ! detection macro variable */
4126           run;

NOTE: The infile '/Projects/
      data/diagnosis.csv' is:
      
      Filename=/Projects/ 
     data/diagnosis.csv,
      Owner Name=s_ravi,Group Name=BIOSTATS-USERS,
      Access Permission=-rwxr--r--,
      Last Modified=30Jul2021:07:57:16,
      File Size (bytes)=605197879

NOTE: Invalid data for provider_id in line 5641 36-43.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
5641      00365298,,NI,"2010-03-04 00:00:00",TC110397,"STEPIEN, KIMBERLY E",MD,"
      71  360.21 Progressive high (degenerative) myopia",360.21,09,FI,BI,P,"IDX 
     141  DX_1",DX|BI|IDX 155
patient_mrn=365298 encounter_id= enc_type=NI dx_date=2010-03-04 00:00:00
provider_id= provider_name=STEPIEN, KIMBERLY E provider_title=MD
dx_name=360.21 Progressive high (degenerative) myopia dx_code=360.21 dx_type=9
dx_source=FI dx_origin=BI pdx=P raw_pdx=IDX DX_1 sourcesystem_cd=DX|BI|IDX
_ERROR_=1 _N_=5640
NOTE: Invalid data for provider_id in line 6377 36-43.
6377      00011177,,NI,"2010-04-27 00:00:00",TC100135,"SHERMAN, TRACY L",MD,"290
      71  .40 Vascular dementia, uncomplicated",290.40,09,FI,BI,P,"IDX DX_1",DX|
     141  BI|IDX 146
patient_mrn=11177 encounter_id= enc_type=NI dx_date=2010-04-27 00:00:00
provider_id= provider_name=SHERMAN, TRACY L provider_title=MD
dx_name=290.40 Vascular dementia, uncomplicated dx_code=290.40 dx_type=9
dx_source=FI dx_origin=BI pdx=P raw_pdx=IDX DX_1 sourcesystem_cd=DX|BI|IDX
_ERROR_=1 _N_=6376&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The log goes on with notes like these for many more records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The provider_id has values like below:&lt;/P&gt;
&lt;P&gt;Patient_mrn&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Provider_id&lt;/P&gt;
&lt;P&gt;00365298&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; TC110397,&lt;/P&gt;
&lt;P&gt;00011177&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; TC100135,&lt;/P&gt;
&lt;P&gt;00365298&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; TC110397&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;while the provider_id for other subjects looks like this:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;provider_id&lt;/P&gt;
&lt;P&gt;41357&lt;/P&gt;
&lt;P&gt;15089&lt;/P&gt;
&lt;P&gt;7682748&lt;/P&gt;
&lt;P&gt;247974928&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;so the provider_id has both char and numeric values. How can I read that variable into my data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 05 Aug 2021 20:08:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759792#M240177</guid>
      <dc:creator>RAVI2000</dc:creator>
      <dc:date>2021-08-05T20:08:56Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759794#M240179</link>
      <description>&lt;P&gt;PROVIDER_ID is a character field but you're trying to read it in as numeric.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;5641      00365298,,NI,"2010-03-04 00:00:00",&lt;FONT size="4" color="#FF0000"&gt;&lt;STRONG&gt;TC110397&lt;/STRONG&gt;&lt;/FONT&gt;,"STEPIEN, KIMBERLY E",MD,"
      71  360.21 Progressive high (degenerative) myopia",360.21,09,FI,BI,P,"IDX 
     141  DX_1",DX|BI|IDX 155&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;4082              informat provider_id best32. ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So you should modify the code to change PROVIDER_ID to be character.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can add GUESSINGROWS=MAX; to your PROC IMPORT code and it will scan the whole file before it guesses at types or you can copy the data step and modify it manually.&amp;nbsp;&lt;/P&gt;
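&lt;P&gt;A minimal sketch of the first option, reusing the path and output name from the original post. It assumes the installed SAS 9.4 release accepts GUESSINGROWS=MAX; scanning every row of a file this size can take a while.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc import out = diagnosis
    datafile = "/Projects/data/diagnosis.csv"
    dbms = csv replace;
  delimiter = ",";
  guessingrows = max;  /* scan every row before choosing each column's type */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For the manual route, the statements to change in the generated data step are the INFORMAT, FORMAT and INPUT entries for provider_id, so that each one declares it as character.&lt;/P&gt;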
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
</description>
      <pubDate>Thu, 05 Aug 2021 20:16:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759794#M240179</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-08-05T20:16:13Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759798#M240180</link>
      <description>When I am using GUESSINGROWS = MAX;&lt;BR /&gt;It's giving me the below error.&lt;BR /&gt;&lt;BR /&gt;4061       proc import out = sj.diagnosis&lt;BR /&gt;4062       	datafile = "/Projects/&lt;BR /&gt;4062     ! data/diagnosis.csv"&lt;BR /&gt;4063       	dbms = csv replace;&lt;BR /&gt;4064           delimiter=",";&lt;BR /&gt;4065       	GUESSINGROWS = MAX;&lt;BR /&gt;4066       run;&lt;BR /&gt;ERROR: according to datain.log, 'sas datain' already in progress&lt;BR /&gt;       job abend Thu Aug  5 15:25:46 CDT 2021&lt;BR /&gt;       if this is incorrect, then 'rm datain.log' and re-submit</description>
      <pubDate>Thu, 05 Aug 2021 20:28:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759798#M240180</guid>
      <dc:creator>RAVI2000</dc:creator>
      <dc:date>2021-08-05T20:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759800#M240181</link>
      <description>And once that error has occurred I have to close that program and create a new program.</description>
      <pubDate>Thu, 05 Aug 2021 20:33:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759800#M240181</guid>
      <dc:creator>RAVI2000</dc:creator>
      <dc:date>2021-08-05T20:33:29Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759813#M240185</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/294544"&gt;@RAVI2000&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;And once that error has occurred I have to close that program and create a new program.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;?????&lt;/P&gt;
&lt;P&gt;Did you save the program before running it?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm also a tad confused about this:&lt;/P&gt;
&lt;PRE&gt;4061 proc import out = sj.diagnosis
4062 datafile = "/Projects/
4062 ! data/diagnosis.csv"
4063 dbms = csv replace;
4064 delimiter=",";
4065 GUESSINGROWS = MAX;
4066 run;&lt;/PRE&gt;
&lt;P&gt;The ! on the second line numbered 4062 marks a continuation: the submitted line was longer than the log allows, which normally means it ran past 80 or so characters.&lt;/P&gt;
&lt;P&gt;This sounds like you are running a batch job and the previous program is still running.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Aug 2021 21:20:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759813#M240185</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-08-05T21:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759825#M240190</link>
      <description>Yes, I did save the program. How can I stop the previous program from running?</description>
      <pubDate>Thu, 05 Aug 2021 22:42:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759825#M240190</guid>
      <dc:creator>RAVI2000</dc:creator>
      <dc:date>2021-08-05T22:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759847#M240201</link>
      <description>&lt;P&gt;Are you on Windows, UNIX, or something else?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Often you can find the Process ID in the log file.&amp;nbsp; If you know the Process ID, then you can "kill" an existing process.&lt;/P&gt;
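&lt;P&gt;One way to make sure the Process ID lands in the log is to write it there yourself at the top of the program. This is only a sketch: it assumes the automatic macro variable SYSJOBID holds the process ID on your host, which is the usual behaviour on UNIX and Windows but is host-specific, and the variable name pid is made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  /* SYSJOBID is an automatic macro variable; on UNIX it normally holds the PID */
  pid = symget('SYSJOBID');
  put 'NOTE: SAS process ID is ' pid;
run;
&lt;/CODE&gt;&lt;/PRE&gt;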
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Thu, 05 Aug 2021 23:52:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759847#M240201</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-05T23:52:00Z</dc:date>
    </item>
    <item>
      <title>Re: how to read huge csv file using proc import</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759848#M240202</link>
      <description>&lt;P&gt;Why are you using PROC IMPORT to read a delimited text file?&amp;nbsp; One with only 15 variables.&lt;/P&gt;
&lt;P&gt;Just write your own data step to read the file.&lt;/P&gt;
&lt;P&gt;Then you won't define the variables, like PATIENT_MRN and PROVIDER_ID, using the wrong type.&amp;nbsp; MRN is an identifier, so make it a character variable.&amp;nbsp; The average medical record number has no meaning.&amp;nbsp; Plus the values you are showing have leading zeros that you probably want to preserve.&lt;/P&gt;
&lt;P&gt;Something like this.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data DIAGNOSIS ;
  infile "/Projects/data/diagnosis.csv" dsd truncover firstobs=2;
  length          
         patient_mrn $10 
         encounter_id $20
         enc_type $2
         dx_date 8
         provider_id $20 
         provider_name $30
         provider_title $10
         dx_name $200
         dx_code $8
         dx_type $2
         dx_source $2
         dx_origin $2
         pdx $2
         raw_pdx $30
         sourcesystem_cd $20
   ;
   informat dx_date anydtdte.;
   format dx_date yymmdd10.;
   input patient_mrn -- sourcesystem_cd ;
 run;
&lt;/CODE&gt;&lt;/PRE&gt;
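&lt;P&gt;A small sketch of the leading-zero point. It uses an in-line value copied from the log in the original post rather than the real file, and the variable names mrn_text, mrn_num and mrn_back are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  /* hypothetical in-line value, mirroring the MRN shown in the posted log */
  mrn_text = '00365298';
  /* reading the same text as a number drops the leading zeros */
  mrn_num = input(mrn_text, best32.);
  /* Z8. pads back to 8 digits, which only helps if every MRN is 8 digits wide */
  mrn_back = put(mrn_num, z8.);
  put mrn_text= mrn_num= mrn_back=;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The posted log shows the same effect: the raw value 00365298 was stored as patient_mrn=365298.&amp;nbsp; Reading the field as character avoids having to guess the original width afterwards.&lt;/P&gt;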
&lt;P&gt;You should define the lengths of the character variables based on the documentation of the maximum length they need.&amp;nbsp; But if you don't have such documentation then you could&amp;nbsp;analyze the whole file yourself to figure out the longest string in each field.&lt;/P&gt;
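&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;
  infile "/Projects/data/diagnosis.csv" dsd truncover firstobs=2 end=eof;
  /* a temporary array is retained across rows, so the running maxima survive */
  array lengths[15] _temporary_;
  do col=1 to 15;
    /* read the next delimited field as text and hold the line for the next one */
    input string :$32767. @;
    lengths[col]=max(lengths[col],lengthn(string));
  end;
  /* after the last row, report the longest value seen in each column */
  if eof then do col=1 to 15 ;
    maxlen=lengths[col];
    put col= maxlen= ;
  end;
run;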
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 06 Aug 2021 00:02:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-read-huge-csv-file-using-proc-import/m-p/759848#M240202</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-08-06T00:02:51Z</dc:date>
    </item>
  </channel>
</rss>

