Help using Base SAS procedures

Importing CSV with multiple delimiters in one field

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Importing CSV with multiple delimiters in one field

Hi SAS friends,

I want to import a CSV file but I have a field  (CBSA_NAM) that contains multiple comma's as part of the text string but is identified by SAS as a delimiters. Ultimately i get an extra column created (VAR31) for some records. any workarounds for this?

Thanks!

Here is the code i'm using to import and a snippet of the file attached.

data WORK.CLARITAS_2010;

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */

  infile "E:\Jeff B\Open\Data Pull for Paul\CSV Import.txt" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

Attachment

Accepted Solutions
Solution
‎04-05-2013 01:17 PM
Respected Advisor
Posts: 4,649

Re: Importing CSV with multiple delimiters in one field

This might do :

data want;

infile "&sasforum\datasets\CSV Import.txt" MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat CBSA_NAM1 $50. ;

  informat CBSA_NAM2 $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

input @;

if countc(_infile_,",") = 29 then

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

else do;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM1 $

  CBSA_NAM2 $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

  CBSA_NAM = catx(", ", CBSA_NAM1, CBSA_NAM2);

  end;

drop CBSA_NAM1 CBSA_NAM2;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

PG

PG

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Importing CSV with multiple delimiters in one field

It is very interesting that the example data set has the variable name (or column header) with the quotes but the data does not.

Preferable would be to get the data source to quote qualify the values instead of the variable name.

If that isn't going to be possible you're going to have to parse the line, probably counting commas backwards from the end of the line to find where your value ends.

Look in the INFILE statement documentation for the _infile_ options and the examples for some hints.

Contributor
Posts: 62

Re: Importing CSV with multiple delimiters in one field

Thanks Ballardw, I added those quotes around the column header to see if it would produce a differn't result, i just forgot to remove them when i posted the example Smiley Happy i will check out the infile documentaiton

Solution
‎04-05-2013 01:17 PM
Respected Advisor
Posts: 4,649

Re: Importing CSV with multiple delimiters in one field

This might do :

data want;

infile "&sasforum\datasets\CSV Import.txt" MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat CBSA_NAM1 $50. ;

  informat CBSA_NAM2 $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

input @;

if countc(_infile_,",") = 29 then

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

else do;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM1 $

  CBSA_NAM2 $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

  CBSA_NAM = catx(", ", CBSA_NAM1, CBSA_NAM2);

  end;

drop CBSA_NAM1 CBSA_NAM2;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

PG

PG
Contributor
Posts: 62

Re: Importing CSV with multiple delimiters in one field

Thanks PG! it worked just as expected with the real data! there was a case that had 31 commas and i was able to easily identify and build off your code to accommodate.

you are the man!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 403 views
  • 3 likes
  • 3 in conversation