BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAShole
Pyrite | Level 9

Hi SAS friends,

I want to import a CSV file but I have a field  (CBSA_NAM) that contains multiple comma's as part of the text string but is identified by SAS as a delimiters. Ultimately i get an extra column created (VAR31) for some records. any workarounds for this?

Thanks!

Here is the code i'm using to import and a snippet of the file attached.

data WORK.CLARITAS_2010;

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */

  infile "E:\Jeff B\Open\Data Pull for Paul\CSV Import.txt" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This might do :

data want;

infile "&sasforum\datasets\CSV Import.txt" MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat CBSA_NAM1 $50. ;

  informat CBSA_NAM2 $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

input @;

if countc(_infile_,",") = 29 then

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

else do;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM1 $

  CBSA_NAM2 $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

  CBSA_NAM = catx(", ", CBSA_NAM1, CBSA_NAM2);

  end;

drop CBSA_NAM1 CBSA_NAM2;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

PG

PG

View solution in original post

4 REPLIES 4
ballardw
Super User

It is very interesting that the example data set has the variable name (or column header) with the quotes but the data does not.

Preferable would be to get the data source to quote qualify the values instead of the variable name.

If that isn't going to be possible you're going to have to parse the line, probably counting commas backwards from the end of the line to find where your value ends.

Look in the INFILE statement documentation for the _infile_ options and the examples for some hints.

SAShole
Pyrite | Level 9

Thanks Ballardw, I added those quotes around the column header to see if it would produce a differn't result, i just forgot to remove them when i posted the example Smiley Happy i will check out the infile documentaiton

PGStats
Opal | Level 21

This might do :

data want;

infile "&sasforum\datasets\CSV Import.txt" MISSOVER DSD lrecl=32767 firstobs=2 ;

  informat Geo_Id $1. ;

  informat ID $12. ;

  informat G3 $2. ;

  informat COUNTY $3.;

  informat G34 $5. ;

  informat G32 $6.;

  informat BG $1.;

  informat GCMSA $5. ;

  informat GCCBS $5.;

  informat GCAZP $5.;

  informat G11 $5. ;

  informat DMACODE $5.;

  informat CD $5. ;

  informat ST_F_NAM $30. ;

  informat CTY_FNAM $30. ;

  informat CBSA_NAM $50. ;

  informat CBSA_NAM1 $50. ;

  informat CBSA_NAM2 $50. ;

  informat GEO_CODE $20.;

  informat LAT $13. ;

  informat LON $13.  ;

  informat HOUSEHLDSC best32. ;

  informat HAGE15_24C best32.  ;

  informat HAGE25_34C best32.  ;

  informat HAGE35_44C best32. ;

  informat HAGE45_54C best32. ;

  informat HAGE55_64C best32. ;

  informat HAGE65_74C best32.  ;

  informat HAGE75_84C best32. ;

  informat HAGE_85P_C best32.  ;

  informat MAGE_HHR_C best32. ;

  informat AVAGE_HH_C best32.  ;

  informat VAR31 best32. ;

  format Geo_Id $1. ;

  format ID $12. ;

  format G3 $2. ;

  format COUNTY $3.;

  format G34 $5. ;

  format G32 $6.;

  format BG $1.;

  format GCMSA $5. ;

  format GCCBS $5.;

  format GCAZP $5.;

  format G11 $5. ;

  format DMACODE $5.;

  format CD $15. ;

  format ST_F_NAM $30. ;

  format CTY_FNAM $30. ;

  format CBSA_NAM $50. ;

  format GEO_CODE $20.;

  format LAT $13. ;

  format LON $13.  ;

  format HOUSEHLDSC best32. ;

  format HAGE15_24C best32.  ;

  format HAGE25_34C best32.  ;

  format HAGE35_44C best32. ;

  format HAGE45_54C best32. ;

  format HAGE55_64C best32. ;

  format HAGE65_74C best32.  ;

  format HAGE75_84C best32. ;

  format HAGE_85P_C best32.  ;

  format MAGE_HHR_C best32. ;

  format AVAGE_HH_C best32.  ;

  format VAR31 best32. ;

input @;

if countc(_infile_,",") = 29 then

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

else do;

  input

  Geo_Id $

  ID $

  G3 $

  COUNTY $

  G34 $

  G32 $

  BG $

  GCMSA $

  GCCBS $

  GCAZP $

  G11 $

  DMACODE $

  CD $

  ST_F_NAM $

  CTY_FNAM $

  CBSA_NAM1 $

  CBSA_NAM2 $

  GEO_CODE $

  LAT $

  LON $

  HOUSEHLDSC

  HAGE15_24C

  HAGE25_34C

  HAGE35_44C

  HAGE45_54C

  HAGE55_64C

  HAGE65_74C

  HAGE75_84C

  HAGE_85P_C

  MAGE_HHR_C

  AVAGE_HH_C

  VAR31

  ;

  CBSA_NAM = catx(", ", CBSA_NAM1, CBSA_NAM2);

  end;

drop CBSA_NAM1 CBSA_NAM2;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

PG

PG
SAShole
Pyrite | Level 9

Thanks PG! it worked just as expected with the real data! there was a case that had 31 commas and i was able to easily identify and build off your code to accommodate.

you are the man!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1054 views
  • 3 likes
  • 3 in conversation