BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mconover
Quartz | Level 8

/* Just to note - all data enclosed is public access. */

So, I have a CSV file which contains both variable names (1st row) and data (remaining rows) in quotation marks.  For some reason the hospital provider numbers are coded with single quotation marks around them (so, the double quotes that exist around every data point are there, as well as an additional single quotation mark - see "Sample Data" below).

Is there a simple way to remove these quotation marks from my data or, alternatively, to force SAS to ignore them.  Working in Excel is not an option because the file is way too big and doesn't open properly.

Some forums recommended I use the command %let EFI_QUOTED_NUMERICS = yes but that doesn't seem to work.  I still get errors for every provider_number entry (See "My Log" below).  I'm not sure if this is because this function is intended to look for double quotation marks. 

Any help would be greatly appreciated!

Mitch

Sample Data:

"Provider Number","Hospital Name","Address1","Address2","Address3","City","State","ZIP Code","County Name","Phone Number","Hospital Type","Hospital Ownership","Emergency Service"

"'010001'","SOUTHEAST ALABAMA MEDICAL CENTER","1108 ROSS CLARK CIRCLE","","","DOTHAN","AL","36301","HOUSTON","3347938701","Acute Care Hospitals","Government - Hospital District or Authority","Yes"

"'010005'","MARSHALL MEDICAL CENTER SOUTH","2505 U S HIGHWAY 431 NORTH","","","BOAZ","AL","35957","MARSHALL","2565938310","Acute Care Hospitals","Go

My Coding:

%let EFI_QUOTED_NUMERICS = yes; /*CUSTOM PROGRAMMING TO ALLOW NUMERIC VARIABLES IN QUOTATIONS*/

data WORK.HQI_HOSP    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile 'C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Provider_Number  10. ;

       informat Hospital_Name $73. ;

       informat Address1 $51. ;

       informat Address2 $5. ;

       informat Address3 $32. ;

       informat City $4. ;

       informat State $7. ;

       informat ZIP_Code 22. ;

       informat County_Name $38. ;

       informat Phone_Number $38. ;

       informat Hospital_Type $45. ;

       informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

       format Provider_Number 10. ;

       format Hospital_Name $73. ;

       format Address1 $51. ;

       format Address2 $5. ;

       format Address3 $32. ;

       format City $4. ;

       format State $7. ;

       format ZIP_Code 22. ;

       format County_Name $38. ;

       format Phone_Number $38. ;

       format Hospital_Type $45. ;

       format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

    input

                Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                 Emergency_Service $    

    ;

    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

    run;

PROC PRINT DATA=work.HQI_Hosp;

RUN;

My Log:

NOTE: Invalid data for Provider_Number in line 2 1-8.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

2         "'010001'","SOUTHEAST ALABAMA MEDICAL CENTER","1108 ROSS CLARK CIRCLE","","","DOTHAN",

      87  "AL","36301","HOUSTON","3347938701","Acute Care Hospitals","Government - Hospital Dist

     173  rict or Authority","Yes" 196

Provider_Number=. Hospital_Name=SOUTHEAST ALABAMA MEDICAL CENTER Address1=1108 ROSS CLARK CIRCLE

Address2=  Address3=  City=DOTH State=AL ZIP_Code=36301 County_Name=HOUSTON

Phone_Number=3347938701 Hospital_Type=Acute Care Hospitals Hospital_Ownership=Government - Ho

Emergency_Service=Yes _ERROR_=1 _N_=1

NOTE: Invalid data for Provider_Number in line 3 1-8.

3         "'010005'","MARSHALL MEDICAL CENTER SOUTH","2505 U S HIGHWAY 431 NORTH","","","BOAZ","

      87  AL","35957","MARSHALL","2565938310","Acute Care Hospitals","Government - Hospital Dist

     173  rict or Authority","Yes" 196

Provider_Number=. Hospital_Name=MARSHALL MEDICAL CENTER SOUTH

Address1=2505 U S HIGHWAY 431 NORTH Address2=  Address3=  City=BOAZ State=AL ZIP_Code=35957

County_Name=MARSHALL Phone_Number=2565938310 Hospital_Type=Acute Care Hospitals

Hospital_Ownership=Government - Ho Emergency_Service=Yes _ERROR_=1 _N_=2

NOTE: Invalid data for Provider_Number in line 4 1-8.

4         "'010006'","ELIZA COFFEE MEMORIAL HOSPITAL","205 MARENGO STREET","","","FLORENCE","AL"

      87  ,"35631","LAUDERDALE","2567688400","Acute Care Hospitals","Government - Hospital Distr

     173  ict or Authority","Yes" 195

Provider_Number=. Hospital_Name=ELIZA COFFEE MEMORIAL HOSPITAL Address1=205 MARENGO STREET

Address2=  Address3=  City=FLOR State=AL ZIP_Code=35631 County_Name=LAUDERDALE

Phone_Number=2567688400 Hospital_Type=Acute Care Hospitals Hospital_Ownership=Government - Ho

Emergency_Service=Yes _ERROR_=1 _N_=3

NOTE: Invalid data for Provider_Number in line 5 1-8.

5         "'010007'","MIZELL MEMORIAL HOSPITAL","702 N MAIN ST","","","OPP","AL","36467","COVING

      87  TON","3344933541","Acute Care Hospitals","Voluntary non-profit - Private","Yes" 165

Provider_Number=. Hospital_Name=MIZELL MEMORIAL HOSPITAL Address1=702 N MAIN ST Address2=

Address3=  City=OPP State=AL ZIP_Code=36467 County_Name=COVINGTON Phone_Number=3344933541

Hospital_Type=Acute Care Hospitals Hospital_Ownership=Voluntary non-p Emergency_Service=Yes

_ERROR_=1 _N_=4

NOTE: Invalid data for Provider_Number in line 6 1-8.



1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

An easy solution is simply to read the data after compressing it.  Try the following:

data WORK.HQI_HOSP    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile "C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Provider_Number  10. ;

       informat Hospital_Name $73. ;

       informat Address1 $51. ;

       informat Address2 $5. ;

       informat Address3 $32. ;

       informat City $4. ;

       informat State $7. ;

       informat ZIP_Code 22. ;

       informat County_Name $38. ;

       informat Phone_Number $38. ;

       informat Hospital_Type $45. ;

       informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

       format Provider_Number 10. ;

       format Hospital_Name $73. ;

       format Address1 $51. ;

       format Address2 $5. ;

       format Address3 $32. ;

       format City $4. ;

       format State $7. ;

       format ZIP_Code 22. ;

       format County_Name $38. ;

       format Phone_Number $38. ;

       format Hospital_Type $45. ;

       format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

  input @;

  _infile_=compress(compress(_infile_,"'"),'"');

  input

               Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                 Emergency_Service $   

    ;

  run;

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

An easy solution is simply to read the data after compressing it.  Try the following:

data WORK.HQI_HOSP    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile "C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Provider_Number  10. ;

       informat Hospital_Name $73. ;

       informat Address1 $51. ;

       informat Address2 $5. ;

       informat Address3 $32. ;

       informat City $4. ;

       informat State $7. ;

       informat ZIP_Code 22. ;

       informat County_Name $38. ;

       informat Phone_Number $38. ;

       informat Hospital_Type $45. ;

       informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

       format Provider_Number 10. ;

       format Hospital_Name $73. ;

       format Address1 $51. ;

       format Address2 $5. ;

       format Address3 $32. ;

       format City $4. ;

       format State $7. ;

       format ZIP_Code 22. ;

       format County_Name $38. ;

       format Phone_Number $38. ;

       format Hospital_Type $45. ;

       format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

  input @;

  _infile_=compress(compress(_infile_,"'"),'"');

  input

               Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                 Emergency_Service $   

    ;

  run;

mconover
Quartz | Level 8

Amazing.  A very simple and elegant solution indeed.  I didn't know about COMPRESS.  I have the feeling my coding life just got a lot easier.  

Thanks for the help!

art297
Opal | Level 21

Compress is simply a function that will remove specific characters.  I used it twice, once to remove the single quotes, then again to remove the double quotes.  The main "trick" was to use an input statement with an @ symbol first, so that each record would get into the buffer, modify it, then read it as desired.

Glad to hear that it worked for you and, yes, hope it does make your coding life easier.  One of the strong things about SAS is the extremely broad selection of functions that are built in and, now, a procedure (proc fcmp) that lets users build any additional functions they might need.

Tom
Super User Tom
Super User

You only want to remove the single quotes, not the surrounding double quotes.  The DSD option will remove those for you.

_infile_=compress(_infile_,"'");

Programming note: You can compress more than one character from a string at a time. The function will remove all instances of any character in the second argument.

_infile_=compress(_infile_,"'""");

data_null__
Jade | Level 19

I don't know "who" picked 10. as the INFORMAT for provider number it looks like it should be character.  It has a leading zero and while NUMBER is part of the field name I doubt you will ever do arithmetic with it.

I would use the $QUOTE10. INFORMAT.

No need to fiddle with _INFILE_

mconover
Quartz | Level 8

Thanks. I had another question which I now realize was related to this question (thanks to Arthur).  DN, thanks for the advice.  I'm now importing data very smoothly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1707 views
  • 0 likes
  • 4 in conversation