BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mconover
Quartz | Level 8

For example, if my data set is:

"name" , " address" , "phone number"

"daffy duck" , "123 pizza lane, ohio" , "555-555-5555"

"ronald mcdonald" , "435 awesome street, michigan" , "555-555-5555"

When importing this data, SAS will assume that daffy duck and ronald mcdonald are names, 123 pizza lane and 435 awesome street are addresses, and that ohio and michigan are phone numbers.

All my data is enclosed in quotations.  Is there a way to get SAS to allow commas as long as they fall within the quotation marks?

Thanks for any help!

HERE'S MY CODING - (This is from the PROC IMPORT)

note - the problem occurs with address1

data WORK.HQI_HOSP    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile "C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Provider_Number  10. ;

       informat Hospital_Name $73. ;

       informat Address1 $51. ;

       informat Address2 $5. ;

       informat Address3 $32. ;

       informat City $4. ;

       informat State $7. ;

       informat ZIP_Code 22. ;

       informat County_Name $38. ;

       informat Phone_Number $38. ;

       informat Hospital_Type $45. ;

       informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

       format Provider_Number 10. ;

       format Hospital_Name $73. ;

       format Address1 $51. ;

       format Address2 $5. ;

       format Address3 $32. ;

       format City $4. ;

       format State $7. ;

       format ZIP_Code 22. ;

       format County_Name $38. ;

       format Phone_Number $38. ;

       format Hospital_Type $45. ;

       format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

  input @;

  _infile_=compress(compress(_infile_,"'"),'"');

  input

               Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                 Emergency_Service $  

    ;

  run;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why not only compress out the single quote characters that are causing the trouble instead of removing the double quotes that are protecting the embedded commas?

_infile_=compress(_infile_,"'")


Or  if you have some single quotes inside strings then try just removing those that open and close strings.

_infile_=tranwrd(tranwrd(_infile_,"""'",'"'),"'""",'"');


View solution in original post

12 REPLIES 12
Reeza
Super User

Why are you removing the quotation marks with the compress statement?

DSD should allow it to read properly, with the quotations and the comma's, that is the ideal format for embedded delimiters.

Assuming you leave the quotation marks in.

mconover
Quartz | Level 8

I have to remove quotation marks to make sure that the provider number, which has quotations within a quotation mark (e.g. " ' 10000 ' ") reads as a numeric variable rather than a character variable.  I will be merging with other data sets for which the provider number is a numeric variable without quotations.

Reeza
Super User

Does everything else read properly besides, the phone number being a character string?

If so then I'd change that to a numeric afterwards in the same step.

mconover
Quartz | Level 8

Maybe it will be easier if I rephrase my question.

Consider the sample data set with variables X, Y, and Z:

This is what my data looks like:

"X" , "Y" , "Z"

"A" , "3,B" , "C"

"B" , "2,D" , "D"

"C" , "4,F" , "B"

"A" , "4,B" , "C"

Now what I want to import is a table that looks like this

X     Y      Z

A     3,B   C

B     2,D   D

C     4,F   B

A     4,B   C

I don't need the commas in the table but I do need these double data points (3,B and 2,D etc...) to be contained within a single cell.  The problem is that it's assuming anything after the comma falls under the next column.

I'm sorry if I'm not making this very clear.

note: the quotations absolutely must be removed from the provider number in order to make it match the rest of my data (enabling a future merge) so excluding the compress statement is not an option.

Tom
Super User Tom
Super User

Why not only compress out the single quote characters that are causing the trouble instead of removing the double quotes that are protecting the embedded commas?

_infile_=compress(_infile_,"'")


Or  if you have some single quotes inside strings then try just removing those that open and close strings.

_infile_=tranwrd(tranwrd(_infile_,"""'",'"'),"'""",'"');


mconover
Quartz | Level 8

Thanks guys for all your help.  Tom I went with your suggestion since it seemed to be the most simple and easy to understand.  Thanks to Arthur for helping me a second time and thanks to Reeza for staying up with me trying to figure this out.  I can't tell you how helpful all this help is. 

art297
Opal | Level 21

@mconover: You should take a look at DN's comments to the question you initially raised yesterday.  Since provider_number contains leading zeros, not treating it as a number and using the $quote informat will likely be your best and easiest solution.

Regardless, I think the various responses you got should give you ideas that you can apply in the future.

sas_analytics
Calcite | Level 5

I think a mixed of dsd and colon modifier along with LRECL will help you solve the problem.

Here is what I have written for you for the dataset you mentioned.

I think you can read any address line with a embeded comma with this code.

Please let me know if you have any questions.

data mydata;

infile "C:\mydata.csv"

        dlm = ","        /*Specifying the delimeter*/

        dsd                /*reading any missing value within line*/

        missover        /*Reading missing value at the end of the line*/

        LRECL= 32760    /*changing the default length to 32K*/

        firstobs=2         /*reading the data lines only*/

        ;

input    name :$30.         /*specifiying colon modifier for reading data with comma*/

        address :$30.

        phone :$30.;

run;

mconover
Quartz | Level 8

hmmm   still didn't work  I really appreciate your help.  Anybody else have any ideas?  This has to be a common problem. 

data WORK.HQI_HOSP    ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile "C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv" DLMSTR = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Provider_Number  10. ;

       informat Hospital_Name $73. ;

       informat Address1 comma51. ;

       informat Address2 $5. ;

       informat Address3 $32. ;

       informat City $4. ;

       informat State $7. ;

       informat ZIP_Code 22. ;

       informat County_Name $38. ;

       informat Phone_Number $38. ;

       informat Hospital_Type $45. ;

       informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

       format Provider_Number 10. ;

       format Hospital_Name $73. ;

       format Address1 $51. ;

       format Address2 $5. ;

       format Address3 $32. ;

       format City $4. ;

       format State $7. ;

       format ZIP_Code 22. ;

       format County_Name $38. ;

       format Phone_Number $38. ;

       format Hospital_Type $45. ;

       format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

  input @;

  _infile_=compress(compress(_infile_,"'"),'"');

  input

               Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                 Emergency_Service $  

    ;

  run;

PROC PRINT DATA=work.HQI_Hosp;

RUN;

shivas
Pyrite | Level 9

Hi,

Just try this...

For external file you can just remove cards and external file path.

data have;

infile cards  dlm = ","  dsd;            

  informat x y z $10.;

  format x y z $10.;

input x y z;

cards;

"A" , "3,B" , "C"

"B" , "2,D" , "D"

"C" , "4,F" , "B"

"A" , "4,B" , "C"

;

run;

Thanks,

Shiva

art297
Opal | Level 21

Since this is still the problem you posted yesterday, but with one extra twist, I'd suggest the same approach as I did yesterday, but using a different function.  According to your input statement, the Provider_Number is always as the start of each record and it is the only field that includes single quotes.  As such, why not just rid of the quotes around that one field?  e.g.:

data WORK.HQI_HOSP  (drop=x y) ;

    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

    infile "C:\Users\mconover\Desktop\COURSE WORK\718 Data\Hospital_flatfiles\Hospital_flatfiles\HQI_Hosp.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

    informat Provider_Number  10. ;

    informat Hospital_Name $73. ;

    informat Address1 $51. ;

    informat Address2 $5. ;

    informat Address3 $32. ;

    informat City $4. ;

    informat State $7. ;

    informat ZIP_Code 22. ;

    informat County_Name $38. ;

    informat Phone_Number $38. ;

    informat Hospital_Type $45. ;

    informat Hospital_Ownership $15. ;

    informat Emergency_Service $25. ;

    format Provider_Number 10. ;

    format Hospital_Name $73. ;

    format Address1 $51. ;

    format Address2 $5. ;

    format Address3 $32. ;

    format City $4. ;

    format State $7. ;

    format ZIP_Code 22. ;

    format County_Name $38. ;

    format Phone_Number $38. ;

    format Hospital_Type $45. ;

    format Hospital_Ownership $15. ;

    format Emergency_Service $25. ;

  input @;

    x=find(_infile_,"'",1);

    y=find(_infile_,"'",-255);

  _infile_=catt(substr(_infile_,3,y-3),substr(_infile_,y+2));

  input

                Provider_Number

                Hospital_Name $

                Address1 $

                Address2 $

                Address3 $

                City $

                State $

                ZIP_Code

                County_Name $

                Phone_Number $

                Hospital_Type $

                Hospital_Ownership $

                Emergency_Service $  

    ;

  run;

Surendra_Kandru
Calcite | Level 5

Hey u can use comma5 or comma10 to remove the comma between them and to put in single cell

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 23031 views
  • 0 likes
  • 7 in conversation