BookmarkSubscribeRSS Feed
dtchoi86
Fluorite | Level 6

Hello.

I'm having an issue with importing a CSV file (I believe queried and given to me from a big database) into SAS when every string is in quotes as listed below

( "File","O,H,H","Yes")

However, if I open the CSV in excel and save over it, it results in the follow:

(File,"O,H,H",Yes)

and this imports just fine.

Is there a way to import without having excel involved?

Code:

Data test;

infile 'C:\Users\Daniel\Documents\Line.csv'

TERMSTR=CRLF

firstobs=4

dlm=','

dsd

missover

LRECL=6100

;

input

Case: $8.

Narrative: $4030.

Indication: $575.

Country: $4.

Report_Source: $12.

Image: $31.

Attach: $15.

;

run;

14 REPLIES 14
DBailey
Lapis Lazuli | Level 10

what is the issue in when importing directly from csv?

dtchoi86
Fluorite | Level 6

A dataset with 20,000 observation is only 19 observations long and fields with incorrect information in them.

DBailey
Lapis Lazuli | Level 10

My first thought is a different termstr value that is getting replaced when opened/saved in Excel.

dtchoi86
Fluorite | Level 6

Is there any other value other than: CRLF, CR, and LF?

Tried all three and none worked.  Perhaps the CSV file given to me is in a format SAS cannot read?

DBailey
Lapis Lazuli | Level 10

It's certainly possible to construct a file that uses something else.  At times, I've even had to open the file in a byte editor to look at what's actually there.

dtchoi86
Fluorite | Level 6

I see.

The only question I have left is, which is really just asking the question again:

Can SAS import a CSV where all the strings are in quotation marks.

ex.

"103943","O,H,J","File","Patient blah, blah, blah"

DBailey
Lapis Lazuli | Level 10

That should not be an issue.  You can prove that by creating a sample file and trying it. 

Reeza
Super User

Sorry, remove the dlm=',', the dad option should cover that.

Reeza
Super User

Also, you usually need truncover instead of missover.

dtchoi86
Fluorite | Level 6

Thank you for your input.

I tried all these options with no success.

I'm pretty convinced it's the file itself.  Perhaps the CSV file is coded differently?

I tried creating a sample csv, and it works fine.  However, this file doesn't seem to import.

ballardw
Super User

A dataset with 20,000 observation is only 19 observations long

This may mean that the data needs to have a fixed record length specified when reading.

art297
Opal | Level 21

Can you post a "small" hexdump of the first few records from the file? Mark Terjeson posted the following macro on SAS-L some years back which makes it quite easy to see what a file "really" looks like. You may have to include a max parameter when calling the macro if your file's first 2 or 3 records (after the initial 4 records that your skipping) don't appear:

%macro hexdump(pathfile=,max=20000);

  data _null_;

    length chars $16;

    retain address 0 chars '................';

    infile "&pathfile" lrecl=1 recfm=f end=done;

    counter+1;

    if counter le &max. then do;

      input c $char1.;

      if mod(address,16) eq 0 then do;

        if _N_ eq 1 then put address hex8. 'h: ' @;

        else put ' ; ' chars / address hex8. 'h: ' @;

        chars = '................';

      end;

      put c hex2. ' ' @;

      if rank(c) ge 32 then substr(chars,mod(address,16)+1,1) = c;

      address + 1;

    end;

    if done or counter gt &max. then do;

         do i =mod(address,16)+1 to 16;

          put ' ' @;

        end;

        put ' ; ' chars / ;

      end;

  run;

%mend;

%hexdump(pathfile=C:\Users\Daniel\Documents\Line.csv)

dtchoi86
Fluorite | Level 6

Not sure if I did this correctly, but here it is:

00000000h: 0A 22 54 6F 74 61 6C 20 4E 75 6D 62 65 72 20 6F  ;

00000010h: 66 20 43 61 73 65 73 3A 22 2C 2C 22 32 32 2C 38  ;

00000020h: 39 31 22 0A 0A 22 43 61 73 65 20 23 22 2C 22 56  ;

00000030h: 72 73 6E 22 2C 22 46 44 41 20 49 6E 69 74 69 61  ;

00000040h: 6C 20 52 65 63 64 20 44 61 74 65 22 2C 22 46 44  ;

00000050h: 41 20 52 65 63 64 20 44 61 74 65 22 2C 22 41 6C  ;

00000060h: 6C 20 53 75 73 70 65 63 74 73 22 2C 22 50 72 69  ;

00000070h: 6D 61 72 79 20 53 75 73 70 65 63 74 20 28 50 53  ;

00000080h: 29 22 2C 22 50 53 20 44 6F 73 65 2D 52 6F 75 74  ;

00000090h: 65 2D 46 72 65 71 75 65 6E 63 79 22 2C 22 4F 75  ;

art297
Opal | Level 21

You did it correctly, but didn't output enough characters for one to really know what the file structure might actually look like.

My first guess would be to try:

Data test;

infile 'C:\Users\Daniel\Documents\Line.csv'

firstobs=4

dlm='2C0A'x

dsd

missover

LRECL=6100

;

input

Case: $8.

Narrative: $4030.

Indication: $575.

Country: $4.

Report_Source: $12.

Image: $31.

Attach: $15.

;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1264 views
  • 0 likes
  • 5 in conversation