BookmarkSubscribeRSS Feed
1239
Calcite | Level 5

Hi, 

 

I am trying to import the attached "Sample File.csv" however I am unable to import it successfully. I noticed that there are line feed values and possible carriage return values hence unable to import the data.

 

Please help in import this .csv file successfully.

 

filename test 'C:\temp\Sample File.csv' termstr=CRLF;

proc import datafile=test
out=sample
dbms=csv
replace;
getnames=yes;
guessingrows=max;
run;
6 REPLIES 6
andreas_lds
Jade | Level 19
This topic has been discussed many times, please use the search-box.
1239
Calcite | Level 5

Please note after searching in google for the solution I have posted this query as I didn't any get solution.

1239
Calcite | Level 5

I have tried the below two different codes however still I am unable to import it successfully hence please help me.

 

FILENAME REFFILE 'C:\temp\Sample File.csv' TERMSTR=LF;

PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=WORK.TEST;
	GETNAMES=YES;
RUN;
DATA WORK.TEST;
    LENGTH
        PATIENTNUMBER    $ 13
        SEQ                8
        TERM             $ 5
        TESTSPEC_        $ 200
        LN               $ 1000
        NL               $ 990 ;
    FORMAT
        PATIENTNUMBER    $CHAR13.
        SEQ              BEST1.
        TERM             $CHAR5.
        TESTSPEC_        $CHAR200.
        LN               $CHAR1000.
        NL               $CHAR990. ;
    INFORMAT
        PATIENTNUMBER    $CHAR13.
        SEQ              BEST1.
        TERM             $CHAR5.
        TESTSPEC_        $CHAR200.
        LN               $CHAR1000.
        NL               $CHAR990. ;
    INFILE 'C:\temp\Sample File.csv'
  		LRECL=32767 TERMSTR=LF DLM=',' MISSOVER DSD FIRSTOBS = 2 PAD;
    INPUT
        PATIENTNUMBER    : $CHAR13.
        SEQ              : ?? BEST1.
        TERM             : $CHAR5.
        TESTSPEC_        : $CHAR200.
        LN               : $CHAR1000.
        NL               : $CHAR990. ;

run;
Tom
Super User Tom
Super User

You need to first fix the file to not have embedded end of line markers in the middle of the values of fields.  You should be able to do that by just counting the number of quotes and removing any end of line markers that are inside quotes (when the number of quotes seen is odd).

 

Example:

%let dl=C:\Downloads;
filename fixed temp;
data _null_;
  infile "&dl/Sample File.csv" end=eof ;
  file fixed ;
  input;
  nq+countc(_infile_,'"');
  nq=mod(nq,2);
  put _infile_ @;
  if eof or not nq then put; 
  else put '|' @;
run;


data test (compress=yes);
  infile fixed dsd truncover firstobs=2;
  input PATIENTNUMBER :$20. SEQ  TERM :$20. (TESTSPEC_ LN NL) (:$2500.);
run;

proc freq ;
  tables seq term;
run;
The FREQ Procedure

                                Cumulative    Cumulative
SEQ    Frequency     Percent     Frequency      Percent
--------------------------------------------------------
  1          32       32.00            32        32.00
  2          20       20.00            52        52.00
  3          15       15.00            67        67.00
  4          12       12.00            79        79.00
  5           8        8.00            87        87.00
  6           5        5.00            92        92.00
  7           4        4.00            96        96.00
  8           2        2.00            98        98.00
  9           2        2.00           100       100.00


                                  Cumulative    Cumulative
TERM     Frequency     Percent     Frequency      Percent
----------------------------------------------------------
AE             59       59.00            59        59.00
OTHER          41       41.00           100       100.00
Ksharp
Super User

Here is. 

Did you just pick up the first 100 records ?

 

 

options compress=yes;
data temp;
infile 'c:\temp\Sample File.csv' termstr=crlf length=len firstobs=2 lrecl=32767;
input temp $varying4000. len;
if prxmatch('/^X\d+,|^\d+-\d+,/',temp) then group+1;
run;
data temp1;
 length x $ 10000;
 do until(last.group);
   set temp;
   by group;
   x=cats(x,temp);
 end;
run;
data want;
 set temp1;
 PATIENTNUMBER=scan(x,1,',','qm');
 SEQ=scan(x,2,',','qm');
 TERM=dequote(scan(x,3,',','qm'));
 TESTSPEC_LN=dequote(scan(x,4,',','qm'));
 NL=dequote(scan(x,5,',','qm'));
 z=	dequote(scan(x,6,',','qm'));
drop x temp;
run;
Ksharp
Super User
By the way, Maybe you want change this .

TESTSPEC_=dequote(scan(x,4,',','qm'));
LN=dequote(scan(x,5,',','qm'));
NL= dequote(scan(x,6,',','qm'));


And if yours was SAS9.2 , Try add LENGTH before scan():

data want;
set temp1;
length PATIENTNUMBER SEQ ......... $ 2000 ;
PATIENTNUMBER=scan(x,1,',','qm');
SEQ=scan(x,2,',','qm');

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2200 views
  • 0 likes
  • 4 in conversation