BookmarkSubscribeRSS Feed
1239
Calcite | Level 5

Hi, 

 

I am trying to import the attached "Sample File.csv" however I am unable to import it successfully. I noticed that there are line feed values and possible carriage return values hence unable to import the data.

 

Please help in import this .csv file successfully.

 

filename test 'C:\temp\Sample File.csv' termstr=CRLF;

proc import datafile=test
out=sample
dbms=csv
replace;
getnames=yes;
guessingrows=max;
run;
6 REPLIES 6
andreas_lds
Jade | Level 19
This topic has been discussed many times, please use the search-box.
1239
Calcite | Level 5

Please note after searching in google for the solution I have posted this query as I didn't any get solution.

1239
Calcite | Level 5

I have tried the below two different codes however still I am unable to import it successfully hence please help me.

 

FILENAME REFFILE 'C:\temp\Sample File.csv' TERMSTR=LF;

PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=WORK.TEST;
	GETNAMES=YES;
RUN;
DATA WORK.TEST;
    LENGTH
        PATIENTNUMBER    $ 13
        SEQ                8
        TERM             $ 5
        TESTSPEC_        $ 200
        LN               $ 1000
        NL               $ 990 ;
    FORMAT
        PATIENTNUMBER    $CHAR13.
        SEQ              BEST1.
        TERM             $CHAR5.
        TESTSPEC_        $CHAR200.
        LN               $CHAR1000.
        NL               $CHAR990. ;
    INFORMAT
        PATIENTNUMBER    $CHAR13.
        SEQ              BEST1.
        TERM             $CHAR5.
        TESTSPEC_        $CHAR200.
        LN               $CHAR1000.
        NL               $CHAR990. ;
    INFILE 'C:\temp\Sample File.csv'
  		LRECL=32767 TERMSTR=LF DLM=',' MISSOVER DSD FIRSTOBS = 2 PAD;
    INPUT
        PATIENTNUMBER    : $CHAR13.
        SEQ              : ?? BEST1.
        TERM             : $CHAR5.
        TESTSPEC_        : $CHAR200.
        LN               : $CHAR1000.
        NL               : $CHAR990. ;

run;
Tom
Super User Tom
Super User

You need to first fix the file to not have embedded end of line markers in the middle of the values of fields.  You should be able to do that by just counting the number of quotes and removing any end of line markers that are inside quotes (when the number of quotes seen is odd).

 

Example:

%let dl=C:\Downloads;
filename fixed temp;
data _null_;
  infile "&dl/Sample File.csv" end=eof ;
  file fixed ;
  input;
  nq+countc(_infile_,'"');
  nq=mod(nq,2);
  put _infile_ @;
  if eof or not nq then put; 
  else put '|' @;
run;


data test (compress=yes);
  infile fixed dsd truncover firstobs=2;
  input PATIENTNUMBER :$20. SEQ  TERM :$20. (TESTSPEC_ LN NL) (:$2500.);
run;

proc freq ;
  tables seq term;
run;
The FREQ Procedure

                                Cumulative    Cumulative
SEQ    Frequency     Percent     Frequency      Percent
--------------------------------------------------------
  1          32       32.00            32        32.00
  2          20       20.00            52        52.00
  3          15       15.00            67        67.00
  4          12       12.00            79        79.00
  5           8        8.00            87        87.00
  6           5        5.00            92        92.00
  7           4        4.00            96        96.00
  8           2        2.00            98        98.00
  9           2        2.00           100       100.00


                                  Cumulative    Cumulative
TERM     Frequency     Percent     Frequency      Percent
----------------------------------------------------------
AE             59       59.00            59        59.00
OTHER          41       41.00           100       100.00
Ksharp
Super User

Here is. 

Did you just pick up the first 100 records ?

 

 

options compress=yes;
data temp;
infile 'c:\temp\Sample File.csv' termstr=crlf length=len firstobs=2 lrecl=32767;
input temp $varying4000. len;
if prxmatch('/^X\d+,|^\d+-\d+,/',temp) then group+1;
run;
data temp1;
 length x $ 10000;
 do until(last.group);
   set temp;
   by group;
   x=cats(x,temp);
 end;
run;
data want;
 set temp1;
 PATIENTNUMBER=scan(x,1,',','qm');
 SEQ=scan(x,2,',','qm');
 TERM=dequote(scan(x,3,',','qm'));
 TESTSPEC_LN=dequote(scan(x,4,',','qm'));
 NL=dequote(scan(x,5,',','qm'));
 z=	dequote(scan(x,6,',','qm'));
drop x temp;
run;
Ksharp
Super User
By the way, Maybe you want change this .

TESTSPEC_=dequote(scan(x,4,',','qm'));
LN=dequote(scan(x,5,',','qm'));
NL= dequote(scan(x,6,',','qm'));


And if yours was SAS9.2 , Try add LENGTH before scan():

data want;
set temp1;
length PATIENTNUMBER SEQ ......... $ 2000 ;
PATIENTNUMBER=scan(x,1,',','qm');
SEQ=scan(x,2,',','qm');

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2806 views
  • 0 likes
  • 4 in conversation