BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
melassiri
Fluorite | Level 6

I tried to import a file in csv format in sas but not without problem.
First I did it normally like:
proc import datafile=reffile
out=test_Eems
dbms=csv
replace;
getnames=yes;
run;
without success and I saw in the log that there could be talk of delimiter ';'.
So I added delimiter ';'.

proc import datafile=reffile
out=test_Eems
dbms=csv
replace;
delimiter=";" ;
getnames=yes;
run;

I get a better result but not always correct because a column (called Bericht) contains all kinds of characters also the semicolon.
Of course I get an error in the log.
What surprises me is that I could open the same file without problem in Excel or a google sheet but not in Sas.
I have converted the same file to an excel file using excel and I could easily read and open it in sas.

My question to you is if I could open the file in Excel and google sheet then it should also work in Sas.
I would like your help with that.
Below are the file that I could not import in csv format but in excel and the log that I got when I tried with csv format

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I suspect your file as embedded LF characters?

Try telling SAS to use TERMSTR=CRLF when reading the file.

data test;
  infile reffile dsd dlm=';' firstobs=2 termstr=crlf truncover;
  input (var1-var12) (:$200.);
run;

If you want to continue to use PROC IMPORT to make GUESSES about what the variables are and what types of data the variables have then you can add the TERMSTR= option to the FILENAME statement that defined the fileref REFFILE you used in your code:

filename reffile 'physical filename' termstr=crlf ;

If I use CSV2DS macro to guess how to read the file it defines the variables this way:

1089 +data test2;
1090 +  infile CSV termstr=crlf dlm=';' dsd truncover firstobs=2 ;
1091 +  length Datum 8 Url $341 Sentiment $8 Type $7 Discussie_lengte 8 Views 8
1092 +    Auteur $81 Volgers 8 Bron $9 Titel $172 Bericht $9010 Labels $1
1093 +  ;
1094 +  informat Datum anydtdtm. ;
1095 +  format Datum datetime19. ;
1096 +  label Discussie_lengte='Discussie lengte' ;
1097 +  input Datum -- Labels ;
1098 +run;

 

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

I suspect your file as embedded LF characters?

Try telling SAS to use TERMSTR=CRLF when reading the file.

data test;
  infile reffile dsd dlm=';' firstobs=2 termstr=crlf truncover;
  input (var1-var12) (:$200.);
run;

If you want to continue to use PROC IMPORT to make GUESSES about what the variables are and what types of data the variables have then you can add the TERMSTR= option to the FILENAME statement that defined the fileref REFFILE you used in your code:

filename reffile 'physical filename' termstr=crlf ;

If I use CSV2DS macro to guess how to read the file it defines the variables this way:

1089 +data test2;
1090 +  infile CSV termstr=crlf dlm=';' dsd truncover firstobs=2 ;
1091 +  length Datum 8 Url $341 Sentiment $8 Type $7 Discussie_lengte 8 Views 8
1092 +    Auteur $81 Volgers 8 Bron $9 Titel $172 Bericht $9010 Labels $1
1093 +  ;
1094 +  informat Datum anydtdtm. ;
1095 +  format Datum datetime19. ;
1096 +  label Discussie_lengte='Discussie lengte' ;
1097 +  input Datum -- Labels ;
1098 +run;

 

melassiri
Fluorite | Level 6

Thans for your answer.
I like it and it all works.
But I am still a beginner and the original file(Eems_Dollard.csv) had a starting column called "Zoekopdracht".
When i apply your answer to the original file it does not work and I do not know why.
If you succeed I would like to hear from you.
Thanks again for the effort.

JOL
SAS Employee JOL
SAS Employee

The best way to read this file into SAS is to use the Data Step.

 

data test;
infile "c:\temp\test_Eems.csv" dlm=';' dsd missover firstobs=2;
input Datnum : $15.  Url : $100. Sentiment : $10. Type : $10.
Discussie_Lengte : 8. Views : 8.  Auteur : $25.  Volgers : 8.
Bron : $15. Title : $200. Bericht : $300.  Labels : $10.;
run;
 
proc print data=test;
run;
 
The above method uses the Modified List Input method SAS Help Center: INPUT Statement: List

 

 

 

melassiri
Fluorite | Level 6

Thank you for your answer.
I like it and it all works.
When i apply your answer to the original file it does not work and I do not know why.
If you succeed I would like to hear from you.
Thanks again for the effort.

 

JOL
SAS Employee JOL
SAS Employee

You have an additional column in this file , here's the revised code:

 

data test;
infile "c:\temp\Eems_Dollard.csv" dlm=';' dsd missover firstobs=2;
input Zoekopdracht : $ 200. Datnum : $15. Url : $100. Sentiment : $10. Type : $10.
Discussie_Lengte : 8. Views : 8. Auteur : $25. Volgers : 8.
Bron : $15. Title : $200. Bericht : $300. Labels : $10.;
run;

proc print data=test;
run;

 

melassiri
Fluorite | Level 6

Thanks for your quick reply.
But it doesn't work completely, look at the screen shot(in pdf formaat).
Thanks again.

Patrick
Opal | Level 21

Using the SAS EG import wizard to generate data step code appears to work.

Below the generated code with some manual changes applied (like longer character variables so they hopefully are also sufficiently long for all your data and not only the sample).

DATA WORK.test_Eems;
    LENGTH
        Datum              8
        Url              $ 500
        Sentiment        $ 8
        Type             $ 7
        Discussie_lengte   8
        Views              8
        Auteur           $ 250
        Volgers            8
        Bron             $ 9
        Titel            $ 250
        Bericht          $ 1500
        Labels           $ 10 ;
    LABEL
        Discussie_lengte = "Discussie lengte" ;
    FORMAT
        Datum            date9.
        Url              $500.
        Sentiment        $8.
        Type             $7.
        Discussie_lengte best32.
        Views            best32.
        Auteur           $250.
        Volgers          best32.
        Bron             $9.
        Titel            $250.
        Bericht          $1500.
        Labels           $10. ;
    INFORMAT
        Datum            ddmmyy20.
        Url              $500.
        Sentiment        $8.
        Type             $7.
        Discussie_lengte best32.
        Views            best32.
        Auteur           $250.
        Volgers          best32.
        Bron             $9.
        Titel            $250.
        Bericht          $1500.
        Labels           $10. ;
    INFILE 'C:\temp\test_Eems.csv'
        LRECL=30000
        ENCODING="UTF-8"
        TERMSTR=CRLF
        DLM=';'
        truncover
        DSD ;
    INPUT
        Datum            
        Url              
        Sentiment        
        Type             
        Discussie_lengte 
        Views            
        Auteur           
        Volgers          
        Bron             
        Titel            
        Bericht          
        Labels   
        ; 
RUN;
melassiri
Fluorite | Level 6

Thank you for your answer.
I like it and it all works.
But I am still a beginner and the original file(Eems_Dollard.csv) had a starting column called "Zoekopdracht".
When i apply your answer to the original file it does not work and I do not know why.
If you succeed I would like to hear from you.
Thanks again for the effort.

Tom
Super User Tom
Super User

Where are you getting these files?  Perhaps you should talk to the providers and get them to clarify what type of files they are producing?

 

If you look at the first few bytes of that file:

data _null_;
  infile "C:\downloads\Eems_Dollard.csv" lrecl=100 recfm=f obs=5;
  input;
  list;
run;

It seems to be using LF only as the end of line marker.  Which means that if there are ANY other embedded LF characters (like you had in the other file) then there is no way for SAS to tell the difference between a real END OF LINE marker and these other LF characters.

 

You might try seeing if the extra LF characters are at least nested inside of double quotes by running this macro, %replace_crlf  ,to make a new version of the file where such embedded LF (and or CR) characters are either removed or replaced with some other character.

 

That seems to work fine for the supplied file.

filename csv "C:\downloads\Eems_Dollard.csv";
filename csv2 temp;
%replace_crlf(csv,csv2,lf=' ',cr=)
%csv2ds(csv2,dlm=';',out=want,replace=YES)

Making a file that can be read by SAS.

5   +data want;
6   +  infile CSV2 dlm=';' dsd truncover firstobs=2 ;
7   +  length Zoekopdracht $34 Datum 8 Url $341 Sentiment $8 Type $7
8   +    Discussie_lengte 8 Views 8 Auteur $81 Volgers 8 Bron $9 Titel $172
9   +    Bericht $9010 Labels $1
10  +  ;
11  +  informat Datum anydtdtm. ;
12  +  format Datum datetime19. ;
13  +  label Discussie_lengte='Discussie lengte' ;
14  +  input Zoekopdracht -- Labels ;
15  +run;

NOTE: The infile CSV2 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 2124 records were read from the infile (system-specific pathname).
      The minimum record length was 175.
      The maximum record length was 9208.
NOTE: The data set WORK.WANT has 2124 observations and 13 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

 

melassiri
Fluorite | Level 6

Thanks for your quick reply.
As I said I am still a beginner so it will be hard enough for me.
But I have already learned something from you so thanks again.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1663 views
  • 6 likes
  • 4 in conversation