Bayesian1701
Fluorite | Level 6

I am trying to read in a CSV file (test.csv) that looks roughly like this (but with >10k rows and 6 columns):

-----------

|"id" ,var1, var2    |

|"0000",12,14|

|"1234",45, 10|

|"5678", 12,10|

----------

When I import the data with the code below, the quotes stay and I end up with a character variable of length 6.

proc import datafile = "test.csv"
out = test
dbms = csv;
run;

proc contents data = test varnum; run;

My end goal is to merge this dataset with another dataset using a left join. But in that dataset the ids don't have quotes and the length of the character variable is 4. The left join is failing because I'm joining on variables of different lengths. The dataset is too long for me to manually delete the quotes in the CSV.

 

I've tried this and it didn't really work:

https://communities.sas.com/t5/SAS-Programming/Removing-quoted-text-from-value/m-p/778749#M247927

Reeza
Super User
Do you have the actual pipe characters in your file?

PROC IMPORT datafile = 'test.csv' out=test dbms=csv replace; guessingrows=max;
run;

What does the log show?
Amir
PROC Star

Hi @Bayesian1701,

 

Would using a data step and the dequote() function work for the variable you are trying to rectify, as in the example in the documentation:

 

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/lefunctionsref/n0v1da94165bxen1oj003r1vx2m...

 

 

Kind regards,

Amir.

Bayesian1701
Fluorite | Level 6

There are no pipe characters in the file.

 

Dequote works to remove the quotes. But it leaves the character length as 6. 

 

Revised code:

proc import datafile = "~\test.csv"
out = test
dbms = csv
replace;
run;
data want;
set test;
idnew = dequote(id);
run;
proc contents data = want varnum; run;
proc print data = want noobs; run;

 

Here is the output.

[Screenshot: Bayesian1701_0-1642199627048.png]

How do I create an id variable that is only 4 characters long?

 

Amir
PROC Star

Hi @Bayesian1701,

 

In the data step, before you first use idnew, you can insert a length statement, e.g.:

 

length idnew $ 4;
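
Put together with the revised code above, that could look like the following minimal sketch. It assumes the dataset is named test and that every id is exactly 4 characters once the quotes are removed; anything longer would be silently truncated to 4.

```sas
data want;
  /* Declare the length before idnew is first used; otherwise   */
  /* idnew inherits the length of the DEQUOTE argument (6 here). */
  length idnew $ 4;
  set test;
  idnew = dequote(id);   /* remove the surrounding quotes */
run;

/* Check that idnew is now character with length 4 */
proc contents data = want varnum; run;
```

The placement matters because SAS fixes a variable's length the first time it sees the variable while compiling the step, so a length statement after the assignment would come too late.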

 

 

Kind regards,

Amir.

Reeza
Super User

You really should fix this while reading in the data, not after the fact.
FYI: if a value was truncated during the import, increasing the length afterwards won't help. It will still show as truncated because the full value was never imported.

 

Try adding GUESSINGROWS=MAX to your PROC IMPORT code, or show the log from the PROC IMPORT code and we can show you how to modify it to read in your file correctly. If you can attach a sample of the file, even better.

Bayesian1701
Fluorite | Level 6

GUESSINGROWS=MAX didn't work because the raw data, without the quotes removed, is of length 6. But I need it to be of length 4 to merge with the other data file, where id is a character variable of length 4.

@Amir's solution works. I'd be interested in seeing a different way to do it, but I don't know how to modify PROC IMPORT to unquote the data for the variable and then change its length.

Reeza
Super User

Show us the log from PROC IMPORT and we can show you the code. 

 

Here is why this is bad practice. It works for homework or a single use, so if that's your use case, go ahead. For practical purposes, though, it isn't a recommended approach.

  • PROC IMPORT isn't guaranteed to read a file the same way each time. If you receive a new file and repeat this, it likely won't work the same way.
  • You have no guarantee the data was read in correctly. In your case the id was read in with a length of 6, but if that length changed, your clean-up code would now truncate the values.
  • It's inefficient: you do a bunch of unneeded work to clean up a file that could be read correctly in one step.
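
As a rough illustration of reading the file in one step, here is a minimal hand-written data step. It is only a sketch: it assumes the file has a single header row and just the three columns shown in the question (the real file has six, so the length and input statements would need the remaining variables), and that the ids never exceed 4 characters.

```sas
data test;
  /* DSD treats commas as delimiters and strips the quotes around */
  /* quoted values; FIRSTOBS=2 skips the header row.              */
  infile "test.csv" dlm=',' dsd truncover firstobs=2;
  length id $ 4 var1 8 var2 8;   /* set types and lengths explicitly */
  input id var1 var2;
run;
```

Because the lengths are written out rather than guessed, the result does not change from one file to the next, and id arrives unquoted with length 4.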

 

Tom
Super User

PROC IMPORT guesses the length of a character variable as the maximum length of that field's values as they appear on the lines, including any quoting.

 

Try using this macro instead: https://github.com/sasutils/macros/blob/master/csv2ds.sas It calculates the maximum length using the actual values, not the quoted values. But if you want that first variable to be defined as character instead of numeric, you might need to use the override feature to set the type and length.

 

options parmcards=csv;
filename csv temp;

parmcards;
"id",var1,var2
"0000",12,14
"1234",45,10
"5678",12,10
;

proc import dbms=csv datafile=csv out=import replace; run;
%csv2ds(csv,out=csv2ds,replace=1);

proc compare data=import compare=csv2ds;
run;
The COMPARE Procedure
Comparison of WORK.IMPORT with WORK.CSV2DS
(Method=EXACT)

Data Set Summary

Dataset               Created          Modified  NVar    NObs

WORK.IMPORT  14JAN22:18:50:19  14JAN22:18:50:19     3       3
WORK.CSV2DS  14JAN22:18:50:19  14JAN22:18:50:19     3       3


Variables Summary

Number of Variables in Common: 3.
Number of Variables with Conflicting Types: 1.
Number of Variables with Differing Attributes: 2.


Listing of Common Variables with Conflicting Types

Variable  Dataset      Type  Length  Format  Informat

id        WORK.IMPORT  Char       6  $6.     $6.
          WORK.CSV2DS  Num        8  Z4.


Listing of Common Variables with Differing Attributes

Variable  Dataset      Type  Length  Format   Informat

var1      WORK.IMPORT  Num        8  BEST12.  BEST32.
          WORK.CSV2DS  Num        8
var2      WORK.IMPORT  Num        8  BEST12.  BEST32.
          WORK.CSV2DS  Num        8

So here is how you might force a variable to be character using %CSV2DS() macro.

Run it once and then find the maximum length from the generated _TYPES_ dataset and use that to create an OVERRIDES dataset. Setting FORMAT to a single underscore will remove any format that the macro might guess to attach.

Example:

%csv2ds(csv,out=csv2ds,replace=1);
data override;
  set _types_;
  where upcase(name)='ID';
  length=cats('$',maxlength);
  format='_';
  keep varnum length format;
run;
%csv2ds(csv,out=csv2ds,replace=1,overrides=override);

So for this little example the macro ends up running this code to create the dataset from the CSV file.

data csv2ds;
  infile CSV dlm=',' dsd truncover firstobs=2 ;
  length id $4 var1 8 var2 8 ;
  input id -- var2 ;
run;
