I am trying to read in a csv (test.csv) file that looks roughly like this (but >10k rows and 6 columns)
-----------
|"id" ,var1, var2 |
|"0000",12,14|
|"1234",45, 10|
|"5678", 12,10|
----------
When I import the data with the below code the quote stay and I have a character variable of length 6.
proc import datafile = "test.csv"
out = test
dbms = csv
run;
proc contents test = want varnum; run;
My end goal is to merge this dataset with another dataset using a left join. But in that data set the id don't have quotes and the length of the character variable is 4. But the left join is failing because I'm joining on things of different length. The dataset is too long for me to manually delete the quotes in the csv.
I've tried this and it didn't really work
https://communities.sas.com/t5/SAS-Programming/Removing-quoted-text-from-value/m-p/778749#M247927
Hi @Bayesian1701,
In the data step, before you first use idnew, you can insert a length statement, e.g.:
length idnew $ 4;
Kind regards,
Amir.
Hi @Bayesian1701,
Would using a data step and the dequote() function work for the variable you are trying to rectify, as in the example in the documentation:
Kind regards,
Amir.
Hi Bayesian1701,
Please see the following blog post: How to unquote SAS character variable values.
Hope this helps.
There are no pipe characters in the file.
Dequote works to remove the quotes. But it leaves the character length as 6.
Revised code"
proc import datafile = "~\test.csv"
out = test
dbms = csv
replace;
run;
data want;
set test;
idnew = dequote(id);
run;
proc contents data = want varnum; run;
proc print data = want noobs; run;
Here is the output.
How do a create a id variable that is now only 4 characters.
Hi @Bayesian1701,
In the data step, before you first use idnew, you can insert a length statement, e.g.:
length idnew $ 4;
Kind regards,
Amir.
You really should fix this while reading in the data not after the fact.
FYI - if your length is truncated it won't matter if you increase it after the import. It will still show as truncated as it didn't import the full value.
Try adding GUESSINGROWS=MAX to your PROC IMPORT code or show the log from the proc import code and we can show you to modify that to read in your file correctly. If you can attach a sample of the file, even better.
GUESSINGROWS=MAX didn't work because the raw data without the quotes removed is of length 6. But I need it of length 4 to merge with the other data file where id is a character vector of length 4..
@Amir 's solution works. I'd be interested in seeing a different way to do it but I don't know how to modify proc import to unquote the data for the variable and then modify the size.
Show us the log from PROC IMPORT and we can show you the code.
Reasons why this is a bad practice- it works for homework/single use so if that's your use case go ahead. But for practical purposes this isn't a recommended approach.
PROC IMPORT will guess the length of a character variable is maximum length of value of the field on the line, including and quoting.
Try using this macro instead. https://github.com/sasutils/macros/blob/master/csv2ds.sas It will calculate the maximum length using the actual values, not the quoted values. But if you want that first variable to be defined as character instead of numeric you might need need to use the override feature to set the type and length.
options parmcards=csv;
filename csv temp;
parmcards;
"id",var1,var2
"0000",12,14
"1234",45,10
"5678",12,10
;
proc import dbms=csv datafile=csv out=import replace; run;
%csv2ds(csv,out=csv2ds,replace=1);
proc compare data=import compare=csv2ds;
run;
The COMPARE Procedure Comparison of WORK.IMPORT with WORK.CSV2DS (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.IMPORT 14JAN22:18:50:19 14JAN22:18:50:19 3 3 WORK.CSV2DS 14JAN22:18:50:19 14JAN22:18:50:19 3 3 Variables Summary Number of Variables in Common: 3. Number of Variables with Conflicting Types: 1. Number of Variables with Differing Attributes: 2. Listing of Common Variables with Conflicting Types Variable Dataset Type Length Format Informat id WORK.IMPORT Char 6 $6. $6. WORK.CSV2DS Num 8 Z4. Listing of Common Variables with Differing Attributes Variable Dataset Type Length Format Informat var1 WORK.IMPORT Num 8 BEST12. BEST32. WORK.CSV2DS Num 8 var2 WORK.IMPORT Num 8 BEST12. BEST32. WORK.CSV2DS Num 8
So here is how you might force a variable to be character using %CSV2DS() macro.
Run it once and then find the maximum length from the generated _TYPES_ dataset and use that to create an OVERRIDES dataset. Setting FORMAT to a single underscore will remove any format that the macro might guess to attach.
Example:
%csv2ds(csv,out=csv2ds,replace=1);
data override;
set _types_;
where upcase(name)='ID';
length=cats('$',maxlength);
format='_';
keep varnum length format;
run;
%csv2ds(csv,out=csv2ds,replace=1,overrides=override);
So for this little example the macro ends up running this code to create the dataset from the CSV file.
1097 +data csv2ds;
1098 + infile CSV dlm=',' dsd truncover firstobs=2 ;
1099 + length id $4 var1 8 var2 8 ;
1100 + input id -- var2 ;
1101 +run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.