Data discrepancy converting csv to SAS dataset

Reply
Occasional Contributor
Posts: 16

Data discrepancy converting csv to SAS dataset

Hi all,

I have a very large csv file that I wish to convert to SAS dataset. Below is the code I used:

LIBNAME perm "D:\permanent_folder";

proc import datafile="D:\temp_folder\have.csv"

     out=perm.want

     dbms=csv

     replace;

     getnames=yes;

run;

I ran the code and obtained my 'want' dataset, permanently stored in the permanent_folder. But the issue is that there are discrepancies. The discrepancies are within several particular variables. Can someone give me some pointers on how to fix this please.

example:

CSV file

TypeExPriceVolumeBuyer IDBid Price
QuoteADF27.169985NYS27.16
QuoteADF27.193885NYS27.16
TradeBTY27.16100THM27.17
TradeDEA27.165200NYS27.16

SAS file obtained

TypeExPriceVolumeBuyer IDBid Price
QuoteA28NYS27.16
QuoteA28NYS27.16
TradeB21THM27.17
TradeD22NYS27.16
Frequent Contributor
Posts: 130

Re: Data discrepancy converting csv to SAS dataset

Posted in reply to Brian_Chee

I would recommend reading your CSV file in with a data step and not the import procedure.  In a data step, you can assign the formats that you want SAS to read the variables in as (informat); as well as, assign the formats you want to see displayed in the SAS data set (format).  With the import procedure, SAS will look through a few observations for each variable and assign what it assumes the format should be.  For the data set and variables you provided in your example, try the following:

data perm.want;

infile "D:\temp_folder\have.csv"

dsd missover firstobs=2;

informat Type $10. Ex $3. Price best10. Volume 6. BuyerID $3. BidPrice best10.;

format Type $10. Ex $3. Price best10. Volume 6. BuyerID $3. BidPrice best10.;

input Type Ex Price Volume BuyerID BidPrice;

run;

Occasional Contributor
Posts: 16

Re: Data discrepancy converting csv to SAS dataset

Hi all,

I think I have figured out the problem. Sharing this in hope that it might help others in the future. In the csv file, variables such as "Buyer ID" and "Bid Price" has a space in between, while SAS does not allow for that (ie should have been Buyer_ID and Bid_Price.)

What I end up doing was in the proc import statement, change getnames = yes to getnames=NO. Once I have my SAS dataset, I did the following

data work.want(drop =tempvar tempvar2 tempvar3);

  set perm.want(rename = (var2=tempvar var3=tempvar2 var4=tempvar3));

  var2 = input(tempvar, $4.);

  var3= input (tempvar2, 8.);

  var4= input (tempvar3, 8.);

  format var2 $7.

  var3 best12.

  var3 best12.;

  informat  var2 $7.

  var3 best32.

  var4 best32.;

run;

This might not be the most effective method, but it's working. In case anyone encounters similar issue in the future, CHECK THE VARIABLE NAMES.

Super User
Posts: 10,018

Re: Data discrepancy converting csv to SAS dataset

Posted in reply to Brian_Chee

Add option:

proc import datafile="D:\temp_folder\have.csv"

     out=perm.want

     dbms=csv

     replace;

     getnames=yes;

guessingrows=32767;

run;

Ask a Question
Discussion stats
  • 3 replies
  • 214 views
  • 3 likes
  • 3 in conversation