BookmarkSubscribeRSS Feed
Kayomole
Calcite | Level 5

SAS is importing date variable into numbers as "1" or "2" instead of using the right date format as YYMMDD8.. Also, it has some missing values in the beginning.  The code I used is given below. Could anyone suggest what I am doing wrong? Thank you.

 

PROC IMPORT OUT= WORK.df
DATAFILE= "work.csv"
DBMS=CSV REPLACE;
delimiter=",";
GETNAMES=YES;
RUN;

14 REPLIES 14
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, simple answer is that your using a guessing procedure to guess what the data should be like.  If you have a CSV file, and by that I mean a Comma Separated Variable file which has:

<headers>

<values separated by commas>

Then you read this file in by using a datastep:

data want;
  infile "thefile.csv";
  length var1 8 var2 20;
  format var1 date9. var2 $20.;
  informat var1 yymmdd10.;
  input var1 var2 $;
run;

Note I use informat yymmdd10. to read the date value.

Kayomole
Calcite | Level 5
Thanks for the reply. I tried this but the output gives missing values
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Some kind of information as to what you ran and what you ran it on would help.

Kayomole
Calcite | Level 5

Following is the output SAS is giving. Instead I expect the prdate to be the same as fdate or rdate. 


Untitled.png
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, I cannot see your computer.  Follow the guidance you will find by the Post button on new questions:

Capture.PNG

Post test data (normally in a datastep, but in this case just a copy paste of the first few rows of the file in a code window {i} would be fine)

Then post the code you have used.

Kayomole
Calcite | Level 5

Following is the output SAS is giving. Instead I expect the prdate to be the same as rdate. 

 

fdatemgrnamemgrnordateprdatecusipshareschangeticker
19981231A R ASSET MANAGEMENT INC11019981231117651011500NAAMR
19981231A R ASSET MANAGEMENT INC1101998123111957104212NAT
19981231A R ASSET MANAGEMENT INC1101998123112824103500NAABT
19981231A R ASSET MANAGEMENT INC1101998123114930208333NAATV
19981231A R ASSET MANAGEMENT INC1101998123127924102641NAAEG
19981231A R ASSET MANAGEMENT INC110199812312NA1764NAATI
19981231A R ASSET MANAGEMENT INC110199812312NA7NAAIR
19981231A R ASSET MANAGEMENT INC11019981231217175101739NAY
19981231A R ASSET MANAGEMENT INC11019981231217475101078NAAEH
19981231A R ASSET MANAGEMENT INC110199812312NA105040NAALL

 

PROC IMPORT OUT= WORK.df
DATAFILE= "work.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I am sorry, I can't help any further, I have already pointed out in a previous post that using proc import will guess what your data should be.  Also your not providing the information I request, specifically what the CSV file looks like.  For all I know prdate is just one number and proc import is working correctly.

Kayomole
Calcite | Level 5

Foolowing is the original csv file which I could open it in "R". 

fdatemgrnameCountrymgrnotypecoderdateprdate
19990930UNITED STATES TRUST CO OF NYUNITED STATES8931011999093019990630
19990930WEISS, PECK & GREER, L.L.C.UNITED STATES9178051999093019990630
19990930WORLD ASSET MANAGEMENTUNITED STATES9383051999093019990630
19990930BARCLAYS BANK PLCUNITED KINGDOM790041999093019990630
19990930JMG CAPITAL PTNR LPUNITED STATES4810041999093019990331
19990930LEGG MASON WOOD WALKER, INC.UNITED STATES5018051999093019990630
19990930SSI INVESTMENT MANAGEMENT INC.UNITED STATES8165051999093019990630
19990930STROME SUSSKIND INV MGMT, L.P.UNITED STATES8209051999093019990630
19990930AMERICAN GENERAL CORPORATIONUNITED STATES234021999093019990630
19990930ARTISAN PTNR L.P.UNITED STATES471951999093019990630
19990930BANK ONE CORPORATIONUNITED STATES595511999093019990630

As you can see prdate is formated the same way as rdate. But, sas output is different.  

RW9
Diamond | Level 26 RW9
Diamond | Level 26

That output is not what the underlying CSV file looks like.  Right click on the csv file and open it in Notepad, not Excel.  This code, although reading from datalines operates exactly the same as if it read from csv file - the bit just after datalines; shows what the CSV file actually looks like (although yours will have a list of variable names in the first row), remember CSV is not an Excel file.

data want;
  infile datalines dlm="," dsd;
  length country mgrname $200;
  informat fdate rdate prdate yymmdd8.;
  format fdate rdate prdate date9.;
  input fdate	mgrname $	Country $	mgrno	typecode	rdate	prdate;
datalines;
19990930,"UNITED STATES TRUST CO OF NY","UNITED STATES",89310,1,19990930,19990630
19990930,"WEISS, PECK & GREER, L.L.C.","UNITED STATES",91780,5,19990930,19990630
19990930,"WORLD ASSET MANAGEMENT","UNITED STATES",93830,5,19990930,19990630
;
run;
Kayomole
Calcite | Level 5

Thank you very much for reply. The problem I figured is that due to missing values in the begining of prdate variable proc import forces the length of that variable equal to "1" and then when the actual values come it takes 1 from 1900's and 2 from 2000's and converts the values to "1" or "2". I am still trying to solve this problem using guessingrows option. I am grateful for your input  

Kurt_Bremser
Super User

The simplest solution: don't use proc import for the real import, but only as a helper to get a "first shot".

Take the data step that proc import wrote to the log, and adapt it to your needs.

 

Or drop proc import altogether and start writing the import data step on your own. It's quite simple, as the code provided by @RW9 shows.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Again, its because your letting it guess.  Define it yourself and it doesn't matter, exact same code but with some missings:

data want;
  infile datalines dlm="," dsd;
  length country mgrname $200;
  informat fdate rdate prdate yymmdd8.;
  format fdate rdate prdate date9.;
  input fdate	mgrname $	Country $	mgrno	typecode	rdate	prdate;
datalines;
19990930,"UNITED STATES TRUST CO OF NY","UNITED STATES",89310,1,,
19990930,"UNITED STATES TRUST CO OF NY","UNITED STATES",89310,1,,
19990930,"UNITED STATES TRUST CO OF NY","UNITED STATES",89310,1,19990930,19990630
19990930,"WEISS, PECK & GREER, L.L.C.","UNITED STATES",91780,5,19990930,19990630
19990930,"WORLD ASSET MANAGEMENT","UNITED STATES",93830,5,19990930,19990630
;
run;
Tom
Super User Tom
Super User

Either tell PROC IMPORT that it should check more of the file before deciding for you what the variables are.

https://support.sas.com/documentation/cdl/en/proc/70377/HTML/default/viewer.htm#n1qn5sclnu2l9dn1w61i...

Just add this statement into your PROC IMPORT code.

guessingrows=max;

Or better still you could just read the file yourself instead of letting PROC IMPORT guess at what is in the file.  You are probably going to be able to make a better decision about what each column represents than SAS can guess.  You will understand the meaning of the columne and SAS can only look at the single example input file you have given it.  So you might have a sample file with really short strings, but you would know to define the variables longer to handle other data files of the same type.

 

It is really easy to write a data step to read a delimited file. Follow these simple steps.

First write a DATA statement and an INFILE statement.  Use the DSD option to indicate that you want to properly handle missing values. The default delimiter for DSD is a comma, but you can use the DLM= option to change that. Use the FIRSOBS=2 option to skip the header line (if your file has one).  Use the TRUNCOVER option in case some lines do not have all of the columns.

data want ;
  infile 'myfile.csv' dsd dlm=',' truncover firstobs=2;

Then copy the first line from your delimited file and paste it into your program and convert it to a LENGTH statement. This will let you can define the variable's position, name, type and length.  If the headers are not valid SAS variable names you will need to edit them to make them into valid names.  You could even take this opportunity to convert the column headers into variable names that might work better for you.  Lengths with $ in front mean the variable is character. Otherwise use 8 for the length as SAS stores all numbers (dates and times are numbers also) as 64bit floating point numbers.

length fdate 8 mgrname $40 Country $20 mgrno $10 typecode $10 rdate 8 prdate 8;

Then for any variables that REQUIRE them add INFORMAT and/or FORMAT statement. So if your "date" variables are actual dates then you will probably want to use an INFORMAT and a FORMAT so that SAS will treat them as dates and not numbers or strings. There is no need to apply informat or formats to character variables or even most numeric variables.  There is no need to add a length to the INFORMAT since SAS will ignore it anyway when using the DSD option on the INFILE statement.

informat fdate rdate prdate yymmdd.;
format fdate rdate prdate yymmdd10.;

Then for the input statement you just need to list the variables.  If you have defined them in the right order in the LENGTH statement then you can use a variable list in the INPUT statement to make it even easier.

input fdate -- prdate ;
run;

So the PROC IMPORT program will be 5 statements over 7 lines and the DATA step program is 7 statements over 7 lines.  5 statements if you just want to read the dates as strings.

Kurt_Bremser
Super User

@Kayomole wrote:

Foolowing is the original csv file which I could open it in "R". 

fdate mgrname Country mgrno typecode rdate prdate
19990930 UNITED STATES TRUST CO OF NY UNITED STATES 89310 1 19990930 19990630
19990930 WEISS, PECK & GREER, L.L.C. UNITED STATES 91780 5 19990930 19990630
19990930 WORLD ASSET MANAGEMENT UNITED STATES 93830 5 19990930 19990630
19990930 BARCLAYS BANK PLC UNITED KINGDOM 7900 4 19990930 19990630
19990930 JMG CAPITAL PTNR LP UNITED STATES 48100 4 19990930 19990331
19990930 LEGG MASON WOOD WALKER, INC. UNITED STATES 50180 5 19990930 19990630
19990930 SSI INVESTMENT MANAGEMENT INC. UNITED STATES 81650 5 19990930 19990630
19990930 STROME SUSSKIND INV MGMT, L.P. UNITED STATES 82090 5 19990930 19990630
19990930 AMERICAN GENERAL CORPORATION UNITED STATES 2340 2 19990930 19990630
19990930 ARTISAN PTNR L.P. UNITED STATES 4719 5 19990930 19990630
19990930 BANK ONE CORPORATION UNITED STATES 5955 1 19990930 19990630

As you can see prdate is formated the same way as rdate. But, sas output is different.  


This is NOT, I repeat, NOT "the original csv file"!!

This is just how R DISPLAYS the csv file.

Open the csv file with a TEXT EDITOR(!) and then copy/paste the contents into a {i} window. Or attach the csv file AS IS (or a portion of it) to a post here. Everything else is useless in conveying the csv file structure.

 

I hope I made it clearer to you what @RW9 wanted from you all along.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1814 views
  • 3 likes
  • 4 in conversation