07-03-2014 09:17 AM
I have been trying to write a program that will read in multiple csv files with a similar data structure into a single data set. I know that the Filevar statement will let me do this, but the problem that I am having is with the completeness of the data. Some of the observations have blank data so I have been using a missover option to read in the data. The problem seems to be when the program gets to the end of csv file and comes across missing observations or variables. When the last few variable on the last observation has a missing record, the program does not move on to the next csv file. Is there a way to fix this? Here is my program:
length myfilename $100;
filepath = "&flpth"||flnm;
infile dummy filevar=filepath end=done missover length=reclen dlm=',' dsd firstobs=5;
do while(not done);
myfilename = filepath;
input errors $ editdoc $ Doctype $ CO1 $ Linenbr $ Blank1 $ Blank2 $
Matlnbr $ Qty $ ExtAmt $;
input Blank3 $ Ordnbr $ Ordtyp $ CO2 $
Ordline $ Soldto $ Shipto $ Itmdes $ Blank5 $ Blank6 $;
drop blank1 blank2 blank3 blank5 blank6 flnm;
And here is the information from the log:
49 data Errors;
50 length myfilename $100;
51 set all_file_info1;
52 filepath = "&flpth"||flnm;
53 infile dummy filevar=filepath end=done missover length=reclen dlm=',' dsd firstobs=5;
54 do while(not done);
55 myfilename = filepath;
56 input errors $ editdoc $ Doctype $ CO1 $ Linenbr $ Blank1 $ Blank2 $
57 Matlnbr $ Qty $ ExtAmt $;
58 input Blank3 $ Ordnbr $ Ordtyp $ CO2 $
59 Ordline $ Soldto $ Shipto $ Itmdes $ Blank5 $ Blank6 $;
60 drop blank1 blank2 blank3 blank5 blank6 flnm;
NOTE: The infile DUMMY is:
File Name=V:\Copy of backup\R47011_070114_100332.csv,
NOTE: LOST CARD.
myfilename=V:\Copy of backup\R47011_070114_100332.csv fldt=07/01/2014
flnm=R47011_070114_100332.csv filepath=V:\Copy of backup\R47011_070114_100332.csv done=1 reclen=0
errors= editdoc= Doctype= CO1= Linenbr= Blank1= Blank2= Matlnbr= Qty= ExtAmt=
Blank3= Ordnbr=752824 Ordtyp=SZ CO2=02000 Ordline= Soldto=51418594 Shipto=TARGET # Itmdes=
Blank5= Blank6= _ERROR_=1 _N_=1
Can someone help?
07-03-2014 09:26 AM
I would suggest that you have more variables on the input statement than you have sections in your input file. e.g:
input first $ second $ third $ fourth $. The file should look like:
And for any other missings the comma should be there e.g 16 variables:
07-03-2014 10:03 AM
I have removed the last two "blank" observations from the 2nd input statement since they were unnecessary.
Upon further investigation, I think I have found the issue...
I have dumped the data in to a text file so that you can see it. I am including the last 2 lines of data:
,752823,SZ,02000,,11033211,TARGET #2042/E ORDERS ONLY,SPITFIRE SC PWR CLN 1X5L RTD
It appears that in some of the files, the Itmdes (which should be the last observation) is blank and there is no comma. When the file has an itmdes, the program will move on to the next file. Is there any way to fix this? Btw... I am not sure what all the blocks at the end of the data mean. Is it possible they could be causing some issues as well?
Thanks for your help
07-03-2014 10:23 AM
Yes, they look like special characters and could cause some problems. I would do a find/replace on these in the file before using to import. Maybe someone has a good option for this on the infile, I can't think of one off the top of my head though. I would check with what generated this file as that looks odd.
Have a look at:
07-03-2014 10:40 AM
You can use the LIST statement to see what is really in your file. If there are any unprintable characters on a line then the LIST statement will also display the character codes as two digit hexadecimal numbers.
infile 'V:\Copy of backup\R47011_070114_100332.csv' lrecl=20000 ;
07-03-2014 02:37 PM
Before your DO WHILE loop I would suggest setting DONE=0; to ensure the second and subsequent files are read. That END=DONE results in the DONE variable being RETAINed.
If you have base SAS windowing environment (aka display manager) you could use the DATA STEP DEBUGGER to watch how values of these control variables change as you read through the data. pity there is no adequate substitute in EG, DI or SAS Studio
If those special characters at end of file are all the same you might get the input to ignore and treat them as blanks by adding them to the DLM infile option which had default value of blank
where ×× is the hex value you want to ignore.and 20 is hex for blank