09-24-2012 10:11 PM
Good Evening All,
I am trying to load several files, all should be the same format. If I pull the smallest one into excel with a comma delimiter I get the desired results. If I pull it directly into sas from a flat file I do not get the results I expect, there are many variables that do not load. Below is a small example from the file, headers only, below that is the code I'm using. If there is any additional information that would be helpful let me know and I'll get back to you quickly.
filename file1 '/product/users6/twr0cxp/TWC_File1_Final_delimited.txt';
infile file1 dlm=',' dsd missover firstobs=5;
There is over 1000 variables so I put everything in this format in an attempt at making it easier. In excel I went through and shortened everything to 32 characters and made everything character. I also went through and double checked that there aren't any extra 's or periods.
There is one field that I see that I can tell is incorrect but i'm not sure why it is. The field should be HUDSON but is "HUDSO
Any help will be appreciated.
09-24-2012 11:18 PM
I used ' 'n because there are over 1000 variables and not all of them are in the correct format, I used concatenate in excel to put them all into this format to save time. I'm not interested in going through each variable and didn't think this would cause any problems.
09-25-2012 07:54 AM
Just creating a variable list to be used in the input statement is not enough. You must also define if the variables are character or numeric (so a INFORMAT) and how long character variables need to be. The input statement in your code would only work if all variables are numeric.
A lazy approach you could take:
1. Concatenate all your data into a single file and save it with extension .csv
2. Use the EG import wizard (File/Import Data).
Having all data in a single file is necessary for the EG import wizard to create character variables which can hold the longest string per column in all of your files. If you import file by file then the variables will have different lengths per resulting dataset.
You can of course accept this difference and then write some code to align the lengths - but easiest is to just copy everything into a single text file.
09-25-2012 05:30 AM
I think a better (bigger) sample of the data would help including a highlighted sample record where the variables do not load. In the example, please specify which variables do not load.
I assume the firstobs=5 option in your code is what you want as this would miss out the first 4 records including the header sample you provided.
Perhaps also elaborate with the HUDSON example with the data and the code used to read it.
09-25-2012 09:33 AM
If the format is the same I am not sure what the issue is.
You just need to create the program to read one of the files then you can use for all of them by just changing the filename.
If you are having trouble creating the code why not just run the import GUI tool and recall the generated code.
Is the issue with "HUDSO... an issue with the content of a variable or the name of the variable.
If it is the content then I see two issues.
1) Without defining a length all of your variables in your example will be length $8.
2) Are you sure those are really quotes in the data and not microsoft "smart" quotes?
If the issue is the name of the variable then how long are the lines of code in your program? Perhaps the line is truncated or wrapped in the wrong place.
09-25-2012 10:39 AM
I followed this line of thought, imported one of the smaller files and copied the code. I neatened it up, removed the label but it is still not working. Attached is the code that I copied and pasted, I'm not sure why this will not work. It pulls in the correct amount of columns and rows but no data, any thoughts?
09-25-2012 10:59 AM
I hope that you are not trying to submit a spreadsheet as a program. That wouldn't work!
If you only put the code in a spreadsheet to show it to the forum, then there appears to be one thing missing. In your infile statement you say that the data is in file1, but you never assign any file to file1. You could replace "file1" with the actual path and name of the file within quotes, or use a filename statement to accomplish the same thing.
BTW. What is going on! That is 2 losses in a row. Fortunately, sort of, I also have Michigan State to root for.
09-25-2012 11:14 AM
I left out one piece of the code:
filename File1 '/product/users6/twr0cxp/TWC_File1_Final_delimited.txt';
filename File2 '/product/users6/twr0cxp/TWC_File2_Final_delimited.txt';
I plan on setting up macro variables once I get everything running correctly. I imported this file using the import wizard, that's where I got the code from.
09-25-2012 11:14 AM
I guess one thing that really throws me off is that the empty dataset has the correct amount of columns and rows, why would it have the right amount of empty rows?
09-25-2012 11:59 AM
Are you sure that the delimiter is really a '7f''x ???
That could be the problem in itself.
What kind of errors or warnings are you getting in your log?
09-25-2012 12:06 PM
There were no errors or warnings in the log. I changed the DLM to ',' and added firstobs=2 and it seems to have worked. It's hard to convey my joy via this message but I'm very happy to have this complete, now onto more important things, like reading what the Steelers are going to do during their bye week.
Thanks for the help!!!!
09-26-2012 01:10 PM
I thought this issue was resolved but it is not, it seems like there are several hundred variables that are loading correctly, at some point they stop loading. The naming conventions in the file I'm getting are no good, when sas shortens them a lot of them end up having the same name. Could this be causing an issue?
I'm using almost exactly the same code as when I use the import wizard but I'm not getting the same results. I change the infile to a correct file name, the delimiter and add firstobs=. I'm baffled and not even really sure what questions to ask, do you have any ideas?
09-26-2012 01:39 PM
We would have to see the code you are actually submitting. The variables, as shown in the spreadsheet, all appear to be 32 characters or less and there doesn't appear to be any duplicates. If those are the ones you are using, variable name length shouldn't be a problem.
An easy test of that would be to identify them as var1-var1120 on your input statement.