03-12-2012 01:25 PM
When reading in a txt/csv file, we have to use INPUT and LENGTH statements to specify each variable's name, length, and type (character or numeric).
Is there a way to create a SAS data set dynamically, without writing an INPUT statement?
Using an INPUT statement is often a pain: with a slight change in the data, the SAS data set cannot be created.
The way I can think of is to read the first row and break it up using ',' as the delimiter. That way we get the variable names.
But how do I set the other two things, the length and whether each variable is character or numeric?
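For what it's worth, the first part of this idea can be sketched in a DATA step (the file name and delimiter below are assumptions): read only the header record and pull the names out with SCAN(). Deciding character vs. numeric is the harder part; you would have to scan some data rows as well, which is essentially what PROC IMPORT's guessing routines do.

```sas
/* Sketch only: read just the header row of an assumed CSV and
   list its column names, one per observation. */
data varnames;
  infile "c:\mydata\myfile.csv" obs=1 lrecl=32767;
  length varname $32;
  input;                                  /* loads the record into _infile_ */
  i = 1;
  do while (scan(_infile_, i, ',') ne '');
    varname = scan(_infile_, i, ',');     /* i-th comma-delimited word */
    output;
    i + 1;
  end;
  drop i;
run;
```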
03-12-2012 02:05 PM
I agree with Tom! But, since you ask, yes, you CAN write code that does all of that. Plus you give me an opportunity to plug a paper that I will be presenting at this year's SGF. Take a look at the code and paper at:
It uses a DATA step to accomplish those tasks for data that is in one's clipboard. It wouldn't take much to modify the code to work on data that exists in a file.
03-12-2012 03:37 PM
Really appreciate the code, Art297; I have to study your code a little bit.
If I just want to read in csv/txt, what parameters do I need to modify?
03-12-2012 04:25 PM
Depending on what your "slight change in the data, sas data set can not be created" actually is, you may still have problems. Changed variable names are the worst, followed by PROC IMPORT's guessing routines making a variable numeric in one set and character in another, and then differing lengths of string variables.
When I worked with a client that couldn't provide data in the same format consistently, they got hit with high code-maintenance charges. This involved things like the order of variables, changing names, and changing formats for date fields. If this is inside your own organization, some pressure toward documenting standards and using them may be in order.
If the only issue you are actually seeing is the length of text fields, I suggest either making the text length assignments noticeably longer, or allowing more characters in the informat, instead of "just enough". I often use PROC IMPORT to generate the base code to read one file and then examine the informats, variable names, and formats assigned, as well as the INPUT statement. I often add 10 to 50% to the length of text variables and specify my own variable names.
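That workflow looks something like this (the path and GUESSINGROWS value here are assumptions). After it runs, recall the DATA step code that PROC IMPORT writes to the log and edit the lengths, informats, and variable names by hand:

```sas
/* Sketch: let PROC IMPORT guess the layout once, then reuse and
   hand-edit the DATA step it writes to the log. */
proc import datafile="c:\mydata\myfile.csv"
            out=work.sample
            dbms=csv
            replace;
  getnames=yes;
  guessingrows=500;  /* scan more rows so guessed types/lengths are less fragile */
run;
```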
PROC IMPORT also has difficulty with long strings as variable names. If the first 32 characters repeat, you may get variables with names like 'THIS_WAS_A_REALLY___LONG___COLUMN_HEADER', 'THIS_WAS_A_REALLY___LONG___COLUMN_HEADE1', 'THIS_WAS_A_REALLY___LONG___COLUMN_HEADE3'.
03-12-2012 04:27 PM
If your input file is tab-delimited, the easiest way to use the code would be not to modify it but, rather, to first read the file and write it out to your system's clipboard, and then run the paper's code.
e.g., I just wrote a tab-delimited file called "c:\art\testfile.txt". Before running the code described in the paper, I ran:
filename clippy clipbrd;
data _null_;
  infile "c:\art\testfile.txt" lrecl=32767;
  file clippy lrecl=32767;
  input;
  put _infile_;
run;
filename clippy clear;
/*then I ran the code shown in the paper */
03-12-2012 08:22 PM
Hello Art297, would you shed some light on those nested SCANs, the arrays, and %filarray?
Your code is so powerful that it's beyond my comprehension; sorry for the naive questions.
03-12-2012 10:47 PM
Rick, if you look at the PowerPoint you'll see that the code was designed to do more than the typical PROC IMPORT.
It was designed to import data from normal files as well as files that need to be transposed and/or are from forms rather than tables.
One of the macros was designed to transpose a file, and the other was to set up SAS processing of the macro variables to make the program easier to use.
If you review the code carefully, you will discover that it is trying to accomplish the very things you asked about in your original post.
I'd be glad to answer any questions you have, but it would take an entire book to fully explain all of the code.
03-12-2012 06:28 PM
The INPUT statement is actually extremely powerful and flexible and I find that it makes SAS a very useful tool for exploring unknown file formats.
If your file is delimited, then you can quickly read it in and check the values.
For example, if the number of variables changes (but you know they should all be numeric), then you can just read in some overly large number of them and then analyze the data set to see where they stop.
data sample;
  infile 'myfile' dsd truncover firstobs=2;
  input x1-x100;
run;
proc means data=sample; run;
If the file is totally unknown, then read everything in as a series of character strings and look at the results.
data sample;
  infile 'myfile' dsd truncover firstobs=2;
  length x1-x100 $200;
  input x1-x100;
run;
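One way to "look at the results" is a sketch like this (the data set name SAMPLE and variables x1-x100 are assumed from the step above): scan the all-character data set and report the longest value actually seen in each column, which tells you where the real data stops and what lengths to assign.

```sas
/* Sketch only: report the maximum observed value length per column
   of an assumed data set SAMPLE with character variables x1-x100. */
data maxlen;
  set sample end=eof;
  array x {*} x1-x100;
  array longest {100} _temporary_;   /* _temporary_ arrays are retained */
  length name $32;
  do i = 1 to dim(x);
    longest{i} = max(longest{i}, lengthn(x{i}));
  end;
  if eof then do i = 1 to dim(x);    /* write one row per variable */
    name = vname(x{i});
    maxlength = longest{i};
    output;
  end;
  keep name maxlength;
run;
proc print data=maxlen; run;
```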