Are you sure data1,data2,data3,data4 isn't really "data1","data2, data3", "data4" ?
I have a situation where a data field has embedded column delimiters and even carriage returns in it. This makes the original data file appear to have a variable number of fields, but it actually doesn't.
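When the embedded delimiters are protected by quotes, the DSD option on the INFILE statement tells SAS to treat a quoted comma as data rather than as a field separator. A minimal sketch of the idea (the filename and variable names here are invented for illustration):

```sas
data parsed;
   infile 'c:\temp\quoted.csv' dsd truncover;  /* DSD: commas inside quotes are data, not delimiters */
   length f1 f2 f3 $50;
   input f1 $ f2 $ f3 $;
   /* "data1","data2, data3","data4" parses as three fields,
      with the embedded comma preserved inside f2 */
run;
```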
Another thing: how often does header1,header2,header3 occur? Is it simply the first observation (record) in the file, or are the header rows intermingled with the data?
I have another situation where the first 5 fields in a record (observation) are "headers", but there is a variable number of fields with variable formats following. One of the header fields can be, and is, used to identify the type of the record; so, we parse that field to identify the record type and then output that type of record to its own dataset.
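That kind of record-type dispatch can be sketched like this (the record types, field layout, and dataset names here are invented for illustration):

```sas
data type_a type_b other;
   infile 'c:\temp\mixed.txt' dlm=',' dsd truncover;
   length rectype $8;
   /* the first 5 fields are "headers"; one of them identifies the record type */
   input key1 $ key2 $ rectype $ key4 $ key5 $ rest $200.;
   if rectype = 'A' then output type_a;
   else if rectype = 'B' then output type_b;
   else output other;
run;
```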
You might be able to use either PROC IMPORT or a DATA step program to read your data.
For example, let's say that you had the following CSV file stored in the location c:\temp\wronghdr.csv and it looked like:
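A file matching that description might look like this (the header names are placeholders; only their count matters):

```
header1,header2,header3,header4
alan,11,12,13,14
bob,21,22,,24
carl,,32,33,34
dave,41,42,43,
ed,51,52,53,54
```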
Note how there are only 4 "headers" or variable names on line 1, but there are sometimes 5 data items on a line. Note also, how the data starts on line 2; and how at least 2 of the rows have values for all 5 columns, but some of the rows do NOT have values for all 5 columns.
If you did use PROC IMPORT to read the above file, one possible code method would be:
proc import datafile="C:\temp\wronghdr.csv" out=work.wronghdr dbms=csv replace;
   getnames=no;   /* do not take variable names from line 1 */
   datarow=2;     /* start reading data on line 2 */
run;
and the output would be:
what happens with proc import
Obs VAR1 VAR2 VAR3 VAR4 VAR5
1 alan 11 12 13 14
2 bob 21 22 . 24
3 carl . 32 33 34
4 dave 41 42 43 .
5 ed 51 52 53 54
But what if you REALLY want the NAME column to be called NAME and not VAR1? You can see how PROC IMPORT names the variables for you when you skip over line 1.
For the most control over naming variables, especially in a tricky situation like yours, write your own DATA step program.
SAS has 4 different ways to read "flat files" or "text files" into a SAS dataset using DATA step program code: column input, list input, formatted input, and named input. All of them involve the INPUT statement, and if you look in the documentation for the topic entitled "Statements: INPUT Statement", you will see a description of how to use each type.
You can use whatever variable names you want if you use an INPUT statement, and you can even mix the different INPUT styles in the same program. If you write your own program, you can skip over line 1 (presumably the one with the headers), so it won't matter that the data has 22 headers on line 1 but 23 data fields: in the INPUT statement you get to name the variables/columns yourself, so if you KNOW that there are 23 data fields, you can list 23 variable names on your INPUT statement. Then, if a single row does not have all 23 values, you can use MISSOVER or TRUNCOVER (whichever is appropriate to your type of INPUT statement) to set the remaining variables to missing.
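For the 22-headers-but-23-fields case, a sketch of that idea (the filename is a placeholder, and the fields are assumed numeric -- add a $ after any character variable):

```sas
data all23;
   infile 'c:\temp\yourfile.csv' dsd truncover firstobs=2;  /* skip the bad header line */
   input var1-var23;   /* you name all 23 fields yourself */
run;
```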
A SAS DATA step program to skip over line 1 and start looking for the first observation on line 2 in order to read the above data file and use variable names of your own choosing would be:
data My_Col_Names;
   infile 'c:\temp\wronghdr.csv' missover dsd firstobs=2;
   input name $ var1 var2 var3 var4;
run;
proc print data=My_Col_Names;
   title 'Correct column names no matter what was in line 1';
run;
and the result of the PROC PRINT would look like (note how the variable names from the INPUT statement were used, since line 1 was skipped over -- using the FIRSTOBS=2 option):
Correct column names no matter what was in line 1
Obs name var1 var2 var3 var4
1 alan 11 12 13 14
2 bob 21 22 . 24
3 carl . 32 33 34
4 dave 41 42 43 .
5 ed 51 52 53 54
Your data may not look like what's above in the example of "wronghdr.csv", but in SAS there is usually a way to read just about ANY kind of file. You may not be able to use the EG automated wizards, but with code, you can read almost any text file into SAS format.
So, this is a perfect example of how you could get absolutely on-target help from Tech Support. They could take a look at your data file (if you sent them a sample) and make recommendations about the most appropriate form of INPUT statement to use.
However, another unfortunate thing is that my .COL (SAS DATA) file is over 100 gigabytes... I can't do the above step as it's too large to use a temporary dataset.
If I had a statement such as: if date eq '17507' then output;
It then does output but then the next step ignores the VAR i'm putting in as the SET statement only identifies 34 vars and cuts the rest of data out for the temp dataset.
If your data are in a SAS dataset, then the INFILE/INPUT approach is entirely WRONG and PROC IMPORT is entirely WRONG. So ignore just about everything in my previous post.
If your data are in a SAS dataset, then the SET statement is the right way to go.
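For a very large SAS dataset, a WHERE= data set option on the SET statement is usually better than a subsetting IF, because the WHERE filter is applied before observations are brought into the step. A sketch, assuming a dataset and variable like the ones mentioned in this thread:

```sas
data work.subset;
   set indn.col(where=(date = '17507'));   /* only matching rows are read into the step */
   /* keep= can also limit which columns are read, e.g.:
      set indn.col(keep=date pathname where=(date = '17507')); */
run;
```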
One thing you can try to see what's up with your file is run a PROC CONTENTS to see the specific list of column names. Even if you have a big file, PROC CONTENTS is not reading the file directly, but only the descriptor portion of the file to produce host-specific information, data-set information (sort order, index info, etc) and a list of variables:
proc contents data=INDN.COL;
   title 'what is this file';
run;
I don't understand what you mean by this statement: "It then does output but then the next step ignores the VAR i'm putting in as the SET statement only identifies 34 vars and cuts the rest of data out for the temp dataset." What does your LIBNAME statement look like for INDN? What are the data set attributes for INDN -- do you have a LIBNAME statement, or is INDN a JCL DD statement? The SAS companion for your operating system (z/OS or MVS/TSO) should have information on how to tell the difference between a SAS library dataset on the mainframe and a flat file on the mainframe.
If your big file is a VSAM file or a DB2 table or a SoftwareAG Adabas table (or Oracle, or Sybase, or ...) and you want to read it with SAS, then you have to make sure that you are using the SAS/Access product to directly read that kind of file type or you are using the correct syntax to treat the 3rd party file as though it were a SAS dataset.
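With the appropriate SAS/Access product licensed, the engine name on the LIBNAME statement is what lets you treat the third-party table as though it were a SAS dataset. A hypothetical sketch for DB2 (the libref, engine options, and table name are examples only -- connection options vary by engine and by site):

```sas
libname mydb db2 ssid=db2p;          /* engine name plus site-specific connection options */

proc contents data=mydb.bigtable;    /* the table is now usable anywhere a SAS dataset is */
run;
```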
Try this experiment (which should work on the mainframe, too) to get an idea of what SAS dataset information looks like:
proc contents data=sashelp.class;
   title 'SASHELP.CLASS BEFORE adding variable';
run;

proc print data=sashelp.class noobs;
   title 'Print all obs in SASHELP.CLASS';
run;

data work.newdata;
   set sashelp.class;
   age_in_5_yrs = age + 5;
   if sex = 'M' then projected_height = height * 1.05;
   else if sex = 'F' then projected_height = height * 1.025;
run;

proc contents data=work.newdata;
   title 'Compare this Contents listing with the first one to see the new variables';
run;

proc print data=work.newdata;
   title 'Compare this report to the previous report';
run;
If you run the above code, you will see that in the "before" print of the SAS dataset, the variables are NAME, SEX, AGE, HEIGHT, WEIGHT. In the second dataset -- WORK.NEWDATA, the variables are those 5, plus AGE_IN_5_YRS and PROJECTED_HEIGHT. Because SASHELP.CLASS is already a SAS dataset, I can use it in a SET statement. The new file WORK.NEWDATA will be based on SASHELP.CLASS, but I am adding 2 new variables or columns to the new file -- AGE_IN_5_YRS is calculated based on AGE + 5 and PROJECTED_HEIGHT is set based on gender -- figuring that boys will grow more than girls over 5 years -- it really doesn't matter -- the variables are really arbitrary. What is important to note is that I can READ SASHELP.CLASS using a SET statement. INFILE/INPUT statements would be completely irrelevant in this context.
I apologize for misunderstanding the nature of your data. PROC IMPORT and INFILE/INPUT are not techniques you should use to read a SAS dataset or any file which can be read through the LIBNAME engine for a product (like Oracle, DB2, VSAM, etc). You only have 2 choices ... if your file is a "sequential" file then you have to parse and read it with INFILE/INPUT statements; if your file is a SAS dataset or a table that can be read with a SAS/Access product (and you have that SAS/Access product), then you should be able to use a SET statement. You would NOT mix SET and INFILE/INPUT statements to read a SAS dataset.
I think you should contact Tech Support for more in-depth help.
We have two major data sets: the log data (all data from the environments) and a SAS dataset.
Someone on our database team runs their script, which copies data from the log data to the SAS dataset for us all to use.
I've located his script and found out that for the variable 'Pathname' (the variable I wanted all the data for), he basically deletes anything after a ','.
Therefore, no matter WHAT I was doing, I wasn't going to get my data.
I've got permission, and I'm now going to just use my flat files, so I can now use the INFILE/INPUT statements -- and am doing so successfully.
My raw data is TAB-delimited, so DLM='05'X works like a charm.
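For anyone reading on another platform: '05'x is the tab character in EBCDIC (mainframe), while on ASCII systems (Windows/Unix) tab is '09'x. A sketch of the same idea off the mainframe (filename and variables are placeholders):

```sas
data tabdata;
   infile 'myfile.txt' dlm='09'x dsd truncover;  /* '09'x = ASCII tab; use '05'x on z/OS */
   input var1 $ var2 $ var3;
run;
```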