Hi,
I have a text file with the header information (all the variable names in one column) and another text file with the data (all the observations).
How do I create a SAS data set from these two files?
Please help!
Thanks,
Krishna.
One way is to read the data using generic names like VAR1-VAR100 and then use the names from the other file to generate rename statements.
Another is read in the names and use them to generate the code.
Let's take the simple case of 10 columns of data where every column is a number.
data names;
infile 'two.txt' truncover ;
input name $32.;
oldname = cats('var',_n_);
run;
proc sql noprint ;
select catx('=',oldname,name)
into :rename separated by ' '
from names
;
quit;
data numbers ;
infile 'two.txt' dsd dlm=',' truncover ;
input var1-var10;
rename &rename ;
run;
One way is to read the data using generic names like VAR1-VAR100 and then use the names from the other file to generate rename statements.
Another is read in the names and use them to generate the code.
Let's take the simple case of 10 columns of data where every column is a number.
data names;
infile 'two.txt' truncover ;
input name $32.;
oldname = cats('var',_n_);
run;
proc sql noprint ;
select catx('=',oldname,name)
into :rename separated by ' '
from names
;
quit;
data numbers ;
infile 'two.txt' dsd dlm=',' truncover ;
input var1-var10;
rename &rename ;
run;
Wow, it worked!
Thank you very much, Tom. You rock!
Regards,
Krishna.
As a more direct approach, you could skip the 'rename' as to (stem code was stolen shamelessly from Tom's code):
data _null_;
infile 'varname.txt' truncover end=last;
length temp $ 2000;
retain temp;
input name $32.;
temp = catx(' ',temp,name);
if last then call symputx('name',temp);
run;
data numbers ;
infile 'data.txt' dsd dlm=',' truncover ;
input &name;
run;
Haikuo
Hi Hai.kuo,
How does this code account for character variables?
I can see how it works for numeric, but not character.
Regards,
Scott
Scott,
I would try this:
data numbers ;
infile 'data.txt' dsd dlm=',' truncover ;
input (&name) (:$200.);
run;
So you want to set a large enough length to cover the maximum length of any possible variables. But again, this would be all Char. if mixed with num and char, then obviously more information will be needed.
Haikuo
Thank you, Hai.Kuo.
I have a follow-up question: how do I read column input data without a header from a text file into a SAS data set?
Regards,
Krishna.
You can let PROC IMPORT read the file for you. It has an option to tell not to look for a header line.
proc import datafile="C:\My Documents\myfiles\delimiter.txt"
dbms=dlm
out=mydata
replace;
delimiter=',';
getnames=NO;
run;
You could use this instead of the data step in my original answer and then use the trick to generate a RENAME statement to change the variable names.
Watch out though as PROC IMPORT will have to guess at what type of data is in each column. If you already know the data types then write your own data step so you can control the types yourself. Personally for your example if the number of columns is small (<30 perhaps) I would just pull the list of names into a program editor and use it to create the data step. For example you could use the list of names in a LENGTH statement and assign the length (numeric variables should always use length of 8). If you have any formatted data like DATE or TIME variables then also add an INFORMAT and FORMAT statement for them. Then in the actual INPUT statement you can either list all the variables again or just use variable list using double hyphen so that you only need to retype the first and last variable name.
data want ;
infile 'mydata.txt' dsd trucnover;
length
NAME $20
AGE 8
SEX $1
;
input NAME -- AGE ;
run;
Thank you, Tom, for your quick response.
My datafile.txt has about 60000 columns and does not have any delimiter (column input data). Does PROC IMPORT work on data without a delimiter?
Thanks,
Krishna.
Hi Krishna, you need to declare your delimiter in this line.
"delimiter=',';" ,in this case the delimiter is ( , ). At the end of each column, you need a comma in your text file.
proc import datafile="C:\My Documents\myfiles\delimiter.txt"
dbms=dlm
out=mydata
replace;
delimiter=',';
getnames=NO;
run;
In case you don´t have any delimiter, you need to declare the beginning and the end of each column or your size.
data want ;
infile 'mydata.txt' dsd trucnover;
length
NAME $20 <-- This number represents the number of columns that your variable has.
AGE 8
SEX $1
;
input NAME -- AGE ;
run;
Is that correct Tom ?
Thanks,
Wagner.
Thank you for your response, Wagner. I still have an issue.
All the values in the data set are 1s , 0s or blanks, except for the ID column. I tried the following ways but the data are not getting read correctly:
(1)
LENGTH ID $10
VAR1 $1
VAR2 $1;
INPUT ID -- VAR2;
(2)
LENGTH ID $10
VAR1 $1
VAR2 $1;
INPUT @1 ID $
@11 VAR1 $
@12 VAR2 $;
(3)
LENGTH ID $10
VAR1 3
VAR2 3;
INPUT @1 ID $
@11 VAR1
@12 VAR2 ;
Could you tell me which length and input statements work best for data with 1s, 0s and blanks?
Thanks!
Krishna.
Never mind.
I figured this out, finally!
Thank you, all.
Regards,
Krishna.
So if the data is in fixed columns then that is also simple with SAS.
You could tell SAS to read from specific columns (input ID $ 1-10) or tell it to use a specific format (input ID $10.).
I would use formatted mode so that I did not need to type all of the column numbers.
If you input your non-id variables as numeric then it will treat '1' as the number one , '0' as the number zero, and ' ' as a missing value.
So if you have a 10 character ID value and then 20 one character variables then you could write:
data want ;
infile 'myfile.txt' truncover ;
input id $10. (var1-var20) (1.) ;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.