BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
krishmar1
Fluorite | Level 6

Hi,

I have a text file with the header information (all the variable names in one column) and another text file with the data (all the observations).

How do I create a SAS data set from these two files?

Please help!

Thanks,

Krishna.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

One way is to read the data using generic names like VAR1-VAR100 and then use the names from the other file to generate rename statements.

Another is read in the names and use them to generate the code.

Let's take the simple case of 10 columns of data where every column is a number.

data names;

  infile 'two.txt' truncover ;

  input name $32.;

  oldname = cats('var',_n_);

run;

proc sql noprint ;

  select catx('=',oldname,name)

    into :rename separated by ' '

from names

;

quit;

data numbers ;

  infile 'two.txt' dsd dlm=',' truncover ;

  input var1-var10;

  rename &rename ;

run;

View solution in original post

12 REPLIES 12
Tom
Super User Tom
Super User

One way is to read the data using generic names like VAR1-VAR100 and then use the names from the other file to generate rename statements.

Another is read in the names and use them to generate the code.

Let's take the simple case of 10 columns of data where every column is a number.

data names;

  infile 'two.txt' truncover ;

  input name $32.;

  oldname = cats('var',_n_);

run;

proc sql noprint ;

  select catx('=',oldname,name)

    into :rename separated by ' '

from names

;

quit;

data numbers ;

  infile 'two.txt' dsd dlm=',' truncover ;

  input var1-var10;

  rename &rename ;

run;

krishmar1
Fluorite | Level 6

Wow, it worked!

Thank you very much, Tom. You rock!

Regards,

Krishna.

Haikuo
Onyx | Level 15

As a more direct approach, you could skip the 'rename' as to (stem code was stolen shamelessly from Tom's code):

data _null_;

infile 'varname.txt' truncover end=last;

length temp $ 2000;

retain temp;

input name $32.;

temp = catx(' ',temp,name);

if last then call symputx('name',temp);

run;

data numbers ;

infile 'data.txt' dsd dlm=',' truncover ;

input &name;

run;

Haikuo

Scott_Mitchell
Quartz | Level 8

Hi Hai.kuo,

How does this code account for character variables?

I can see how it works for numeric, but not character.

Regards,

Scott

Haikuo
Onyx | Level 15

Scott,

I would try this:

data numbers ;

infile 'data.txt' dsd dlm=',' truncover ;

input (&name) (:$200.);

run;

So you want to set a large enough length to cover the maximum length of any possible variables. But again, this would be all Char. if mixed with num and char, then obviously more information will be needed.

Haikuo

krishmar1
Fluorite | Level 6

Thank you, Hai.Kuo.

I have a follow-up question: how do I read column input data without a header from a text file into a SAS data set?

Regards,

Krishna.

Tom
Super User Tom
Super User

You can let PROC IMPORT read the file for you.  It has an option to tell not to look for a header line.

proc import datafile="C:\My Documents\myfiles\delimiter.txt"

  dbms=dlm

  out=mydata

  replace;

  delimiter=',';

  getnames=NO;

run;

You could use this instead of the data step in my original answer and then use the trick to generate a RENAME statement to change the variable names.

Watch out though as PROC IMPORT will have to guess at what type of data is in each column.  If you already know the data types then write your own data step so you can control the types yourself.  Personally for your example if the number of columns is small (<30 perhaps) I would just pull the list of names into a program editor and use it to create the data step.  For example you could use the list of names in a LENGTH statement and assign the length (numeric variables should always use length of 8).  If you have any formatted data like DATE or TIME variables then also add an INFORMAT and FORMAT statement for them.  Then in the actual INPUT statement you can either list all the variables again or just use variable list using double hyphen so that you only need to retype the first and last variable name.

data want ;

   infile 'mydata.txt' dsd trucnover;

length

  NAME $20

  AGE 8

  SEX $1

;

input NAME -- AGE ;

run;

krishmar1
Fluorite | Level 6

Thank you, Tom, for your quick response.

My datafile.txt has about 60000 columns and does not have any delimiter (column input data). Does PROC IMPORT work on data without a delimiter?

Thanks,

Krishna.

wamorgao
Calcite | Level 5

Hi Krishna, you need to declare your delimiter in this line.

"delimiter=',';"  ,in this case the delimiter is ( , ). At the end of each column, you need a comma in your text file.

proc import datafile="C:\My Documents\myfiles\delimiter.txt"

  dbms=dlm

  out=mydata

  replace;

  delimiter=',';

  getnames=NO;

run;

In case you don´t have any delimiter, you need to declare the beginning and the end of each column or your size.

data want ;

   infile 'mydata.txt' dsd trucnover;

length

  NAME $20  <-- This number represents the number of columns that your variable has.

  AGE 8      

  SEX $1

;

input NAME -- AGE ;

run;

Is that correct Tom ?

Thanks,

Wagner.

krishmar1
Fluorite | Level 6

Thank you for your response, Wagner. I still have an issue.

All the values in the data set are 1s , 0s or blanks, except for the ID column. I tried the following ways but the data are not getting read correctly:

(1)

LENGTH ID $10

         VAR1 $1

         VAR2 $1;

INPUT ID -- VAR2;

(2)

LENGTH ID $10

         VAR1 $1

         VAR2 $1;

INPUT @1 ID $

           @11 VAR1 $

           @12 VAR2 $;

(3)

LENGTH ID $10

         VAR1 3

         VAR2 3;

INPUT @1 ID $

           @11 VAR1

           @12 VAR2 ;

Could you tell me which length and input statements work best for data with 1s, 0s and blanks?

Thanks!

Krishna.

krishmar1
Fluorite | Level 6

Never mind.

I figured this out, finally!

Thank you, all.

Regards,

Krishna.

Tom
Super User Tom
Super User

So if the data is in fixed columns then that is also simple with SAS.

You could tell SAS to read from specific columns (input ID $ 1-10) or tell it to use a specific format (input ID $10.).

I would use formatted mode so that I did not need to type all of the column numbers.

If you input your non-id variables as numeric then it will treat '1' as the number one , '0' as the number zero, and ' ' as a missing value.

So if you have a 10 character ID value and then 20 one character variables then you could write:

data want ;

  infile 'myfile.txt' truncover ;

  input id $10. (var1-var20) (1.) ;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 7152 views
  • 7 likes
  • 5 in conversation