Desktop productivity for business analysts and programmers

Creating a dataset by reading files with different variables

Reply
New Contributor
Posts: 3

Creating a dataset by reading files with different variables

I have to create a macro where txt or fixed width files from a folder has to read in with sas and a universal dataset is to be created.

But the first half has 46 variable and 2nd half has 48 variables. How to put into one dataset?

Super User
Posts: 10,530

Re: Creating a dataset by reading files with different variables

Read both with the same datastep, and use the truncover option with the infile. Variables not present in the first half will be set to missing. 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
New Contributor
Posts: 3

Re: Creating a dataset by reading files with different variables

Posted in reply to KurtBremser

It did not work. Data of length of n-1 field appears in Nth field.

Regular Contributor
Posts: 226

Re: Creating a dataset by reading files with different variables

And you don't need a macro to import all files, you can use * in the infile-statement to read all files in a directory.

Super User
Posts: 23,937

Re: Creating a dataset by reading files with different variables

[ Edited ]

That depends, are you trying to make the file wide (add columns to make a really wide file) or append them together?

 

For a fixed width file you have no choice but to write a data step code to read it in. 

For other files you can try with PROC IMPORT, though you’re likely to run into compatibility issues with types, ie read a file as character when its numeric in another and then when you try and append them you’ll run into issues. 

 

So first, figure out how many different file types you have, then how you want to combine them. If you can explain that process to us, we can better recommend a solution. 

 

If the variables are mostly the same and you’re appending to make a long file, it’s a bit easier of a process.

 


@rshrv wrote:

I have to create a macro where txt or fixed width files from a folder has to read in with sas and a universal dataset is to be created.

But the first half has 46 variable and 2nd half has 48 variables. How to put into one dataset?


 

New Contributor
Posts: 3

Re: Creating a dataset by reading files with different variables

So, I have different folders that contain more than 300 files: Some folders have tab delimited and some have fixed width.

Suppose file 1 has 

 

jane  45 4RT  M

john 23 4GJT M

 

file2:           george 23 3DFE      1J M

                   jake     54 4KL         1K M

 

I want to append File 1 and file 2 where I want to add 4th column which shows null for file 1 but 1J, 1K for file 2.

My codes looks like this:

 

data new;

infile " path* " delimiter = '09'x MISSOVER DSD;

 

informat 

name $10.

age 2.

res_num $7.

idnum $2.

gender $1.

 

format 

name $10.

age 2.

res_num $7.

idnum $2.

gender $1.

 

input

name $10.

age 2.

res_num $7.

idnum $2.

gender $1.

;

run;

 

If I run this, there's no separate column created instead the 5 column is copied and breaks further column's data.

How should I approach this?

 

Thanks

 

Super User
Posts: 10,530

Re: Creating a dataset by reading files with different variables

For delimited files, do not use informats in the input statement. You already specified them in the informat statement.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 8,261

Re: Creating a dataset by reading files with different variables

You might be able to make a program smart enough to read both file formats.

First define ALL of the variables.  Just attaching a format or an informat to variable will only DEFINE the variable as a side effect.  None of the variables in your example need to have either formats or informats attached to them. SAS knows how to read and write simple numbers and character strings.  Normally you only need to add formats and informats for date and time variables.

data want;
  length name $10 age 8 res_num $7 idnum $2 gender $1 ;

If the line is designed to be fixed width then read it using formatted input.  If it is delimited by tabs then read it using list mode. This is actually a good example of why you should be using the newer (less than 30 years old) TRUNCOVER option instead of the older less useful MISSOVER option.  When you read with list mode it makes no difference, but in formatted mode it can cause short values to be set to missing.

How can you tell which files have the fifth variable? Let's assume that the tab delimited files have the fifth variable and the fixed format files do not.

So something like this to read the lines conditionally might work.

  infile "filename" dsd dlm='09'x truncover ;
  input @;
  if index(_infile_,'09'x) then input name age res_num  idnum gender ;
  else input name $10. age 2. res_num $7. gender $1. ;

 

 

 

 

Ask a Question
Discussion stats
  • 7 replies
  • 116 views
  • 0 likes
  • 5 in conversation