DATA Step, Macro, Functions and more

Importing and Processing Tables that May or May not have Certain Fields

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Importing and Processing Tables that May or May not have Certain Fields

I am trying to create a macro that will import an input from one type of analysis and transform it into an input for another type of analysis. I am experiencing two problems with this:

 

1) When importing the original input table I would like to specify the type of some of the fields but not all.

 

This is because I have one field that is mostly number with occasional characters and so I want it imported as character to account for those characters and allow other operations on it for characters. If I use a proc import it usually classifies it as a number, but for the purpose of processing it afterwards and because it may sometimes have a character in it I would like to specify it as a character. 

 

But the inputs I will be importing have some fields which will always be in them and some fields which will only be there occasionally. So it would be good to be able to specify the type for the fields that will always be there and then have it guess the type of the fields that are not always there.

 

2) Once I have imported this table it can have one of several type of deductible (as well as other types of fields which won't always be there), this deductible will have its own unique field, which for the other analysis needs to be mapped to another field name. 

 

So for this I would like a way of testing if a field is present in a table and if it is then processing that field and putting it into a new field and otherwise leaving the field for the new analysis blank.

 

Thanks

Super User
Super User
Posts: 7,997

Re: Importing and Processing Tables that May or May not have Certain Fields

Right lets step back a bit here as your post is confusing.  First off what do you start with, as you say you import a file, what type of file, and how are you importing it?  This is a very important step as it defines everything from there on in.  If you are using Excel - this is a really bad data forma.  If you are using proc import which is a guessing procedure then again your not going to get what you expect or not get it consistently as it guesses.  If you are using proc import with Excel then you will get garbage half the time.  So sort this step out first, if its not Excel then write a datastep import based on the spec for the data, this fixes the import to always import exactly the same and to your specifications.  If its Excel then you will likely have to do a layer of checking and processing to get it into a useable format.

 

Once you have your data fixed down, then write the code to do your process once, as you are writing a macor or general code, you will want to write a user document file or functional design specifcation to show inputs/outputs processing, testing and such like - all of this is far more important than the code.  These documents will pretty much write the program for you, and macrotising that single run should be pretty simple from that point in.  Oh, and you can check variables by querying the sashelp.vcolumns table.

Frequent Contributor
Posts: 119

Re: Importing and Processing Tables that May or May not have Certain Fields

[ Edited ]

Ok, I have had this exact same problem in the past where the table varied from client to client, server to server. Same basic source but table would vary all over the place.

 

The way I handled it was to create a dynamic read. Read in enough info to determine file layout. Write the SAS code back out using a data step (pseudo coming), then %include it back:

 

data  _null_ ;

    file 'c:\temp\test.sas' ;

    put   'data test.MyData' 

          / '     infile ....'

          / '     input ....

          ;

    ....put field definitions here....

          ;

run;

 

%include 'c:\temp\test.sas' ;

 

HTH

 

EDIT: similar to what RWS said, I kept all of my possible field definitions in a table (dataset) and used that as a metadata store to dynamically determine what the attrib and inputs would be.

Occasional Contributor
Posts: 15

Re: Importing and Processing Tables that May or May not have Certain Fields

Thanks for your help RWS and AlanC !

 

Just to answer your questions RWS I am trying to import tab delimited text files.

 

Thanks to RWS I have managed to be able to check if columns exists and solve the 2nd part of my question so thanks for that

 

I am still stuck on importing currently based on my understanding of the the two answers:

 

1) I use a proc import with only a few rows, to get the field names.

2) I use a proc contents keeping just the names column, to get a list of column names to be imported

3) I merge my list of column names with an existing table the has the name and format of most potential field names I could get (master list of column names).

 

I am still unable to work out how to use this table of field names and types to import data dynamically as AlanC says?

 

Another thing I thought of is there may be fields in the input files that are not on my master list of column names. I am not really interested in these fields and don't mind if they are not imported.

 

Thanks

Frequent Contributor
Posts: 119

Re: Importing and Processing Tables that May or May not have Certain Fields

Pseudo-code to help clear it up (I don't have SAS available right now so
using UltraEdit. Double-check syntax but this is the general idea):

Data _null_ ;

set myColumnNames end=eof;

file 'c:\temp\GetData.sas' ;

if _n_ = 1 then

do ;

put 'data test;'

/ ' infile 'rawData.txt dlm="09"x '

/ ' input '

;

end;

put columnName ;

if eof then

do;

put ';'

/ 'run;'

;

end;

run;



%include 'c:\temp\GetData.sas' ;
Occasional Contributor
Posts: 15

Re: Importing and Processing Tables that May or May not have Certain Fields

Hey AlanC

 

Thanks again for your help, still having a little trouble working out what of my stuff needs to slot in where, below is my adaptation of your code (hopefully I haven't made too big a mess of it).

 

 

data _null_;
	set A4_fieldsimported end=eof;
	file "drive:\Path\of\raw\data\file\imported\rawdatafile.txt";
	if _n_ = 1 then do;
		put 'Data Test';
		/ ' infile 'drive:\Path\of\raw\data\file\imported\rawdatafile.txt dlm="09"x '

		/ ' input '

		;

	end;

	put name;
	if eof then do;
	put ';'
	/'run,';
	end;
run;

 

 

There are two tables in this the table A4_fieldsimported has two fields name, a list of all fields in the raw data table I am trying to import and format, the SAS format that field should be.The next table being imported from drive:\Path\of\raw\data\file\imported\rawdatafile.txt is the raw data file I am trying to import with its field names in A4_fieldsimported.

 

Thanks

Frequent Contributor
Posts: 119

Re: Importing and Processing Tables that May or May not have Certain Fields

1. Export sashelp.class to a tab-delimited text file: sashelpClass.txt

2. Sample code to do illustrate what I suggested:

 

filename test 'c:\temp\test.sas' ;

%let rawData =c:\temp\sashelpClass.txt ;

 

data myCols;

   input colName $;

datalines4;

name

sex

age

;;;;

run;

 

data _null_;

   file test ;

   set myCols end=eof;

   if _n_ =1 then

      do;

           put   'data mySubset; '

                 / "   infile '&rawData.' dlm='09'x ;"

             / 'input'

                   ;

      end;

   put colName '$';

   if eof then

      do ;

         put  ';'

             / 'run;'

                   ;

      end;

run;

 

%include test ;

 

Super User
Super User
Posts: 7,080

Re: Importing and Processing Tables that May or May not have Certain Fields

[ Edited ]

chris_e wrote:

Thanks for your help RWS and AlanC !

 

Just to answer your questions RWS I am trying to import tab delimited text files.

 

Thanks to RWS I have managed to be able to check if columns exists and solve the 2nd part of my question so thanks for that

 

I am still stuck on importing currently based on my understanding of the the two answers:

 

1) I use a proc import with only a few rows, to get the field names.

2) I use a proc contents keeping just the names column, to get a list of column names to be imported

3) I merge my list of column names with an existing table the has the name and format of most potential field names I could get (master list of column names).

 

I am still unable to work out how to use this table of field names and types to import data dynamically as AlanC says?

 

Another thing I thought of is there may be fields in the input files that are not on my master list of column names. I am not really interested in these fields and don't mind if they are not imported.

 

Thanks


If you have tab delimited files with a header row then you can read the header row to determine what fields this instance has.

Then you can write a data step that creates a constant structure, but only populates the fields that exist in the current file.

Your later process can just deal with the missng data or perhaps look at the list of fields your particular file had and adjust based on to process based on whether a particullar field was present or not.

%let newfile=myfile.txt ;
data fields ;
  infile "&newfile" obs=1 dsd dlm='09'x ;
  length field $32 ;
  input field @@ ;
run;
proc sql noprint ;
  select field into :fieldlist separated by ' ' 
  from fields 
  ;
quit;
data new_data ;
  length field1 $10 field2 8 ..... ;
  infile "&newfile" firstobs=2 dsd dlm='09'x truncover ;
  input &fieldlist;
run;

Fixed typo in first data step.  2017-09-18:21:32 EDT

Occasional Contributor
Posts: 15

Re: Importing and Processing Tables that May or May not have Certain Fields

Thanks for your answer Tom, it looks good but contains some SAS I haven't used before so I was having some trouble implementing it. Below is my implementation of the first part:

 

 

%let newfile=drive:\Path\of\raw\data\file\imported\rawdatafile.txt;

data fields;
 input "&newfile" obs=1 dsd dlm='09'x ;
 length field $32;
 input field @@;
run;

 

 

The file I want to import is drive:\Path\of\raw\data\file\imported\rawdatafile.txt 

 

Below is the log

 

 

1
2     %let newfile=drive:\Path\of\raw\data\file\imported\rawdatafile.txt;
3
4     data fields ;
5       input "&newfile" obs=1 dsd dlm='09'x ;
                                       -----
                                       79
NOTE: Line generated by the macro variable "NEWFILE".
1      "drive:\Path\of\raw\data\file\imported\rawdatafile.txt
       ----------------------------------------------------------------------------------
       79
ERROR 79-322: Expecting a @.

6       length field $32 ;
7       input field @@ ;
8     run;

ERROR: No DATALINES or INFILE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FIELDS may be incomplete.  When this step was stopped there were 0
         observations and 4 variables.

 

I am unsure of what the @ does.

 

Thanks

Super User
Super User
Posts: 7,080

Re: Importing and Processing Tables that May or May not have Certain Fields

That is because the first statement should be INFILE and not INPUT. My mistake.

Super User
Super User
Posts: 7,080

Re: Importing and Processing Tables that May or May not have Certain Fields

[ Edited ]

Don't use INFORMAT as if it was intended to define your variables. It will only define the variables as a side effect of it being the first place the variable is mentioned. Instead explicitly define them using LENGTH or ATTRIB statement.

 

For example in your data step to read your metadata file into a dataset.

 

data A3_ANAFIELDS ;
  infile 'C:\StandarMapping\ANA_Fields.txt' dsd dlm='09'x truncover firstobs=2 ;
  length Name $11 Format $10 Format2 $2 ;
  input Name Format Format2 ;
run;

Correspondingly in your metadata keep the information about the definition (type/length), informat and format as separate fields. So you might change your metadata to instead look more like this:

data A3_ANAFIELDS ;
  infile 'C:\StandarMapping\ANA_Fields.txt' dsd dlm='09'x truncover firstobs=2 ;
  length Order 8 Name $32 Length $6 Format Informat $40 Label $256 ;
  input order -- label ;
run;

Order will be order that you want the variable to appear in the resulting dataset (not the order that it appears in the tab delimited file you are trying to read.) Length will have value you can use in a LENGTH statement. 

So for example the set of values to describe the table above might look like (using | instead of tab as delimiter so you can see them here). Notice how none of these variables need either an informat or a format since none of them are dates, times or have specially display rules that need to be applied to them like you might want with currency values.

ORDER|NAME|LENGTH|FORMAT|INFORMAT|LABEL
1|Order|8|||Order in output table
2|Name|$32|||Variable Name
3|Length|$6|||Length specification string
4|Format|$40|||Format (optional)
5|Informat|$40|||Informat (optional)
6|Label|$256|||Variable Label

Let's say you have metadata that says your files can have 4 variables VAR1 to VAR4.

ORDER|NAME|LENGTH|FORMAT|INFORMAT|LABEL
1|Var1|$10|||
2|Var2|8|yymmdd10.|yymmdd.|
3|Var3|8|||
4|Var4|$20|||

And your current file to be read has these fields.

VAR2|VAR3|VAR1
2017-01-01|1|First Month
2017-02-01|2|Last Month

Then your code generation step could be setup to generate code that looks like this:

data mySubset;
  infile '&File_name' dsd dlm='09'x firstobs=2 ;
  length var1 $10 var2 8 var3 $20 var4 8 ;
  informat var2 yymmdd. ;
  format var2 yymmdd10. ;
  input var2 var3 var1 ;
run;

And so create a output table that looks like this:

VAR1|VAR2|VAR3|VAR4
First Month|2017-01-01|1|
Last Month|2017-02-01|2|

 

Occasional Contributor
Posts: 15

Re: Importing and Processing Tables that May or May not have Certain Fields

Thanks updated it with length. Although my sample code seems to have dissappeared. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 184 views
  • 4 likes
  • 4 in conversation