Concatenate characteristic and numeric variables for multiple files and numerous variables

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Concatenate characteristic and numeric variables for multiple files and numerous variables

Hi,

 

Unfortunately I have a rather non user friendly data structure. It consists of 200 txt files (dlm = ',') that have to be concatenated. Each files has +/- 2 observations and 453 variables. I have written some code to read the 200 data files like this:

 

%symdel start eind; 
%let start=1;
%let eind=9;
%macro loop_attr_2017_10;
%let i=&start;
%do %while (&i <= &eind );
proc import datafile="$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000&i."
     out=attr_2017_&i.
     dbms=dlm
     replace;
DELIMITER = ','; getnames=yes; GUESSINGROWS=2147483647; run; %let i=%eval(&i+1); %end; %mend; %loop_attr_2017_10;

After I have read the data, I want to concatenate the 200 files (attr_2017_0 -> attr_2017_199) to one file (attributes_2017).

data attributes_2017;
	set ATTR_2017_0;
run;
%let conBegin=1;
%let conEind=199;
%macro concet_attr_2017;
%let i = &conBegin;
%do %while (&i <= &conEind);
data attributes_2017;
	set attributes_2017 attr_2017_&i.;
run;
%let i=%eval(&i+1);
%end;
%mend;
%concet_attr_2017;

 

But this is where the troubles arise. For some files , some values of the 453 variables are missing for all the subjects in that file. So SAS will read that variable by default as a character. But, when in the next file the value is not missing and it is a numeric variable, it becomes impossible to concatenate the two files.

 

ERROR: Variable activity_first_time_value has been defined as both character and numeric.

For example 'activity_first_time_value' in the file 'ATTR_2017_3' is characteristic, while in 'ATTR_2017_4' it is numeric.

c1.PNGc2.PNG

 

Since I have 453 variables and 200 files to concatenate, it is too much work to account for it individually.

So my question is; is there a way the program this concatenate is such a way that whenever SAS  detects in the  next file the variables as numeric, the variable in the previous file is also changed into numeric? Or another work around?

 

Thanks in advance!

Kind Regards,

Leonard

 

PS: I also consulted this page, but as mentioned before, I believe there a too much variables to account for it manually.

 

 


Accepted Solutions
Solution
Thursday
Super User
Super User
Posts: 8,086

Re: Concatenate characteristic and numeric variables for multiple files and numerous variables

[ Edited ]
Posted in reply to Leonard32

Read all of the files in one data step and avoid the problems caused by letting SAS guess at how your variables are defined.

If it is too hard for you make decisions about whether column 212 should be read as numeric or character then why not read them all as character?  You can use wildcard in the filename and then test when you start an new file so you can skip the header rows.

data attributes_2017;
  length var1-var453 $50 ;
  length filename $200 ;
  infile "$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000*" 
       dsd dlm=',' filename=filename truncover 
  ;
  input @ ;
  if filename ne lag(filename) then input ;
  input var1-var453 ;
run;

 You can make it better by using the real variable names and types/lengths in the LENGTH statement. For the INPUT statement just use positional variable list.  input firstvar -- lastvar ;

View solution in original post


All Replies
Super User
Posts: 10,217

Re: Concatenate characteristic and numeric variables for multiple files and numerous variables

Posted in reply to Leonard32

Since all those files should have the same structure anyway, you must not use proc import, but a data step, which guarantees consistent structure.

 

Proc import makes guesses about the structure, and therefore the results cannot be consistent.

 

Take the data step from one of your proc import runs from the log and adapt it to your need, then apply it to all input files.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Valued Guide
Posts: 564

Re: Concatenate characteristic and numeric variables for multiple files and numerous variables

Posted in reply to Leonard32

As @KurtBremser already said: you have to use a data-step to import the files to avoid the problems you have. This means, of course, that you have to name all variables including type and length. Seems to be an annoying job, but it is by far easier, than solving the conflicts created by proc import.

 

If you have the specification in a text-file, you could import that one and use sas to create the importing data-step.

Super User
Posts: 23,700

Re: Concatenate characteristic and numeric variables for multiple files and numerous variables

Posted in reply to Leonard32

Since I have 453 variables and 200 files to concatenate, it is too much work to account for it individually.

So my question is; is there a way the program this concatenate is such a way that whenever SAS  detects in the  next file the variables as numeric, the variable in the previous file is also changed into numeric? Or another work around?

 

 

 


 

You're going to have to write the data step but just once. Once you have it working you can use a wild card or macro approach. After your program for one file is written, the number of files or observations becomes irrelevant. An efficient way is to use PROC IMPORT, copy the code from the log and modify it to be what you need. 

 

Do you have any sort of naming conventions for your files or are they all in one folder? This type of approach may work. 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

 

Solution
Thursday
Super User
Super User
Posts: 8,086

Re: Concatenate characteristic and numeric variables for multiple files and numerous variables

[ Edited ]
Posted in reply to Leonard32

Read all of the files in one data step and avoid the problems caused by letting SAS guess at how your variables are defined.

If it is too hard for you make decisions about whether column 212 should be read as numeric or character then why not read them all as character?  You can use wildcard in the filename and then test when you start an new file so you can skip the header rows.

data attributes_2017;
  length var1-var453 $50 ;
  length filename $200 ;
  infile "$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000*" 
       dsd dlm=',' filename=filename truncover 
  ;
  input @ ;
  if filename ne lag(filename) then input ;
  input var1-var453 ;
run;

 You can make it better by using the real variable names and types/lengths in the LENGTH statement. For the INPUT statement just use positional variable list.  input firstvar -- lastvar ;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 72 views
  • 5 likes
  • 5 in conversation