BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Leonard32
Fluorite | Level 6

Hi,

 

Unfortunately I have a rather non user friendly data structure. It consists of 200 txt files (dlm = ',') that have to be concatenated. Each files has +/- 2 observations and 453 variables. I have written some code to read the 200 data files like this:

 

%symdel start eind; 
%let start=1;
%let eind=9;
%macro loop_attr_2017_10;
%let i=&start;
%do %while (&i <= &eind );
proc import datafile="$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000&i."
     out=attr_2017_&i.
     dbms=dlm
     replace;
DELIMITER = ','; getnames=yes; GUESSINGROWS=2147483647; run; %let i=%eval(&i+1); %end; %mend; %loop_attr_2017_10;

After I have read the data, I want to concatenate the 200 files (attr_2017_0 -> attr_2017_199) to one file (attributes_2017).

data attributes_2017;
	set ATTR_2017_0;
run;
%let conBegin=1;
%let conEind=199;
%macro concet_attr_2017;
%let i = &conBegin;
%do %while (&i <= &conEind);
data attributes_2017;
	set attributes_2017 attr_2017_&i.;
run;
%let i=%eval(&i+1);
%end;
%mend;
%concet_attr_2017;

 

But this is where the troubles arise. For some files , some values of the 453 variables are missing for all the subjects in that file. So SAS will read that variable by default as a character. But, when in the next file the value is not missing and it is a numeric variable, it becomes impossible to concatenate the two files.

 

ERROR: Variable activity_first_time_value has been defined as both character and numeric.

For example 'activity_first_time_value' in the file 'ATTR_2017_3' is characteristic, while in 'ATTR_2017_4' it is numeric.

c1.PNGc2.PNG

 

Since I have 453 variables and 200 files to concatenate, it is too much work to account for it individually.

So my question is; is there a way the program this concatenate is such a way that whenever SAS  detects in the  next file the variables as numeric, the variable in the previous file is also changed into numeric? Or another work around?

 

Thanks in advance!

Kind Regards,

Leonard

 

PS: I also consulted this page, but as mentioned before, I believe there a too much variables to account for it manually.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Read all of the files in one data step and avoid the problems caused by letting SAS guess at how your variables are defined.

If it is too hard for you make decisions about whether column 212 should be read as numeric or character then why not read them all as character?  You can use wildcard in the filename and then test when you start an new file so you can skip the header rows.

data attributes_2017;
  length var1-var453 $50 ;
  length filename $200 ;
  infile "$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000*" 
       dsd dlm=',' filename=filename truncover 
  ;
  input @ ;
  if filename ne lag(filename) then input ;
  input var1-var453 ;
run;

 You can make it better by using the real variable names and types/lengths in the LENGTH statement. For the INPUT statement just use positional variable list.  input firstvar -- lastvar ;

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Since all those files should have the same structure anyway, you must not use proc import, but a data step, which guarantees consistent structure.

 

Proc import makes guesses about the structure, and therefore the results cannot be consistent.

 

Take the data step from one of your proc import runs from the log and adapt it to your need, then apply it to all input files.

andreas_lds
Jade | Level 19

As @Kurt_Bremser already said: you have to use a data-step to import the files to avoid the problems you have. This means, of course, that you have to name all variables including type and length. Seems to be an annoying job, but it is by far easier, than solving the conflicts created by proc import.

 

If you have the specification in a text-file, you could import that one and use sas to create the importing data-step.

Reeza
Super User

Since I have 453 variables and 200 files to concatenate, it is too much work to account for it individually.

So my question is; is there a way the program this concatenate is such a way that whenever SAS  detects in the  next file the variables as numeric, the variable in the previous file is also changed into numeric? Or another work around?

 

 

 


 

You're going to have to write the data step but just once. Once you have it working you can use a wild card or macro approach. After your program for one file is written, the number of files or observations becomes irrelevant. An efficient way is to use PROC IMPORT, copy the code from the log and modify it to be what you need. 

 

Do you have any sort of naming conventions for your files or are they all in one folder? This type of approach may work. 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

 

Tom
Super User Tom
Super User

Read all of the files in one data step and avoid the problems caused by letting SAS guess at how your variables are defined.

If it is too hard for you make decisions about whether column 212 should be read as numeric or character then why not read them all as character?  You can use wildcard in the filename and then test when you start an new file so you can skip the header rows.

data attributes_2017;
  length var1-var453 $50 ;
  length filename $200 ;
  infile "$FSROOT/.../attributes.long.55769e3d9a318c4312000009.csv/part-0000*" 
       dsd dlm=',' filename=filename truncover 
  ;
  input @ ;
  if filename ne lag(filename) then input ;
  input var1-var453 ;
run;

 You can make it better by using the real variable names and types/lengths in the LENGTH statement. For the INPUT statement just use positional variable list.  input firstvar -- lastvar ;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2268 views
  • 5 likes
  • 5 in conversation