BookmarkSubscribeRSS Feed
mmm7
Calcite | Level 5

I'm reading in multiple files that need to be stacked into a single dataset. In some files, column A contains all numeric values. In other files,  an * appears when numeric values are missing, defining the column type as character. I have a data step that sets * to missing and then creates a new numeric column using the input function. Is there a way to reference column type as part of a do statement where the data step executes only when the column type is character? Alternatively, is there a way to suppress "Invalid numeric data, '*'" message from the log?

 

data one;                     
   input ID A;     
   datalines;          
2477 195 
2431 220 
2456 173 
2412 135 
; 
run;
data two;                     
   input ID A $;     
   datalines;          
3777  *  
9731  *  
7756  110
2712  * 
;  
run;
data stacked;
	set _null_;
run;
%macro make_num(set);
	data &set;
		set &set;
		if A = "*" then A = "";
		A_Num = input(A, best12.);
	run;
	data stacked;
		set stacked &set (drop = A);
	run;
%mend;
%make_num(one);
%make_num(two);


  

4 REPLIES 4
jklaverstijn
Rhodochrosite | Level 12

In my 35 years of datastep programming I never considered an infile option for alternative missing characters. But now I see a need for that. But there is none.

 

Why would you have two sets of code? You can use your code for dataset TWO for ONE as well. Much easier. I gues the computing overhead would be minimal and not outweigh the ease of having one piece of code. and no need for going over your data twice.

 

data one_two;
	length id 8 a_char $12 a 8;
	input id a_char $;
	drop a_char;
	if a_char = '*' then a = .;
	else a = input(a_char, best12.);
	datalines;
...
run;      
s_lassen
Meteorite | Level 14

I would use an informat to take care of that, e.g.:

proc format;
  invalue asterix
    '*'=.
    other=best12.;
run;
data one; 
   informat a asterix.;                    
   input ID A;     
   datalines;          
2477 195 
2431 220 
2456 173 
2412 135 
; 
run;
data two;                     
   informat a asterix.;  
   input ID A ;     
   datalines;          
3777  *  
9731  *  
7756  110
2712  * 
;  
run;

You wouldn't need more than one datastep, though. Just assign the infiles you want to read with a single filename, e.g.

filename data_in ('c:\indata\*.dat','d:\test.dat');

or just use wildcards, as shown, if that is enough. And then just this:

data one; 
   informat a asterix.; 
   infile data_in;                   
   input ID A;     
run;
jklaverstijn
Rhodochrosite | Level 12

Yeah I 100% support the use of an informat. I completely overlooked the obvious solution in my earlier reply.

s_lassen
Meteorite | Level 14

On a second thought, my first attempt permanently assigns an informat to the A variable. And you may have problems with that later, if you do not save the informat in a permanent library, which you probably do not want to, as it is just for this special purpose.

 

So you would probably rather want to use something like

data one; 
   infile data_in;                   
   input ID A : asterix.;     
run;

in the final data step.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1407 views
  • 3 likes
  • 3 in conversation