BookmarkSubscribeRSS Feed
mmm7
Calcite | Level 5

I'm reading in multiple files that need to be stacked into a single dataset. In some files, column A contains all numeric values. In other files,  an * appears when numeric values are missing, defining the column type as character. I have a data step that sets * to missing and then creates a new numeric column using the input function. Is there a way to reference column type as part of a do statement where the data step executes only when the column type is character? Alternatively, is there a way to suppress "Invalid numeric data, '*'" message from the log?

 

data one;                     
   input ID A;     
   datalines;          
2477 195 
2431 220 
2456 173 
2412 135 
; 
run;
data two;                     
   input ID A $;     
   datalines;          
3777  *  
9731  *  
7756  110
2712  * 
;  
run;
data stacked;
	set _null_;
run;
%macro make_num(set);
	data &set;
		set &set;
		if A = "*" then A = "";
		A_Num = input(A, best12.);
	run;
	data stacked;
		set stacked &set (drop = A);
	run;
%mend;
%make_num(one);
%make_num(two);


  

4 REPLIES 4
jklaverstijn
Rhodochrosite | Level 12

In my 35 years of datastep programming I never considered an infile option for alternative missing characters. But now I see a need for that. But there is none.

 

Why would you have two sets of code? You can use your code for dataset TWO for ONE as well. Much easier. I gues the computing overhead would be minimal and not outweigh the ease of having one piece of code. and no need for going over your data twice.

 

data one_two;
	length id 8 a_char $12 a 8;
	input id a_char $;
	drop a_char;
	if a_char = '*' then a = .;
	else a = input(a_char, best12.);
	datalines;
...
run;      
s_lassen
Meteorite | Level 14

I would use an informat to take care of that, e.g.:

proc format;
  invalue asterix
    '*'=.
    other=best12.;
run;
data one; 
   informat a asterix.;                    
   input ID A;     
   datalines;          
2477 195 
2431 220 
2456 173 
2412 135 
; 
run;
data two;                     
   informat a asterix.;  
   input ID A ;     
   datalines;          
3777  *  
9731  *  
7756  110
2712  * 
;  
run;

You wouldn't need more than one datastep, though. Just assign the infiles you want to read with a single filename, e.g.

filename data_in ('c:\indata\*.dat','d:\test.dat');

or just use wildcards, as shown, if that is enough. And then just this:

data one; 
   informat a asterix.; 
   infile data_in;                   
   input ID A;     
run;
jklaverstijn
Rhodochrosite | Level 12

Yeah I 100% support the use of an informat. I completely overlooked the obvious solution in my earlier reply.

s_lassen
Meteorite | Level 14

On a second thought, my first attempt permanently assigns an informat to the A variable. And you may have problems with that later, if you do not save the informat in a permanent library, which you probably do not want to, as it is just for this special purpose.

 

So you would probably rather want to use something like

data one; 
   infile data_in;                   
   input ID A : asterix.;     
run;

in the final data step.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 486 views
  • 3 likes
  • 3 in conversation