I'm reading in multiple files that need to be stacked into a single dataset. In some files, column A contains all numeric values. In other files, an * appears when numeric values are missing, defining the column type as character. I have a data step that sets * to missing and then creates a new numeric column using the input function. Is there a way to reference column type as part of a do statement where the data step executes only when the column type is character? Alternatively, is there a way to suppress "Invalid numeric data, '*'" message from the log?
data one; input ID A; datalines; 2477 195 2431 220 2456 173 2412 135 ; run; data two; input ID A $; datalines; 3777 * 9731 * 7756 110 2712 * ; run; data stacked; set _null_; run; %macro make_num(set); data &set; set &set; if A = "*" then A = ""; A_Num = input(A, best12.); run; data stacked; set stacked &set (drop = A); run; %mend; %make_num(one); %make_num(two);
In my 35 years of datastep programming I never considered an infile option for alternative missing characters. But now I see a need for that. But there is none.
Why would you have two sets of code? You can use your code for dataset TWO for ONE as well. Much easier. I gues the computing overhead would be minimal and not outweigh the ease of having one piece of code. and no need for going over your data twice.
data one_two;
length id 8 a_char $12 a 8;
input id a_char $;
drop a_char;
if a_char = '*' then a = .;
else a = input(a_char, best12.);
datalines;
...
run;
I would use an informat to take care of that, e.g.:
proc format;
invalue asterix
'*'=.
other=best12.;
run;
data one;
informat a asterix.;
input ID A;
datalines;
2477 195
2431 220
2456 173
2412 135
;
run;
data two;
informat a asterix.;
input ID A ;
datalines;
3777 *
9731 *
7756 110
2712 *
;
run;
You wouldn't need more than one datastep, though. Just assign the infiles you want to read with a single filename, e.g.
filename data_in ('c:\indata\*.dat','d:\test.dat');
or just use wildcards, as shown, if that is enough. And then just this:
data one;
informat a asterix.;
infile data_in;
input ID A;
run;
Yeah I 100% support the use of an informat. I completely overlooked the obvious solution in my earlier reply.
On a second thought, my first attempt permanently assigns an informat to the A variable. And you may have problems with that later, if you do not save the informat in a permanent library, which you probably do not want to, as it is just for this special purpose.
So you would probably rather want to use something like
data one;
infile data_in;
input ID A : asterix.;
run;
in the final data step.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.