data WORK.cholesterol;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'path' delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;
format variant $70. ;
format minor_allele $2. ;
format minor_AF best12. ;
format expected_case_minor_AC best12. ;
format low_confidence_variant $5. ;
format n_complete_samples best12. ;
format AC best12. ;
format ytx best12. ;
format beta $12. ;
format se $11. ;
format tstat $12. ;
format pval $11. ;
input
variant $
minor_allele $
minor_AF
expected_case_minor_AC
low_confidence_variant $
n_complete_samples
AC
ytx
beta $
se $
tstat $
pval ;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable*/
run;
I imported the data with the code above.
Then I changed 'NaN' to 0 so that I could convert these columns to numeric.
data cholesterol;
set cholesterol;
array Var _character_;
do over Var;
if Var='NaN' then Var=0;
end;
run;
Then I tried to convert these variables to numeric, but it doesn't work.
data cholesterol;
set cholesterol;
beta = input(beta, best32.);
se = input(se, best32.);
tstat = input(tstat, best32.);
pval = input(pval, best32.);
run;
I would appreciate any solution for this.
Hi @monona
I guess your log says something about the variables already being defined as character. A variable's type cannot change once it exists, so you must assign the numeric values to (new) numeric variables. If you want to preserve the original names, you could use rename:
data new (drop=ctstat);
set old (rename=(tstat=ctstat));
tstat = input(ctstat,best32.);
run;
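The same rename-then-convert idea can be stretched over all four columns with a pair of arrays. This is only a sketch: the data set `old`, its sample rows, and the `c_` prefixed names are made up for illustration, and the `??` modifier is used so that NaN quietly becomes missing instead of raising invalid-data notes.

```sas
/* Hypothetical sample standing in for the imported table:      */
/* four character columns holding numbers or the text NaN.      */
data old;
  input beta :$12. se :$11. tstat :$12. pval :$11.;
  datalines;
0.0123 0.0045 2.7333 0.0063
NaN NaN NaN NaN
-0.0015 0.0020 -0.75 0.4533
;
run;

data new;
  /* Rename the character columns so the original names are free. */
  set old (rename=(beta=c_beta se=c_se tstat=c_tstat pval=c_pval));
  array cvars {4} c_beta c_se c_tstat c_pval;  /* character inputs     */
  array nvars {4} beta se tstat pval;          /* new numeric variables */
  do i = 1 to 4;
    /* ?? suppresses the invalid-data note; NaN becomes missing. */
    nvars{i} = input(cvars{i}, ?? 32.);
  end;
  drop i c_beta c_se c_tstat c_pval;
run;
```

Because `beta`, `se`, `tstat` and `pval` no longer exist after the rename, the second ARRAY statement creates them as fresh numeric variables.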
Hi monona, my suggestion would be to keep it simple: read the values in as character, check whether each one is "NaN", and convert it to numeric only if it is not.
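/* Sample data: numbers stored as text, with "NaN" marking non-numbers */
data input ;
input charValue $ ;
cards;
NaN
0
1
2
NaN
4
5
;
run ;
data output ;
set input ;
/* The comparison is case-sensitive, so the literal must match the data exactly */
if charValue="NaN" then numValue=0 ;
else numValue=inputn(charValue,"8.") ;
put charValue= numValue= ;
run ;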
I want to replace NaN with 0 across all of the variables, not just a single one. How can I do that?
Why are you reading them in as character to begin with?
Looks like you might have sub-contracted coding your data step to PROC IMPORT?
Why does the file have character strings in numeric variables?
Did you generate that file from R perhaps? Can you teach R to not do that?
It is easier to convert the NaN (and any other non numeric strings) to missing instead of zero. Also probably more accurate.
You can use the ?? informat modifier to suppress the error messages.
data cholesterol;
infile 'path' delimiter='09'x TRUNCOVER DSD lrecl=32767 firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;
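If zeros really are required downstream (missing is usually the safer choice, as noted above), the missing values can be swapped for 0 in a follow-up step. A minimal sketch, assuming a small made-up data set `demo` with the same four numeric columns; `demo` and `demo_zero` are hypothetical names:

```sas
/* Hypothetical miniature of the table after NaN was read as missing. */
data demo;
  input beta se tstat pval;
  datalines;
0.0123 0.0045 2.7333 0.0063
. . . .
;
run;

data demo_zero;
  set demo;
  array stats {*} beta se tstat pval;
  do i = 1 to dim(stats);
    /* COALESCE returns the first non-missing argument. */
    stats{i} = coalesce(stats{i}, 0);
  end;
  drop i;
run;
```

Keep in mind that a zero beta or p-value is indistinguishable from a real estimate afterwards, which is why leaving the values missing is often preferable.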
Oh, this works! Lastly, can you briefly explain the logic behind the last statement?
input variant--ytx ( beta se tstat pval ) (:??32.) ;
Much appreciated!!
@monona wrote:
Oh, this works! Lastly, can you briefly explain the logic behind the last statement?
input variant--ytx ( beta se tstat pval ) (:??32.) ;
Much appreciated!!
It is just an INPUT statement with two variable lists. The first, variant--ytx, is a positional list: every variable from variant through ytx, in the order they were defined. The second simply names the variables separated by spaces, but it could have been shortened to beta--pval.
The (...) (...) syntax is for listing a set of informats to apply to the preceding set of variables. If the informat list is shorter than the variable list, the informats are recycled. Using (...)(...) means that I only have to type the informat specification once instead of repeating it for each variable in the list.
The double question mark (??) says not to generate any warnings or errors if the text is not compatible with the informat; the value is simply set to missing.
The colon (:) says to keep using list mode (read only as many characters as there are before the next delimiter) instead of formatted mode, even though there is an explicit informat in the INPUT statement.
The 32. is the informat to use, meaning the normal numeric informat. A width of 32 is the maximum allowed, but it is not really needed, since in list mode SAS ignores the width you set on the informat and just reads the full set of characters it finds. You probably do NOT need the :32. at all, since SAS already knows how to read numbers. But if your text values have dollar signs and/or commas, then you would want to use the COMMA informat instead.
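To see these pieces working together, here is a small self-contained sketch that reads in-line data instead of the real file. The comma delimiter, the shortened variable list, and the two sample rows are assumptions made purely for illustration.

```sas
data demo;
  infile datalines dsd dlm=',' truncover;
  /* Define the variables first so the positional lists have an order. */
  length variant $20 ytx 8 beta 8 se 8 tstat 8 pval 8;
  /* variant--ytx is read with plain list input;                   */
  /* beta--pval share the recycled informat specification :??32.   */
  input variant--ytx (beta--pval) (:??32.);
  datalines;
1:1000:A:G,12.5,0.0123,0.0045,2.7333,0.0063
1:2000:C:T,3.25,NaN,NaN,NaN,NaN
;
run;

proc print data=demo;
run;
```

In the second row the four NaN fields should come through as missing values, with no invalid-data notes in the log. Removing the `??` from the informat list is an easy way to see the messages it suppresses.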