monona
Obsidian | Level 7
data WORK.cholesterol;
	%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
	infile 'path' delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;
	format variant $70. ;
	format minor_allele $2. ;
	format minor_AF best12. ;
	format expected_case_minor_AC best12. ;
	format low_confidence_variant $5. ;
	format n_complete_samples best12. ;
	format AC best12. ;
	format ytx best12. ;
	format beta $12. ;
	format se $11. ;
	format tstat $12. ;
	format pval $11. ;

	input
		variant  $
		minor_allele  $
		minor_AF
		expected_case_minor_AC
		low_confidence_variant  $
		n_complete_samples
		AC
		ytx
		beta  $
		se  $
		tstat  $
		pval ;
	if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable*/ 
run;

I imported the data with the code above.

And I changed 'NaN' to 0 so that I can convert these columns to numeric.

data cholesterol;
	set cholesterol;
	array Var _character_;
		do over Var;
			if Var='NaN' then Var=0;
		end;
run;

Then I tried to convert these variables to numeric, but it doesn't work.

data cholesterol;
	set cholesterol;
	beta  = input(se, best32.);
	se    = input(se, best32.);
	tstat = input(tstat, best32.);
	pval  = input(pval, best32.);
run;

I would appreciate any solution for this.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User

@monona wrote:

Oh, this works! Lastly, can you briefly explain the logic behind the last statement?

input variant--ytx ( beta se tstat pval ) (:??32.) ;

Much appreciated!


It is just an INPUT statement with two variable lists. The first, variant--ytx, is a positional list: every variable from variant through ytx, in the order they were defined. The second is just space-delimited, but it could have been shortened to beta--pval.

 

The (...) (...) syntax lists a set of informats to apply to the preceding set of variables. If the informat list is shorter than the variable list, the informats are recycled. Using (...) (...) means that I only have to type the informat specification once instead of repeating it for each variable in the list.

 

The double ? (??) says not to generate any notes or errors if the text is not compatible with the informat. The value is simply set to missing, and _ERROR_ is not set.

 

The : says to keep using list mode (read only as many characters as there are before the next delimiter) instead of formatted mode, even though there is an explicit informat in the INPUT statement. The 32. is the informat to use; it means to read the value using the normal numeric informat. The width of 32 is the maximum allowed, but it is not really needed, since in list mode SAS ignores the width you set on the informat and just reads the full set of characters found.

You probably do NOT need the :32. at all, since the variables are already defined as numeric and SAS already knows how to read numbers. But if your text values have dollar signs and/or commas, then you would want to use the COMMA informat instead.
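To see the pieces described above working together, here is a minimal sketch. The dataset name demo, the variable names, and the inline data are all made up for illustration (space-delimited rather than tab-delimited to keep it short); only the INPUT syntax mirrors the statement in the thread. The price column is an assumed example of text with dollar signs and commas, read with the COMMA informat.

```sas
/* Hypothetical demo data, not the file from the thread */
data demo;
  /* Define types first so the positional list id--n is known to INPUT */
  length id $8 n 8 beta 8 se 8 price 8;
  /* id--n  : plain list input, by position                       */
  /* (beta se) (:??32.) : one informat spec recycled for both     */
  /* price  : COMMA informat strips the dollar sign and the comma */
  input id--n (beta se) (:??32.) price :??comma32.;
  datalines;
rs1 100 0.25 0.01 $1,234
rs2 200 NaN NaN 56
rs3 300 -1.5e-3 0.2 n/a
;
run;

proc print data=demo;
run;
```

With the ?? modifier in place, the NaN and n/a entries should come through as missing values without invalid-data notes in the log. Removing ?? from one of the specifications is an easy way to compare the log output.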


7 REPLIES
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @monona 

I guess your log says something about the variable having already been defined as character. A variable's type cannot change once it is defined, so you must assign the numeric values to new numeric variables. If you want to preserve the original names, you could use rename:

 

data new (drop=ctstat); set old (rename=(tstat=ctstat));

tstat = input(ctstat,best32.);

run;
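Extending the rename idea to several columns at once, a sketch along these lines might help. The datasets old and new, the c_ prefix, and the inline data are hypothetical stand-ins for the character columns in the question; the ?? modifier inside the INPUT function turns non-numeric text such as NaN into missing without log messages.

```sas
/* Hypothetical stand-in for the character columns in the question */
data old;
  input variant $ beta $ se $;
  datalines;
v1 0.25 0.01
v2 NaN NaN
;
run;

data new;
  /* Rename the character originals so the old names are free again */
  set old(rename=(beta=c_beta se=c_se));
  array cvars {*} c_beta c_se;   /* character originals       */
  array nvars {*} beta se;       /* new variables, numeric    */
  do i = 1 to dim(cvars);
    /* ?? suppresses invalid-data notes; NaN becomes missing */
    nvars{i} = input(cvars{i}, ?? 32.);
  end;
  drop i c_beta c_se;
run;
```

The two arrays must list the variables in the same order, since the loop pairs them by index.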

 

 

ErikLund_Jensen
Rhodochrosite | Level 12
- Of course, double parentheses at the end: (rename=(tstat=ctstat xx=cxx));
AMSAS
SAS Super FREQ

Hi monona, my suggestion would be to keep it simple: just read the character values in, check whether each one is "NaN", and convert it if it is not.

 

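data input ;
	input charValue $ ;
cards;
NaN
0
1
2
NaN
4
5
;
run ;

data output ;
	set input ;
	/* The comparison is case-sensitive; "NaN" matches the spelling in the question */
	if charValue="NaN" then
	do ;
		numValue=0 ;
	end ;
	else do ;
		/* Convert the remaining text to a number */
		numValue=inputn(charValue,"8.") ;
	end ;
	put charValue= numValue= ;
run ;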
monona
Obsidian | Level 7

I intend to replace NaN with 0 across all variables, not just a single variable. How can I do that?

Tom
Super User

Why are you reading them in as character to begin with? 

Looks like you might have sub-contracted coding your data step to PROC IMPORT?

 

Why does the file have character strings in numeric variables? 

Did you generate that file from R perhaps?  Can you teach R to not do that?

 

It is easier to convert the NaN (and any other non-numeric strings) to missing instead of zero. It is also probably more accurate.

You can use the ?? informat modifier to suppress the error messages.

data cholesterol;
  infile 'path' delimiter='09'x TRUNCOVER DSD lrecl=32767 firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;

 

 

 

 

monona
Obsidian | Level 7

Oh, this works! Lastly, can you briefly explain the logic behind the last statement?

input variant--ytx ( beta se tstat pval ) (:??32.) ;

Much appreciated!
