Hello,
I'd like to read in a variable and have it change from character to numeric - retaining the same name. The issue I'm facing is that when a character value of .U or .N is input as numeric, I lose the special missing type and they all become just dot (.).
If this isn't possible I can create new variables and drop the character variables - but if there is any way ....
I'm reading in variables as character and applying the format:
proc format;
invalue chtono
'Y'= 1
'N'= 2
'X'= .N
other= .U;
Which results in character values of '1', '2', 'N', and 'U'. When I attempt to change these to numbers - I lose the Ns and Us and only have 'generic' missings for them. Regardless of an initial character length of 1 or 2.
I change them via:
data new(drop=x );
set birthdata01 (rename=(&name. =x));
&name.=input(x, 2.);
run;
Most likely there isn't a way for SAS to translate a character 'U' to a missing numeric value of .U but thought to ask.
Yes, the 1 indicates the length. It can be specified either as the default when you create the informat or, as I did, when you use the informat.
I had to guess at the new informat you introduced, but the following reads the kind of data that I think you are dealing with:
proc format;
invalue chtono
Y=1
N=2
X=.N
other=.U;
invalue degf
H=1
A=2
B=3
M=4
X=.N
other=.U;
run;
data want;
input @1 (fdobmo fdobdy) (2.)
(ever_mar married pat_ack) (chtono1.)
mat_deg degf1.;
cards;
0315YYYH
0617XXUB
1002NN2M
12142NXX
;
run;
Wouldn't you get what you want, directly, if you apply the informat you created when you input the data? e.g.,
proc format;
invalue chtono
'Y'= 1
'N'= 2
'X'= .N
other= .U;
run;
data test;
informat x chtono.;
input x;
cards;
1
Y
X
N
2
;
I can try - but the cards would be 'Y', 'N', 'X'.
The raw data are character and I'm trying to use the same variable name and end up with 1,2,.N and .U.
Thank you
I'm not sure what you mean. If you are saying that the values in the datalines have quotes around them, just include an infile statement with the dsd option. e.g.,
proc format;
invalue chtono
'Y'= 1
'N'= 2
'X'= .N
other= .U;
run;
data test;
informat x chtono.;
infile cards dsd;
input x;
cards;
'1'
'Y'
'X'
'N'
'2'
;
Thank you. I think that will work - just need to test it with how I'm reading in the raw data. I think I need to avoid infiling the data as character initially.
ag
Hi,
If I need to input the data and include the variables' lengths - how can I incorporate the suggested solution?
355 data stout.&state._RawBirthData ; /* data read in to become formatted data */
356 infile stbirth LRECL=&birthreclength linesize=&birthlinesize
357 N=&birthLinesPerObs missover;
358 %create_code_statements(birthfmt, pramsvars) /*informats for formatting birth file */
...
MPRINT(CREATE_CODE_STATEMENTS): ever_mar = input(ever_mar, chtono.);
MPRINT(CREATE_CODE_STATEMENTS): married = input(married, chtono.);
...
MPRINT(CREATE_INPUT_STRING_D3): #1 @0090 ever_mar 1.
MPRINT(CREATE_INPUT_STRING_D3): #1 @0091 married 1.
...
NOTE: Invalid data for ever_mar in line 2 90-90.
NOTE: Invalid data for married in line 2 91-91.
Thank you,
Anjali
It looks like you are only reading in 1 character for ever_mar and 1 character for married. As such, I would go back to my originally suggested proc format. As for your errors, your macro appears to be assigning informats of 1. when you probably need them to be the name of the informat you created (e.g., chtono1.).
Thank you for the response. Why is the length of 1 problematic? I have put the formats in a file that I include at the start of the program. I think you're suggesting #1 @0090 ever_mar chtono. ? Is that correct? I will do so. Guess I need a bit more clarity - sorry for the hassle.
Anjali
A length of 1, in itself, isn't problematic. However, since you don't have delimiters between fields, you have to account for that fact. There is probably a modifier I can't think of at the moment. Thus, if I were pressed to read the data immediately, I would read them in as characters and convert them. E.g., the following would provide (I think) what you expect:
proc format;
invalue chtono
'Y'= 1
'N'= 2
'X'= .N
other= .U;
run;
data want (drop=in_:);
input in_x $ 1-1 in_y $ 2-2;
x=input(in_x,chtono.);
y=input(in_y,chtono.);
cards;
12
YN
XY
NX
21
;
run;
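If the character values already sit in a SAS data set and the goal is to keep the original variable name, one common pattern is to rename the character variable on the way in and drop it on the way out. This is only a sketch: the data set name old, the temporary name ever_mar_c, and the sample values are made up for illustration.

```sas
proc format;
  invalue chtono
    'Y' = 1
    'N' = 2
    'X' = .N
    other = .U;
run;

/* hypothetical character data standing in for the real file */
data old;
  input ever_mar $1.;
  datalines;
Y
N
X
Q
;
run;

/* rename the character variable, rebuild it as numeric, drop the copy */
data new (drop=ever_mar_c);
  set old (rename=(ever_mar=ever_mar_c));
  ever_mar = input(ever_mar_c, chtono1.);
run;

proc print data=new;
run;
```

Because the conversion goes through the user-defined informat rather than a plain numeric informat such as 2., the special missing values .N and .U should survive in the numeric ever_mar.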
It *seems* to be working without designating the length. Will test more.
Previously, I was reading in the raw data with the $char1. or 1. informat. Then I was trying to apply my user-defined formats and ran into the char vs numeric issue.
Your suggestion to use my user-defined informats makes so much sense. And much less code - as I can read and format the vars in 1 step:
MPRINT(CREATE_INPUT_STRING_D3): #1 @0090 ever_mar chtono.
MPRINT(CREATE_INPUT_STRING_D3): #1 @0091 married chtono.
MPRINT(CREATE_INPUT_STRING_D3): #1 @0092 pat_ack chtono.
Fingers crossed - seems to work.
Many thanks.
I think that your code will produce the wrong results except for the last field and any field that is followed by a delimiter (e.g., a blank).
Thanks for the heads up. I'll check. Maybe the @0090 etc will help designate the start of a new var ... probably not!
Just trying to avoid the double name convention - reading in 1 var as character and renaming it to a numeric.
Yeah - back to the drawing board. Argh. I do have the start and end column documented for each variable and can subtract them +1 to get the length.
Basically - am I correct that as soon as you designate length you also imply if it is char or numeric? And then my attempt falls apart? I've been struggling with this for weeks - maybe time to quit the attempt for finesse.
It's so close - the initially char variables are indeed ending up as numeric. Just an issue with reading in too much data due to the lack of a length.
Anjali
I think that the following might work for you:
proc format;
invalue chtono
Y= 1
N= 2
X= .N
other= .U;
run;
data want;
input #1 @1 (x y) (chtono1.)
#2 z $;
cards;
12
xx
YN
xx
XY
xx
NX
xx
21
xx
;
run;
Would you mind dissecting/explaining it a bit? Is chtono1. equivalent to chtono. above? The '1' doesn't indicate length, does it? Are the lengths implied in the code above?
My raw data is all on 1 line per client, non delimited. And the SAS code is created from a spreadsheet containing:
varName | fromData | startPointer | endPointer | Row | nformat |
fdobmo | birth | 0085 | 0086 | 1 | 2. |
fdobdy | birth | 0087 | 0088 | 1 | 2. |
ever_mar | birth | 0090 | 0090 | 1 | chtono. |
married | birth | 0091 | 0091 | 1 | chtono. |
pat_ack | birth | 0092 | 0092 | 1 | chtono. |
mat_deg | birth | 0093 | 0093 | 1 | degf. |
At least that's from my last attempt at changing nformat to include some created informats as well as lengths with char and numeric designations.
Ever grateful,
Anjali
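For a fixed-column layout like the one in the spreadsheet, one option is to have the code generator append the field width (endPointer - startPointer + 1) to the informat name, so that each field reads only its own columns. A minimal sketch, assuming the fields are shifted to columns 1-8 to keep the example short and reusing the informats and sample records from earlier in the thread:

```sas
proc format;
  invalue chtono
    'Y' = 1
    'N' = 2
    'X' = .N
    other = .U;
  invalue degf
    'H' = 1
    'A' = 2
    'B' = 3
    'M' = 4
    'X' = .N
    other = .U;
run;

data want;
  infile datalines truncover;
  /* @start variable informat<width>. : the width keeps each read to its own columns */
  input @1 fdobmo   2.
        @3 fdobdy   2.
        @5 ever_mar chtono1.
        @6 married  chtono1.
        @7 pat_ack  chtono1.
        @8 mat_deg  degf1.;
  datalines;
0315YYYH
0617XXUB
;
run;
```

With the real file the pointers would be @90, @91 and so on, and the nformat column would hold chtono1. and degf1. instead of chtono. and degf. That way the variables are read as numeric in a single step, with no second name needed.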