HI Folks,
I am wondering how to format this type of character data.
The data is:
data test;
input result $20. ;
datalines;
< 20
101
Target Not Detected
1.05E+3 (3.02)
322286
162
;
run;
I want to decode this value like that:
1 = viral load suppressed (<1000 copies / mL)
2 = not viral load suppressed
99 = Missing
Anybody gives me some idea that how can I do that?
Thanks
A format is used to convert values into text. So you could think of them like a decoder ring.
So in your case it looks you want a format to DECODE 1 into 'viral load suppressed' etc.
So something like this (it makes no sense to code missing values as a valid number like 99).
proc format ;
value viral_load
1='viral load suppressed (<1000 copies / mL)'
2='not viral load suppressed'
.='Missing'
;
run;
But the real problem it looks like you are having is how to ENCODE the strings you have into the appropriate codes.
What is the logic for handling those example strings?
You posted examples of the free text field you have, but did not provide any rules (or even a sketch of the rules) for categorizing them. You did not even provide the categories you want for the example text you provided.
Keep the text as one variable. You can then convert that into one or two more variables to help you categorize them.
For example you could start by handling the text that can actually be interpreted as a number.
So if you have the dataset HAVE with the character variable RESULT.
data HAVE;
input result $20. ;
datalines;
.
< 20
101
Target Not Detected
1.05E+3 (3.02)
322286
162
;
You could create the numeric variable RESULTN and then categorize that.
data want;
set have;
resultn=input(result,??32.);
if missing(result) then category=.;
else if missing(resultn) then category=.U;
else if resultn < 1000 then category=1;
else category=2;
run;
Results:
Obs result resultn category 1 . . 2 < 20 . U 3 101 101 1 4 Target Not Detected . U 5 1.05E+3 (3.02) . U 6 322286 322286 2 7 162 162 1
You can then work on making the logic smarter for the .U category values. Perhaps you can also test if RESULT starts with '<' and take that in consideration when tying to assign a category. Or you could just build up a list of all of the unknown RESULTS and give them to a human to categorize. Then use that list to make a dataset or perhaps an INFORMAT that can convert some of those in the appropriate 1 or 2 category.
@ZahidHussain wrote:
Hi, Thanks for your reply. I got your point. One question this data file originally from CSV. and the one value of the variable is 1.05E+3 (3.02) which is character format. Now my question is how I convert it to numeric format and without scientific notation (E) in SAS?
Thanks
And what number do you think that string means?
It would be advisable to check with the provider of the data regarding the correct treatment of the various values. If the exponential value is unique, then setting it to missing sounds reasonable. However if you are getting a range of exponential values you might be missing some real data.
You could try this:
data want;
set have;
exponential_num = input(substr(exponential_char,1,7), 7.);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.