BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ZahidHussain
Calcite | Level 5

HI Folks,

 

I am wondering how to format this type of character data.

 

The data is:

data test;

input result $20. ;


datalines;

< 20
101
Target Not Detected
1.05E+3 (3.02)
322286
162
;
run;

 

I want to decode this value like that:

1 = viral load suppressed (<1000 copies / mL)
2 = not viral load suppressed
99 = Missing

 

Anybody gives me some idea that how can I do that? 

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14
Great. If that solves your question, pleas do mark the question as resolved

View solution in original post

18 REPLIES 18
Tom
Super User Tom
Super User

A format is used to convert values into text.  So you could think of them like a decoder ring.  

So in your case it looks you want a format to DECODE 1 into 'viral load suppressed' etc.

So something like this (it makes no sense to code missing values as a valid number like 99).

proc format ;
  value viral_load 
  1='viral load suppressed (<1000 copies / mL)'
  2='not viral load suppressed'
  .='Missing'
  ;
run;

 

But the real problem it looks like you are having is how to ENCODE the strings you have into the appropriate codes.

 

What is the logic for handling those example strings?

ZahidHussain
Calcite | Level 5
Thanks for your reply. Encoding is problem. The result variable is character which has numeric number as well as character. If I use input function some value would be missing . So my question is, How I do format this variable both has numeric and character value
Tom
Super User Tom
Super User

You posted examples of the free text field you have, but did not provide any rules (or even a sketch of the rules) for categorizing them.  You did not even provide the categories you want for the example text you provided.

 

Keep the text as one variable.  You can then convert that into one or two more variables to help you categorize them.

 

For example you could start by handling the text that can actually be interpreted as a number.

So if you have the dataset HAVE with the character variable RESULT.

data HAVE;
  input result $20. ;
datalines;
.
< 20
101
Target Not Detected
1.05E+3 (3.02)
322286
162
;

You could create the numeric variable RESULTN and then categorize that.

data want;
  set have;
  resultn=input(result,??32.);
  if missing(result) then category=.;
  else if missing(resultn) then category=.U;
  else if resultn < 1000 then category=1;
  else category=2;
run;

Results:

Obs    result                 resultn    category

 1                                  .        .
 2     < 20                         .        U
 3     101                        101        1
 4     Target Not Detected          .        U
 5     1.05E+3 (3.02)               .        U
 6     322286                  322286        2
 7     162                        162        1


You can then work on making the logic smarter for the .U category values.  Perhaps you can also test if RESULT starts with '<' and take that in consideration when tying to assign a category.   Or you could just build up a list of all of the unknown RESULTS and give them to a human to categorize.  Then use that list to make a dataset or perhaps an INFORMAT that can convert some of those in the appropriate 1 or 2 category.

ZahidHussain
Calcite | Level 5
Hi, Thanks for your reply. I got your point. One question this data file originally from CSV. and the one value of the variable is 1.05E+3 (3.02) which is character format. Now my question is how I convert it to numeric format and without scientific notation (E) in SAS?
Thanks
Tom
Super User Tom
Super User

@ZahidHussain wrote:
Hi, Thanks for your reply. I got your point. One question this data file originally from CSV. and the one value of the variable is 1.05E+3 (3.02) which is character format. Now my question is how I convert it to numeric format and without scientific notation (E) in SAS?
Thanks

And what number do you think that string means? 

 

Spoiler
It does not look like a valid number because of the extra characters at the end.  So "1.05E+3" is just another way of writing 1050.  But the extra character make it an invalid input.  What is the meaning of the "(3.02)" at the end?
ZahidHussain
Calcite | Level 5
I am not pretty sure . As I know the E means 2.71. The same number is in the data set. So I am confused about this
Sajid01
Meteorite | Level 14
Hello
The understanding that E is 2.71 is incorrect. It is e that is 2.7123....
In the present context 1.05E+3 implies 1.05 x 10^+3 and this equal 1050.
From what that has been mentioned, the data [reparation step is being by passed. For example "Target Not Detected" means the component that was being analyzed is either absent or in such a small quantity that it could not be detected. In is typical in certain situations to take this as zero.

ZahidHussain
Calcite | Level 5
Thanks for your reply. You cleared lot if things. But the original number is 1.05E+3(3.02) . So, at the end of this (3.02) gives invalid as I know. So, whats your suggestion. take this number as a invalid or take substr from 1 to 7 which gives the number 1050. Thanks
Sajid01
Meteorite | Level 14
I do not know the line of activity or purpose of the analysis of this data. Consult with domain specialist for guidance. This is particularly because it does not fall under 1, 2 or 99, the values you are trying to encode to.
ZahidHussain
Calcite | Level 5
I think you are right, That number is invalid number , so its supposed to come missing. Thanks everybody for giving me the solution
SASKiwi
PROC Star

It would be advisable to check with the provider of the data regarding the correct treatment of the various values. If  the exponential value is unique, then setting it to missing sounds reasonable. However if you are getting a range of exponential values you might be missing some real data.

ZahidHussain
Calcite | Level 5
I check this number in excel with trim function . It gives invalid number. Thats why I think that this is invalid input. Also this data set there is only one variable like that. Also when I run the program with best32. format it gives the missing value. Any idea of changing the format of exponential value to numeric value.
Thanks
SASKiwi
PROC Star

You could try this:

data want;
  set have;
  exponential_num = input(substr(exponential_char,1,7), 7.);
run;
Sajid01
Meteorite | Level 14
Great. If that solves your question, pleas do mark the question as resolved

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1220 views
  • 0 likes
  • 5 in conversation