BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cooksam13
Fluorite | Level 6

hi,

I have a very large data set of about 10,000 observations and 140 variables varying in numbers and characters. I am trying to find how many time Hep B is mentioned in the data set. Basically trying to quantify doctor's notes, This could show up as "hepatitis B","HepB" "HBsAG negative" "HBsAG positive" etc. I only need the rows associated with people who have been tested for Hep B, but it could be mentioned in different columns such as "remarks" or "tests" 

 

 

I have tried using several different methods I will put below but usually I come out with the wrong number or an error. 

 

data program.new;
set sheet.old;
keep testresults Remarks;
CB = indexc(tests, 'Hep', 'HBsAg');
CB2 = indexc(remarks, 'Hep', 'HBsAG');
IF CB = 0 then delete;
IF CB2 = 0 then delete;
run;

data program.new;
set sheet.old;
if index(upcase(tests), 'HEP') then Hepcheck = "yes";
if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";

if hepcheck = "no" then delete;
keep hepcheck;
run;

 if someone could help that would be great!

1 ACCEPTED SOLUTION

Accepted Solutions
Rydhm
Obsidian | Level 7

In your second approach, please use else if , otherwise the last condition will ruin all your logic.

 

if index(upcase(tests), 'HEP') then Hepcheck = "yes";
else if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
else if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
else if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";

View solution in original post

5 REPLIES 5
Rydhm
Obsidian | Level 7

In your second approach, please use else if , otherwise the last condition will ruin all your logic.

 

if index(upcase(tests), 'HEP') then Hepcheck = "yes";
else if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
else if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
else if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";
Cooksam13
Fluorite | Level 6

Thank you for that quick fix, I tried to condense the code:

 

data program.ds2054;
set sheet.ds2054;
if index(upcase(classbotherspecify), 'HEP C' or 'HEPATITIS C') then hepcheck = 0;
else if index(upcase(classbotherspecify), 'HEP' or 'HBSAG' or 'HBV') then Hepcheck = 1 ;
Else if index(upcase(remarks), 'HEP C' or 'HEPATITIS C') then hepcheck =0;
Else if index(upcase(remarks), 'HEP' or 'HBSAG' or 'HBV') then hepcheck =1;
else if index(upcase (OtherClassBDetails), 'HEP C' or 'HEPATITIS C') then hepcheck =0;
else if index(upcase (OtherClassBDetails), 'HEP' or 'HBSAG' or 'HBV') then hepcheck =1;

else Hepcheck = 0;

 However I get this error now and no data shows up even though i had a lot of data the other way

 

 

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
71:38 71:49 72:43 72:52 72:63 73:32 73:43 74:32 74:41 74:52 75:44 75:55 76:44 76:53 76:64
80:39 80:53 81:32 81:46 82:42 82:56
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
71:46 72:60 73:40 74:49 75:23 75:52 76:23 76:61 80:50 81:43 82:22 82:53
NOTE: Variable OtherClassBDetails is uninitialized.
NOTE: Invalid numeric data, 'HEP C' , at line 71 column 38.
NOTE: Invalid numeric data, 'HEPATITIS C' , at line 71 column 49.
NOTE: Invalid numeric data, 'HEP' , at line 72 column 43.
tarheel13
Rhodochrosite | Level 12

What are the errors you're getting? 

ballardw
Super User

You don't want INDEXC.

Indexc Searches a character expression for any of the specified characters, and returns the position of that character.

So in this example:

data junk;
   input word $;
   pos = indexc(word,'hep');
datalines;
p
ah
ace
;

you get Pos values of 1, because there is a p in the first position in the first value of word, 2 because h is the second letter in the second value and 3 because e is the third letter in word on the observation. ANY of the characters.

If you want to find the entire string HEP then you need to use one of Index, Indexw, Find or Findw. And as in your other post the things like case consideration and presence as a substring in other words is critical.

 

It is best to provide some example values of your variable and what you expect the result to look like.

 

And for consistency sake on my part, I strongly suggest that you use 1/0 numeric coding instead of 'yes' 'no' text coding.

Later when you want to do summaries you can get the number of yes by using the SUM , the percent yes with the Mean statistics in any of multiple procedures. Max =1 tells you if that at least one value is 1, Max=0 tells you none are 1, Range=0 tells you all are the same value. A bit trickier to get such things with 'yes' / 'no'.

tarheel13
Rhodochrosite | Level 12

Try prxmatch? 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2563 views
  • 0 likes
  • 4 in conversation