hi,
I have a very large data set of about 10,000 observations and 140 variables varying in numbers and characters. I am trying to find how many time Hep B is mentioned in the data set. Basically trying to quantify doctor's notes, This could show up as "hepatitis B","HepB" "HBsAG negative" "HBsAG positive" etc. I only need the rows associated with people who have been tested for Hep B, but it could be mentioned in different columns such as "remarks" or "tests"
I have tried using several different methods I will put below but usually I come out with the wrong number or an error.
data program.new;
set sheet.old;
keep testresults Remarks;
CB = indexc(tests, 'Hep', 'HBsAg');
CB2 = indexc(remarks, 'Hep', 'HBsAG');
IF CB = 0 then delete;
IF CB2 = 0 then delete;
run;
data program.new;
set sheet.old;
if index(upcase(tests), 'HEP') then Hepcheck = "yes";
if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";
if hepcheck = "no" then delete;
keep hepcheck;
run;
if someone could help that would be great!
In your second approach, please use else if , otherwise the last condition will ruin all your logic.
if index(upcase(tests), 'HEP') then Hepcheck = "yes";
else if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
else if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
else if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";
In your second approach, please use else if , otherwise the last condition will ruin all your logic.
if index(upcase(tests), 'HEP') then Hepcheck = "yes";
else if index(upcase(tests), 'HBSAG') then HEpCheck = "yes";
else if index(upcase(remarks), 'HEP') then Hepcheck = "yes";
else if index(upcase(remarks), 'HBSAG') then HepCheck = "yes";
else Hepcheck = "no";
Thank you for that quick fix, I tried to condense the code:
data program.ds2054;
set sheet.ds2054;
if index(upcase(classbotherspecify), 'HEP C' or 'HEPATITIS C') then hepcheck = 0;
else if index(upcase(classbotherspecify), 'HEP' or 'HBSAG' or 'HBV') then Hepcheck = 1 ;
Else if index(upcase(remarks), 'HEP C' or 'HEPATITIS C') then hepcheck =0;
Else if index(upcase(remarks), 'HEP' or 'HBSAG' or 'HBV') then hepcheck =1;
else if index(upcase (OtherClassBDetails), 'HEP C' or 'HEPATITIS C') then hepcheck =0;
else if index(upcase (OtherClassBDetails), 'HEP' or 'HBSAG' or 'HBV') then hepcheck =1;
else Hepcheck = 0;
However I get this error now and no data shows up even though i had a lot of data the other way
What are the errors you're getting?
You don't want INDEXC.
Indexc Searches a character expression for any of the specified characters, and returns the position of that character.
So in this example:
data junk; input word $; pos = indexc(word,'hep'); datalines; p ah ace ;
you get Pos values of 1, because there is a p in the first position in the first value of word, 2 because h is the second letter in the second value and 3 because e is the third letter in word on the observation. ANY of the characters.
If you want to find the entire string HEP then you need to use one of Index, Indexw, Find or Findw. And as in your other post the things like case consideration and presence as a substring in other words is critical.
It is best to provide some example values of your variable and what you expect the result to look like.
And for consistency sake on my part, I strongly suggest that you use 1/0 numeric coding instead of 'yes' 'no' text coding.
Later when you want to do summaries you can get the number of yes by using the SUM , the percent yes with the Mean statistics in any of multiple procedures. Max =1 tells you if that at least one value is 1, Max=0 tells you none are 1, Range=0 tells you all are the same value. A bit trickier to get such things with 'yes' / 'no'.
Try prxmatch?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.