We are interested in using data from the National Health and Nutrition Examination Survey (NHANES) to establish the prevalence of hearing loss among subjects who participated in the audiology portion of the survey. The subject data are organized in rows. The hearing threshold (dB) for each of 14 sound frequencies in both ears is provided in their databases (14 columns). We are categorizing hearing impairment by the highest threshold (dB) at any of those 14 sound frequencies. Can anybody describe the code necessary to determine what the highest threshold would be?
ID# | 500hz | 1000hz | 2000hz | 3000hz | 4000hz | 5000hz | 6000hz |
123 | 0 | 0 | 10 | 20 | 0 | 10 | 0 |
234 | 0 | 0 | 0 | 10 | 0 | 20 | 0 |
456 | 0 | 0 | 10 | 20 | 45 | 75 | 70 |
458 | | | | | | | |
478 | | | | | | | |
You would need the names of the SAS data set variables. I'm fairly sure they aren't actually 500hz.
The MAX function returns the largest non-missing value among its arguments. In a data step, very generic code would look like:
data want;
set have;
maxvalue = max(var1, var2, var3);
run;
If the variable names are sequentially numbered, you may be able to use a numbered list such as
maxvalue = max(of reading1 - reading14);
Or, if the columns are adjacent in the data set, use a positional list:
maxvalue = max(of firstreading -- lastreading); /* note the two dashes together */
Otherwise, just list the variable names of interest.
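To make the positional list concrete, here is a minimal sketch built on the three complete rows from the question plus one all-missing row. The variable names (thr500 through thr6000) and the data set names are placeholders for illustration, not the actual NHANES variable names.

```sas
/* Hypothetical variable names: thr500 ... thr6000 stand in for the */
/* real NHANES audiometry variables, which will be named differently. */
data have;
   input id thr500 thr1000 thr2000 thr3000 thr4000 thr5000 thr6000;
   datalines;
123 0 0 10 20 0 10 0
234 0 0 0 10 0 20 0
456 0 0 10 20 45 75 70
458 . . . . . . .
;
run;

data want;
   set have;
   /* Double dash: every variable stored between thr500 and thr6000 */
   maxthreshold = max(of thr500 -- thr6000);
run;

proc print data=want noobs;
   var id maxthreshold;
run;
```

With these rows, maxthreshold is 20 for IDs 123 and 234 and 75 for ID 456. For ID 458, where every reading is missing, MAX has no non-missing value to return, so maxthreshold is missing as well.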
To better elaborate my question, I am trying to get SAS to read all of my variables (500, 1000, 2000, 3000, 4000, 6000, 8000 Hz) and categorize them by degree of hearing loss. The classification of hearing loss for my study is the same for all frequencies, e.g. -10 to 15 dB is normal, 16 to 24 dB is slight hearing loss, etc., and I would like SAS to analyze all variables and categorize them by their degree of hearing loss. This is why I would like them to be read in rows, so that my output produces individuals in categories of hearing loss. Because I have 14 variables, I am confused about how to do this.
What are the rules for categorizing? What type of analysis?
One approach when you need to make categories from more-or-less continuous values is to create a FORMAT.
Which might look something like:
Proc format;
value hearloss
0 -< 10 = "Minimal"
10 -< 16 = "Normal"
16 -< 24 = "Slight"
24 -< ?? = "some description" /* repeat as needed for desired ranges */
;
run;
Applying this format to a variable creates groups that are honored by procedures that use categorical values for analysis (or graphing or reporting).
But, not being in the industry, I have no idea how to combine 14 variables into a single "score", if that is what you want. If you have rules, we can likely help implement them.
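As a rough sketch of how a format turns a numeric variable into groups, the example below applies one to a per-subject maximum threshold. Only the "Normal" (up to 15 dB) and "Slight" (16 to 24 dB) cutoffs come from this thread; the top category label is a placeholder, and the data set and variable names (scores, maxthreshold) are assumed for illustration.

```sas
proc format;
   value hearloss
      low  -  15  = "Normal"
      15  <-  24  = "Slight"
      /* Placeholder: split this range per your study's classification */
      24  <- high = "Above slight (placeholder)"
   ;
run;

/* Hypothetical data: one maximum threshold per subject */
data scores;
   input id maxthreshold;
   datalines;
123 20
234 20
456 75
;
run;

proc freq data=scores;
   tables maxthreshold;
   /* The stored values are unchanged; only the grouping changes */
   format maxthreshold hearloss.;
run;
```

Here the frequency table should show two subjects in "Slight" and one in the placeholder top category. Because the ranges use `<-`, a non-integer value such as 15.5 falls into "Slight" rather than into a gap between categories.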
If you can't show us what you expect a data set to look like we can't help you make it.
Proc Transpose is a typical tool for making a data set long. One thing that would make this tricky is the NHANES weighting scheme. Making data "long" or "by rows" would tend to duplicate the weighting variable which means the weight would be applied incorrectly for almost any analysis.
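For reference, a minimal sketch of the wide-to-long reshape with Proc Transpose follows. The names wide, long, wt, and thr500 to thr2000 are hypothetical, and wt merely stands in for whichever NHANES weight applies. The weight is carried as a BY variable, so it repeats on every row for a subject, which is exactly the duplication to be careful about.

```sas
/* Hypothetical wide data: wt stands in for the survey weight */
data wide;
   input id wt thr500 thr1000 thr2000;
   datalines;
123 1500 0 0 10
234 2200 0 0 0
;
run;

proc sort data=wide;
   by id wt;
run;

/* One output row per subject per frequency; id and wt repeat on each row */
proc transpose data=wide
               out=long(rename=(col1=threshold))
               name=frequency;
   by id wt;
   var thr500 thr1000 thr2000;
run;
```

The long data set has three rows per subject, each with the same wt value, so a weighted procedure run on it would count each subject's weight three times.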
Apply a format such as shown to the variables.
Should work.
You do not need to "read them" as shown, though a custom INFORMAT would do that. One reason to keep the values as they are: suppose you see a "borderline" value issue and want to investigate whether the model behaves differently with 10-14 and 15-24 instead of 10-15 and 16-24 (are these actually integer values?).
A different FORMAT would allow you to run that new model by changing ONE word: the format associated with the variables. If you actually create text values, you would have to go back to an earlier step, create new text values, and change the name of the data set to the one with the new values. Wash, rinse, repeat.
Example that you should be able to run:
proc format;
value agea
10-14 = '10-14'
15-19 = '15-19'
;
value ageb
10-12 = '10 to 12'
13-15 = '13 to 15'
16-high = '16+'
;
run;

proc freq data=sashelp.class;
tables age;
format age agea.;
run;

proc freq data=sashelp.class;
tables age;
format age ageb.;
run;
The format could be used with Proc Logistic or Surveylogistic with CLASS variables to create the "categories".