AnnaRombakh
Calcite | Level 5

We are interested in using data from the National Health and Nutrition Examination Survey (NHANES) to establish the prevalence of hearing loss among subjects who participated in the audiology portion of the survey.  The subject data are organized in rows.  The hearing thresholds (dB) for 14 sound frequencies across both ears are provided in their databases (14 columns).  We are categorizing hearing impairment by the highest threshold (dB) at any of those 14 sound frequencies.  Can anybody describe the code necessary to determine the highest threshold for each subject?

 

ID#   500hz  1000hz  2000hz  3000hz  4000hz  5000hz  6000hz
123   0      0       10      20      0       10      0
234   0      0       0       10      0       20      0
456   0      0       10      20      45      75      70
458
478

7 REPLIES
Reeza
Super User
Given the input shown, what do you want as output?
ballardw
Super User

You would need the names of the SAS data set variables. I'm fairly sure they aren't actually 500hz.

 

The MAX function returns the largest non-missing value from a list of variables. In a data step, very generic code would look like:

 

data want;
   set have;
   maxvalue = max(var1, var2, var3);
run;

 

If the variable names are sequentially numbered, you may be able to use a numbered list such as

   maxvalue = max(of reading1 - reading14);

Or, if the columns are adjacent in the data set, use a positional list:

   maxvalue = max(of firstreading -- lastreading);  /* note the two dashes together */

Otherwise, just list the variable names of interest.
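As a minimal sketch of the above using the sample rows from the question: the variable names (hz500 through hz6000) are made up for illustration, since the real NHANES audiometry variables are named differently, and ID 458 is entered with missing values to show that MAX returns missing when every argument is missing.

```sas
/* Hypothetical variable names; replace with the real NHANES names. */
data have;
   input id hz500 hz1000 hz2000 hz3000 hz4000 hz5000 hz6000;
   datalines;
123 0 0 10 20 0 10 0
234 0 0 0 10 0 20 0
456 0 0 10 20 45 75 70
458 . . . . . . .
;
run;

data want;
   set have;
   /* hz500 and hz6000 are the first and last threshold columns, so the
      double dash selects every column stored between them. */
   maxvalue = max(of hz500 -- hz6000);
run;

proc print data=want noobs;
   var id maxvalue;
run;
```

With these rows, maxvalue is 20 for IDs 123 and 234, 75 for ID 456, and missing for ID 458.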

AnnaRombakh
Calcite | Level 5

To elaborate on my question: I am trying to get SAS to read all of my variables (500, 1000, 2000, 3000, 4000, 6000, and 8000 Hz) and categorize them by degree of hearing loss. The classification of hearing loss for my study is the same for all frequencies, e.g., -10 to 15 dB is normal, 16 to 24 dB is slight hearing loss, etc., and I would like SAS to analyze all variables and categorize them by their degree of hearing loss. This is why I would like them to be read in rows, so that my output places individuals into categories of hearing loss. Because I have 14 variables, I am unsure how to do this.

ballardw
Super User



What are the rules for categorizing? What type of analysis?

One approach when you need to make categories from more-or-less continuous values is to create a FORMAT.

Which might look something like:

proc format;
   value hearloss
      0 - <10   = "Minimal"
      10 - <16  = "Normal"
      16 - <24  = "Slight"
      24 - < ?? = "some description"
      /* repeat as needed for desired ranges */
   ;
run;

Applying this format to a variable would create a group that is honored for procedures that use categorical values for analysis (or graphing or reporting).
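A small sketch of what applying such a format can look like. The "Normal" and "Slight" ranges are the ones mentioned earlier in the thread; the open-ended top category is only a placeholder to be split according to the study's classification. The data set name, the variable maxvalue, and the three rows are made up for illustration, and the ranges assume whole-number dB values.

```sas
proc format;
   value hearloss
      -10 - 15  = "Normal"
      16 - 24   = "Slight"
      25 - high = "25 dB or more"   /* placeholder: split per the study's rules */
   ;
run;

/* Hypothetical data: one row per subject with the worst threshold. */
data worst;
   input id maxvalue;
   datalines;
123 20
234 20
456 75
;
run;

/* PROC FREQ groups by the formatted value, so no new variable is needed. */
proc freq data=worst;
   tables maxvalue;
   format maxvalue hearloss.;
run;
```

With these rows the table should show two subjects under "Slight" and one under "25 dB or more". The same FORMAT statement accepts a variable list, so the format could be attached to all of the threshold variables at once.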

But not being in the industry I have no idea how to combine 14 variables into a single "score" if that is what you want. If you have rules we can likely help implement them.

 

If you can't show us what you expect a data set to look like we can't help you make it.

 

Proc Transpose is a typical tool for making a data set long. One thing that would make this tricky is the NHANES weighting scheme. Making data "long" or "by rows" would tend to duplicate the weighting variable which means the weight would be applied incorrectly for almost any analysis.
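For reference, a minimal sketch of making the data long with Proc Transpose. The variable names and rows are made up for illustration, and no survey weight is carried along here because of the duplication problem just described.

```sas
/* Hypothetical variable names and values. */
data have;
   input id hz500 hz1000 hz2000;
   datalines;
123 0 0 10
456 0 0 45
;
run;

/* One output row per subject per frequency. BY requires the data to be
   sorted by id, which these rows already are. */
proc transpose data=have
               out=long(rename=(col1=threshold))
               name=frequency;
   by id;
   var hz500 hz1000 hz2000;
run;
```

The result has three columns (id, frequency, threshold), with the original column name stored in frequency.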

AnnaRombakh
Calcite | Level 5
I apologize if my questions are confusing or lacking context. Eventually, I will be doing a multiple logistic regression with my data set. I have already completed the proc format in my SAS code, but because I am unable to get SAS to read each variable and classify it into a categorical variable (normal, slight, etc.), I am unable to do my descriptive statistics. My thesis analyzes the relationship between non-HDL-C (total cholesterol minus high-density lipoprotein cholesterol, HDL) and degree of hearing loss, and I would like a descriptive table that shows how many individuals with elevated non-HDL-C have hearing loss.
Reeza
Super User
Let's back up 1000 steps. Have you imported the NHANES data correctly into SAS first? Do you have a SAS data set somewhere with the data? The NHANES data is public, so you can easily share that data set as well as any code.
ballardw
Super User

Apply a format such as shown to the variables.

Should work.

You do not need to "read them" into categories, though a custom INFORMAT would do that. One reason to keep the values as they are: suppose you run into a "borderline" value issue and want to investigate whether the model behaves differently with 10-14 and 15-24 instead of 10-15 and 16-24 (are these actually integer values?).

A different FORMAT would allow you to run that new model by changing ONE word: the format associated with the variables. If you actually create text values, you would have to go back to an earlier step, create new text values, and change the name of the data set to the one with the new values. Wash, rinse, repeat.

 

Example that you should be able to run:

proc format;
value agea
10-14 = '10-14'
15-19 = '15-19'
;
value ageb
10-12= '10 to 12'
13-15= '13 to 15'
16-high= '16+'
;
run;

proc freq data=sashelp.class;
   tables age;
   format age agea.;
run;
proc freq data=sashelp.class;
   tables age;
   format age ageb.;
run;

The format could be used with Proc Logistic or Surveylogistic with CLASS variables to create the "categories".
