Re: SAS reading rows/combining categorical variables

AnnaRombakh · Posted 06-22-2021 04:25 PM

We are interested in using data from the National Health and Nutrition Examination Survey (NHANES) to establish the prevalence of hearing loss among subjects who participated in the audiology portion of the survey. The subject data are organized in rows. The hearing threshold (dB) at each of 14 sound frequencies in both ears is provided in their databases (14 columns). We are categorizing hearing impairment by the highest threshold (dB) at any of those 14 sound frequencies. Can anybody describe the code necessary to determine what the highest threshold would be?

ID#	500hz	1000hz	2000hz	3000hz	4000hz	5000hz	6000hz
123	0	0	10	20	0	10	0
234	0	0	0	10	0	20	0
456	0	0	10	20	45	75	70
458
478

Reeza · Posted 06-22-2021 04:26 PM

Given the input shown, what do you want as output?

ballardw · Posted 06-22-2021 04:35 PM

You would need the names of the SAS data set variables. I'm fairly sure they aren't actually 500hz.

The MAX function returns the largest non-missing value from a list of variables. In a data step, very generic code would look like:

data want;
   set have;
   maxvalue = max(var1, var2, var3);
run;

If the names are nice you may be able to use a list such as

maxvalue = max( of reading1 - reading14);

if the variable names are sequentially numbered. Or if the columns are adjacent columns in the data set

maxvalue = max(of firstreading -- lastreading); /* note the two dashes together */

or just list the variable names of interest.
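
As a minimal sketch of the adjacent-column form, using the three complete rows from the original post: the variable names hz500 through hz6000 are made up for illustration (SAS names cannot start with a digit), so substitute the real NHANES variable names.

```sas
/* Hypothetical variable names; the values are the three complete
   rows from the question. */
data have;
  input id hz500 hz1000 hz2000 hz3000 hz4000 hz5000 hz6000;
  datalines;
123 0 0 10 20 0 10 0
234 0 0 0 10 0 20 0
456 0 0 10 20 45 75 70
;
run;

data want;
  set have;
  /* Double dash: every variable positioned from hz500 to hz6000 */
  maxvalue = max(of hz500 -- hz6000);
run;

proc print data=want noobs;
  var id maxvalue;
run;
```

For these rows maxvalue should come out as 20, 20 and 75. MAX ignores missing values, so a subject with some missing thresholds still gets the largest of the values that are present.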

AnnaRombakh · Posted 06-22-2021 04:56 PM

To elaborate on my question: I am trying to get SAS to read all of my variables (500, 1000, 2000, 3000, 4000, 6000, 8000 Hz) and categorize them by degree of hearing loss. The classification of hearing loss for my study is the same for all frequencies, e.g. -10 to 15 dB is normal, 16 to 24 dB is slight hearing loss, etc., and I would like SAS to analyze all variables and categorize them by their degree of hearing loss. This is why I would like them to be read in rows, so that my output produces individuals in categories of hearing loss. Because I have 14 variables, I am confused about how to do this.

ballardw · Posted 06-22-2021 05:35 PM

What are the rules for categorizing? What type of analysis?

One approach when you need to make categories from more-or-less continuous values is to create a FORMAT.

Which might look something like:

proc format;
   value hearloss
      0  -< 10 = "Minimal"
      10 -< 16 = "Normal"
      16 -< 24 = "Slight"
      /* 24 -< ?? = "some description"
         fill in the upper bound and label, and repeat as needed for the desired ranges */
   ;
run;

Applying this format to a variable would create a group that is honored for procedures that use categorical values for analysis (or graphing or reporting).
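
A rough sketch of what "applying the format" could look like for a per-subject maximum threshold. The only cut points taken from this thread are -10 to 15 (normal) and 16 to 24 (slight); the OTHER label is a placeholder, not a clinical category, and the data set and variable names are invented for the example.

```sas
/* Only "Normal" and "Slight" come from the thread; the OTHER
   label is a placeholder to be replaced by the study's real
   categories. */
proc format;
  value hearloss
    -10 - 15 = "Normal"
    16 - 24  = "Slight"
    .        = "Missing"
    other    = "Above 24 dB (placeholder)"
  ;
run;

/* Hypothetical per-subject maximum thresholds */
data maxes;
  input id maxvalue;
  datalines;
123 20
234 20
456 75
;
run;

/* The stored numbers are unchanged; PROC FREQ groups by the
   formatted value. */
proc freq data=maxes;
  tables maxvalue;
  format maxvalue hearloss.;
run;
```

With these three rows the table should show two subjects under "Slight" and one under the placeholder label. If the thresholds are not integers, a value such as 15.5 would fall into OTHER with these ranges, so the boundaries would need adjusting.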

But not being in the industry I have no idea how to combine 14 variables into a single "score" if that is what you want. If you have rules we can likely help implement them.

If you can't show us what you expect a data set to look like we can't help you make it.

Proc Transpose is a typical tool for making a data set long. One thing that would make this tricky is the NHANES weighting scheme. Making data "long" or "by rows" would tend to duplicate the weighting variable which means the weight would be applied incorrectly for almost any analysis.
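
For reference, a small sketch of the wide-to-long step with Proc Transpose. The variable names are hypothetical and only three frequencies are shown. No weight variable is carried along here; if one were added to the BY statement it would be repeated on every long row, which is the duplication problem described above.

```sas
/* Hypothetical names and a cut-down set of frequencies */
data have;
  input id hz500 hz1000 hz2000;
  datalines;
123 0 0 10
234 0 0 0
456 0 0 10
;
run;

/* One output row per subject per frequency.
   BY requires the input to be sorted by id (it is here). */
proc transpose data=have
               out=long(rename=(col1=threshold))
               name=frequency;
  by id;
  var hz500 -- hz2000;
run;

proc print data=long noobs;
run;
```

The result has nine rows: the variable frequency holds the original column name and threshold holds the value.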

AnnaRombakh · Posted 06-22-2021 06:12 PM

I apologize if my questions are confusing or lacking context. Eventually, I will be doing a multiple logistic regression with my data set. I have already written the proc format step in my SAS code, but because I am unable to get SAS to read each variable and classify it into a categorical variable (normal, slight, etc.), I am unable to do my descriptive statistics. My thesis analyzes the relationship between non-HDL-C (total cholesterol minus high-density lipoprotein cholesterol/HDL) and degree of hearing loss, and I would like a descriptive table that shows how many individuals with elevated non-HDL-C have hearing loss.

Reeza · Posted 06-22-2021 06:21 PM

Let's back up a 1000 steps. Have you imported the NHANES data correctly into SAS first? You have a SAS dataset somewhere with the data? The NHANES data is public so you can easily share that data set as well and any code.

ballardw · Posted 06-22-2021 06:23 PM

Apply a format such as shown to the variables.

Should work.

You do not need to "read them" as categories, though a custom INFORMAT would do that. One reason to keep the values as they currently are: suppose you see a "borderline" value issue and want to investigate whether the model behaves differently with 10-14 and 15-24 instead of 10-15 and 16-24 (are these actually integer values?).

A different FORMAT would allow you to fit that new model by changing ONE word: the format associated with the variables. If you actually create text values, you would have to go back to an earlier step, create new text values, and change the name of the data set to the one with the new values. Wash, rinse, repeat.

Example that you should be able to run:

proc format;
value agea
10-14 = '10-14'
15-19 = '15-19'
;
value ageb
10-12= '10 to 12'
13-15= '13 to 15'
16-high= '16+'
;
run;

proc freq data=sashelp.class;
   tables age;
   format age agea.;
run;
proc freq data=sashelp.class;
   tables age;
   format age ageb.;
run;

The format could be used with Proc Logistic or Surveylogistic with CLASS variables to create the "categories".
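
A sketch of that idea with Proc Logistic on sashelp.class, only to show the mechanics: the two-level format name agegrp and the choice of sex as the response are arbitrary, and the model itself is not meaningful. Proc Surveylogistic would also need the survey design statements (strata, cluster, weight), which are not shown here.

```sas
proc format;
  value agegrp
    10 - 14 = '10-14'
    15 - 19 = '15-19'
  ;
run;

/* CLASS levels are built from the formatted values of age,
   so the model sees two age groups instead of six ages. */
proc logistic data=sashelp.class;
  class age / param=ref;
  model sex(event='F') = age;
  format age agegrp.;
run;
```

Swapping in a different format on the FORMAT statement regroups the predictor without touching the data set.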
