This might seem like a stupid question but how do I get a frequency distribution ... without the listing, just a summary?
For example, in SPSS, if I want a frequency distribution of gender, I'd run:
Frequencies gender. run.
If I wanted the genders and their respective IDs, I'd do:
Crosstabs ID by gender.
The latter would give me a long listing, which seems to be the default when I do a proc freq in SAS. How would I get the former? I just want a count of each value, missing values, etc. for a variable, not a case-by-case list of male, female, male, male, male, female ... etc.
Yeah, I tried doing that, and all I get from SAS are these stupid endless requests to save to a file, print, or clear. Any response just leads to yet another request for the same thing.
For example, right now I am looking at a table that has ~4.6 million records. I want a frequency distribution of the MD_diag1 field because I want to recode some text values into numeric ones, so I need to scan through it for diagnoses that are similar and recode them to be the same. In SPSS, this is a one-line command that gives me a summary list with the total count of each type of diagnosis in that field. In SAS, I get an endless listing that I never actually get to see, because the window is "full" according to SAS and I get prompted to dump the output to a file, print it, or clear it.
You said “gender,” which usually has two values. The way your question was stated made it hard to know that you meant not “gender” but “MD_Diag1.” This variable apparently has many, many possible distinct values, some millions actually. However, you did say that your output has “male” for a value, and that does sound like “gender,” not MD_Diag1.
If you leave “ID” out of the table request AND leave “gender” out of the table request, you should not see “male” or get a case-by-case result, as you did. Thus your problem becomes too many distinct values of MD_Diag1. Even if SAS (or SPSS) COULD print them, you could not scan with your eye millions of values for MD_Diag1.
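As a minimal sketch of a table request with neither ID nor a second variable in it, assuming a hypothetical dataset named WORK.MYTABLE (the dataset name is not from this thread), a one-way PROC FREQ looks something like this:

```sas
/* Sketch only: WORK.MYTABLE is a hypothetical dataset name. */
proc freq data=work.mytable;
  /* One-way table: one row per distinct value of gender.       */
  /* MISSING counts missing values as a level of their own;     */
  /* without it, SAS reports them below the table as            */
  /* "Frequency Missing".                                       */
  tables gender / missing;
run;
```

This should print one row per distinct value with its frequency and percent, rather than one row per case. Swapping gender for MD_diag1 gives the same kind of summary, only with as many rows as there are distinct diagnoses.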
If you print ones with counts of over 1000, that would be interesting in terms of getting at the most common diagnoses. However, you said your goal was to combine diagnoses, not find out the most common ones. My guess is that if your MD_Diag1 is, say, a ten-digit code, and you have millions of different ones, the first three digits would be a major grouping, then the next three digits would be a finer grouping within the first group, and so on. Someone who understands diagnoses would need to suggest a rule for grouping.
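Both ideas can be sketched in SAS, under some loudly stated assumptions: WORK.MYTABLE is a hypothetical dataset name, MD_diag1 is assumed to be a character variable, and the three-character prefix is only the guess made above, not a real diagnosis grouping rule. The counts are written to a dataset instead of the output window, so nothing has to scroll past:

```sas
/* Sketch only: WORK.MYTABLE and all output dataset names are hypothetical. */

/* 1. Write the counts to a dataset instead of printing them.     */
/*    The OUT= dataset holds MD_diag1, COUNT, and PERCENT.        */
proc freq data=work.mytable noprint;
  tables MD_diag1 / missing out=work.diag_counts;
run;

/* 2. Print only the diagnoses that occur more than 1000 times,   */
/*    most frequent first.                                        */
proc sort data=work.diag_counts;
  by descending count;
run;

proc print data=work.diag_counts;
  where count > 1000;
run;

/* 3. Collapse to a coarser grouping on the first 3 characters.   */
/*    Assumes MD_diag1 is character: assigning it to a $3         */
/*    variable keeps only the first 3 characters.                 */
data work.diag_groups;
  set work.diag_counts;
  length diag_group $ 3;
  diag_group = MD_diag1;
run;

/* WEIGHT COUNT makes each row stand for COUNT original records.  */
proc freq data=work.diag_groups;
  tables diag_group / missing;
  weight count;
run;
```

Working from WORK.DIAG_COUNTS in step 3 rather than from the full table means the 4.6 million records are read only once. The prefix length is easy to change once someone who knows the coding scheme suggests a real rule.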