08-23-2014 01:12 PM
(newbie here...been using SAS for just two weeks now...)
My data set has about 2M entries broken down into 2 classes where the entries must be classified into 16 gender/age groups. I need to make histograms of about 25 variables per entry. Each plot must overlay histograms of one of the variables for the two classes, and I need one plot for every combination of gender/age. The problem is that sgplot will do this if I overlay two different variables, but I want to overlay one variable for two classes.
The usual solution is to widen the data so that each original variable becomes two variables in the widened data set, one for each of the two classes. This may be impractical in my case: I'd have to widen the 2M entries of 25 variables into 2M entries of (25 variables) times (2 classes) and maybe times (16 gender/age cohorts). Or perhaps with the use of WHERE clauses in sgplot I might only have to double the number of variables.
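For reference, a minimal sketch of the widening approach for a single variable. All names here are assumptions made for illustration: a data set WORK.HAVE, a class variable CLASS with values 'A' and 'B', cohort variables GENDER and AGEGRP, and an analysis variable X. The BY statement produces one plot per gender/age combination, so under these assumptions the cohorts would not need columns of their own.

```sas
/* Hypothetical names: HAVE, CLASS ('A'/'B'), GENDER, AGEGRP, X */
data wide;
   set have;
   /* Split X into one column per class; the other column stays missing */
   if class = 'A' then x_a = x;
   else if class = 'B' then x_b = x;
run;

/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=wide;
   by gender agegrp;
run;

proc sgplot data=wide;
   by gender agegrp;                   /* one plot per cohort */
   histogram x_a / transparency=0.5;   /* class A */
   histogram x_b / transparency=0.5;   /* class B, overlaid */
run;
```

The row count stays at 2M in this sketch; only the number of columns per plotted variable doubles.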
Overlaying in this manner is simple in other statistical packages, so there must be a better way in SAS! Might it have something to do with custom templates? Or must I widen the data set? And if so, must I just double the number of variables (one for each class), or must I explode each variable by a factor of 32 (for each gender/age group as well)?
08-24-2014 11:10 PM
Not sure I fully understand your use case, but it seems you are asking for Histogram by groups. Also, you did not mention which release of SAS you are using. SGPLOT can do grouped histograms with SAS 9.4M2, released Aug 5. But since you are unlikely to have access to that release, the only way I can see is to widen your data into individual columns per group.
See my recent blog article: New Graph Features in SAS9.4M2
See previous article on Comparative Histograms: Comparative Histograms - Graphically Speaking
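A minimal sketch of the grouped-histogram route, assuming SAS 9.4M2 or later. The data set and variable names (HAVE, CLASS, GENDER, AGEGRP, X) are hypothetical placeholders, not taken from the original post.

```sas
/* Hypothetical names: HAVE, CLASS, GENDER, AGEGRP, X */
/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=have;
   by gender agegrp;
run;

proc sgplot data=have;
   by gender agegrp;                             /* one plot per cohort */
   histogram x / group=class transparency=0.5;   /* overlay the two classes */
run;
```

Here the original long-format data is plotted directly, so no widening step is needed; the transparency setting is only there so that overlapping bars remain visible.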
08-25-2014 09:18 AM
Thanks Sanjay, the new features in SAS9.4M2 may be exactly what I want. Can you specify multiple variables in the GROUP= option?