I think your first task is to figure out just how sparse the data are, and the first step within that is to find the distribution of type. If it is fairly evenly distributed among your 600,000 cases, then you have approximately 1500 cases per level of type, which is not sparse at all. However, if some levels of type have very few members, you might want to combine levels. You can do the same for all your other variables.

Next, to get an overall sense of sparseness, the /LIST option on the TABLE statement in PROC FREQ can be very helpful, something like:

PROC FREQ data = mydata;
  TABLE type*kind*weekday*psum*rsum*dp*vnet*dep*dsum*dret*dnet / LIST;
RUN;

However, this may produce too many rows to look at, in which case you can do it by smaller sets of variables.

The statistical technique should, I think, be multinomial logistic (as you suspected). There are exact methods to deal with sparse tables, but they will take preposterous amounts of time with N = 600,000. However, HPLOGISTIC may offer some savings of time, depending on your exact setup (see the documentation).
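As a rough sketch of the first and last steps, something like the following might be a starting point. It assumes that type is the response, that kind and weekday are categorical, and that psum and rsum are numeric; I don't know your data, so adjust accordingly. The data set name type_counts and the cutoff of 30 are just placeholders.

```sas
/* One-way distribution of type, most frequent levels first.
   mydata and type are from the post; type_counts is a placeholder name. */
PROC FREQ data = mydata ORDER = FREQ;
  TABLES type / OUT = type_counts;
RUN;

/* Print only the rare levels, which are candidates for combining.
   The cutoff of 30 is arbitrary, not a recommendation. */
PROC PRINT data = type_counts;
  WHERE count < 30;
RUN;

/* Generalized logit (multinomial) model for a nominal response.
   ASSUMPTION: type is the response, kind and weekday are categorical,
   psum and rsum are numeric. Only a few predictors are shown. */
PROC HPLOGISTIC data = mydata;
  CLASS kind weekday;
  MODEL type = kind weekday psum rsum / LINK = GLOGIT;
RUN;
```

The OUT= data set holds one row per level of type with its COUNT and PERCENT, so the rare levels are easy to pick out before you decide what to combine.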