It's more complicated than the usual long data and short data thing. I was simplifying a bit.
You may already know what a visit matrix is but in case not I'll explain. Supposed there is a study that asks people questions at various visits over time, say at Baseline, 3 6 and 12 months. And say there are various data collection forms that may or may not at each visit (demographics or lab values or whatever. Say there are 10 forms.
If you put the 4 times across the top and the 10 forms down the side you form a matrix that has 4 x 10 = 40 cells. You put an 'x' in each cell where a form is collected. That's a visit matrix. Perhaps demographics is collected only at baseline so the row for it only has an 'x' at Baseline. Perhaps lab values are collected only in 6 month intervals so the row for it only has an 'x' at Baseline, 6 and 12 months. Etc. You get the idea. That is how the study I'm working on now is organized, only much bigger.
Suppose 15 of the 40 cells in your visit matrix has an 'x.' That means you have 15 formsets. The forms go down the left hand side and the visits (or sets) go across the top, thus where there is an 'x' we call it a formset. Also, the formsets are unique across subjects. So Subject 1 has formsets numbered 1 thru 15, subject 2 has formsets numbers 16 thru 30, etc.
The data is collected in one giant column. I did not set this up. I don't know if it's genius or if it''s crazy, although I do know it's crazy that we had one person set it up and nobody else knew how it worked then he left and now we have to deal with it because we're halfway through a very long study.
Here is how the collected data looks.
ID FormSet Value Variable
1 1 M Sex
1 1 70 Ht_Inches
1 1 150 Wt_lbs
1 2 120 BP_Diastolic
1 2 80 BP_Systolic
etc
All the data we collect is in the Value column. The Variable column tells the variable but its values will be repeated. For example, Sex will be included for each subject. If BP_Diastolic is measured for each subject at each visit then BP_Diastolic will be there every time. As you can imagine, after awhile the columns become very long.
Meanwhile, in other datasets we have attributes of the variables, such as the field rules and formats, etc. I merge them all together to get them into one dataset.
I hope that's at least somewhat clear. It's a pretty complicated thing. But anyway, IDLabel statement in PROC TRANSPOSE worked like a charm so that's good and now all that's left is assigning the formats
I don't see how I can assign the formats when it's in one giant column so I'll try it after I do the PROC TRANSPOSE and we have a rectangular dataset. Perhaps at that point I can merge in the name of the formats so that it's on the dataset and then use the PROC SQL code mentioned above. I haven't tried it yet. I think at that point I could also do a big DATA _NULL_ to create but since there are a lot of variables and formats it would be cleaner to do it with just those handful of lines in the PROC SQL mentioned above.
I'll keep you updated.
... View more