Re: Proc Transpose

deleted_user · Posted 04-30-2009 01:42 PM

Help please! I am a novice trying to transpose long/tall to short/wide....I am merging a relational database that came from Microsoft Access. Many of the tables, but not all, have duplicate/multiple observations. How can I easily merge and then transpose all the tables? I just want one StudyID, for example, per subject.

I transposed every single variable of all the tables individually and I wound up with such a ridiculously wide data set (with the correct number of observations, however)....
BUT many of the new variables created in the process of 'transpose' are blank or just repeated values.

Any ideas?

sbb · Posted 04-30-2009 04:42 PM

Share some of your PROC TRANSPOSE code. And if you can share a data sample of exactly what you have for INPUT layout and how you expect it to be rendered by PROC TRANSPOSE (considering the possible companion statements and/or parameters), that would go a long way with helping you in your quest for more effective "transposed" data.

Scott Barry
SBBWorks, Inc.

Doc_Duke · Posted 05-01-2009 08:48 AM

In addition to Steve's request for more detail to your question, I'll make some general observations. I infer you are working with clinical research data.

The request for easy is a nonstarter, it ain't gonna be easy. It's not particularly hard, but I would not call it easy.

Typically clinical research data will have a table with visit information, one row per visit per subject. Each visit will have some attributes (sometimes a hundred). The visits are identified by visit number and visit date. Sometimes there are visits that occur between scheduled visits (a real pain, I usually segregate them out and process them separately). You want to take these rows and line them up by sets of columns on visit number. TRANSPOSE will take the maximum number of visit and line up based on that. So, if you are interested in the first 5 visits, but somebody had 15 visits, a naive transpose will generate 15 times x variables. To make that come together, you need to first subset the data from access to get just the rows and columns of interest.

Similarly, if you have a diagnosis table. Every diagnosis is recorded, but you might only be interested in categories of diagnosis. Transform the dx to a category and transpose based on that variable.

There is some broad processes in common here. In the data warehousing domain, it's called "Extract, Transform, and Load" (ETL). You might get some good ideas on how to handle additional problems in the ETL literature.

Doc Muhlbaier
Duke

Doc_Duke · Posted 05-01-2009 08:49 AM

Opps. Sorry, I was referencing Scott's message.

Proc Transpose