Well, I don't know that paper, but the process itself sounds backwards to me. PDFs are outputs: they are meant for people to read and are not well suited as input to any other process. Most clinical databases (Oracle, Medidata Rave, etc.) have modules designed for building standard CRFs. These are available to Data Management staff, who are responsible for this part, as well as to other users. It is up to the DM group to create standardised CRF libraries and then use them to implement database builds. As a programmer, you can simply extract this metadata directly from the database. This is the preferred method because all the information is created and entered in one place (which is one of the main reasons we use databases in the first place), it is stored in a usable format, and you have the option to extract it as raw data or produce reports from it. Doing it the other way round, taking an output and then reading it back in and processing it, loses all of this: if anything changes, you have to start again by regenerating the output and reprocessing it.
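To illustrate the "extract it from the database" point, here is a minimal Python sketch using an in-memory SQLite table. The table `crf_fields`, its columns and the field names are made up for illustration only; real EDC systems such as Medidata Rave or Oracle have their own schemas and export tools, which this does not model. It just shows the idea: one metadata source, queried either for raw rows or for a report, and simply re-queried when the build changes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a HYPOTHETICAL CRF metadata table (not a real EDC schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE crf_fields (
               form_oid  TEXT,
               field_oid TEXT,
               label     TEXT,
               data_type TEXT,
               ordinal   INTEGER
           )"""
    )
    # Illustrative rows only; field names are invented for this sketch.
    conn.executemany(
        "INSERT INTO crf_fields VALUES (?, ?, ?, ?, ?)",
        [
            ("DM", "BRTHDAT", "Date of Birth", "date", 1),
            ("DM", "SEX", "Sex", "text", 2),
            ("VS", "SYSBP", "Systolic Blood Pressure", "integer", 1),
        ],
    )
    return conn


def extract_raw(conn: sqlite3.Connection, form_oid: str) -> list[tuple]:
    """Return the field metadata for one form, in CRF order, as raw rows."""
    cur = conn.execute(
        "SELECT field_oid, label, data_type FROM crf_fields "
        "WHERE form_oid = ? ORDER BY ordinal",
        (form_oid,),
    )
    return cur.fetchall()


def field_report(conn: sqlite3.Connection) -> str:
    """Produce a simple text report (field count per form) from the same table."""
    rows = conn.execute(
        "SELECT form_oid, COUNT(*) FROM crf_fields "
        "GROUP BY form_oid ORDER BY form_oid"
    )
    return "\n".join(f"{form_oid}: {n} field(s)" for form_oid, n in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(extract_raw(conn, "DM"))
    print(field_report(conn))
    # A change to the build is made once, in the database; re-running the
    # queries picks it up, with no output to regenerate and re-parse.
    conn.execute(
        "INSERT INTO crf_fields VALUES ('VS', 'DIABP', 'Diastolic Blood Pressure', 'integer', 2)"
    )
    print(field_report(conn))
```

The contrast with the PDF route is the last step: the change is made once at the source, and every downstream extract or report is just a re-run query rather than a fresh round of reading an output back in.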