So to start off I will tell you a quick story; if you are not in a comfortable seat I will give you a minute to find one. First and foremost, I am a wet science guy, but I do have some natural skill in making computers behave, however I never saw myself as a programmer. I started working at a biotech company a few years ago as an immunology research scientist, working in R&D, developing new diagnostic products for our portfolio. Some of our products posed some challenges in collecting and managing data, and especially when we were developing a product that required review by the USDA. Due to the afore mentioned skill in ensuring appropriate behavior from electronic devices, I was awarded the extra role of managing our Data Integrity from conception through development and on to submission.
As part of that work I came face to face with what became one of my most formidable nemesis; microtiter plate assays. Now if you know what these are then you know my pain, if not I will describe them briefly. They are rectangular plastic devices that have 96 individual wells (8 rows (labeled A-H) x 12 columns) which are used for a multitude of disciplines as a platform to incubate, grow, detect, or really for any other means of scientific experimentation. They aslo come in a variety of sizes; 2x3, 3x4, 4x6, 6x8, and 8x12. Each well can contain a different combination that is being evaluated, or a different sample being used for verification of the test. These plates are then read in a spectrophotometer (plate reader) and the only usual output was a 96 cell rectangle (8x12) of raw optical density (OD) values in an excel file; I am aware that there are some readers that can dump to oracle databases but some of these development efforts were over before IT could setup a warehouse.
In my case it was always a test that was developed for a market that required the data to be submitted to the USDA for review and approval. Now part of the pain is that depending on which lab the data was being collected meant that the software for the plate readers changed, which programmatically is an easy fix, and I will discuss more in the code walk though. For the sake of this discussion I will outline my rules of engagement for ETL’ing data from the labs to the agency. All of the data came in excel files, I can only accept data approved by the assay team, and it must be the rawest form of the data (no manual manipulation, i.e. copy/paste/transpose/etc. done in excel).
My first step was to approach it from a process improvement perspective, and to rope those rascally scientists by developing a template (excel file), which does not make them all luddite crazy, and that they can use to collect the data. This has the benefit of at least mimicking an automated collection system (database dump), and allowed me to control the format that I was expecting to receive and thus reduced variability. It also had to be fully developed to handle the most complex experimental design, including dilution ratios for each well, multiple lots of strips within a single plate, and high volumes of plates in a single experiment.
So once I got the business approval on the excel my next step was to design a program around the template so that it was as easy as possible to make sure that only minimal code manipulation was necessary to change the processing parameters; so an automated program to process them all was an absolute must. The final output of the program had to be a file that matched the various maps into a single dataset that our biostatistician would then use to analyze the experiment.
My last step was to spend time and effort to take each section of developed code and review and enhance it so that each version became more succinct and automated than the one before it. I will admit that I am not done with this step. As I learn and mature in my programming skills I revisit this program and try to make it more efficient. Our company just recently transitioned to O365, and we got a SAS server last year, so I am in the process of transitioning this from a Base SAS program running client side, to an EG program running on the SAS server. There are subtle differences in the way Base and EG do things and my mentor was a purist; she demanded that I never, ever, ever, ever, developed code in EG (she claimed it would make me a lesser programmer). While I know some great programmers that only use EG, I have only come across a few that truly understand how their code really works. So although she may have been an unwavering whip cracker, she was right and I thank her for making me learn it the old school way.
Now that you have the gist of the story it’s your call as to whether or not you want to dive into the attached PDF. I will warn you that for some it is intense, and you may not understand much of what you are looking at. However, it is my hope that you will learn something that will make you a better coder, and if you have any questions I am more than happy to answer them.
Thanks,
Chris
I will add some further attachments in the next few days.
Chris- Thanks so much for sharing your successful project! I’m another scientist that uses SAS to convert crummy Excel files / Access databases (crummy from a setup perspective) into usable information.
> she demanded that I never, ever, ever, ever, developed code in EG
Developing in EG is like playing the violin with boxing gloves. Don't let go of her! 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.