Hi all,
I got asked what I thought was a relatively easy question today, but after getting into it, I couldn't answer it on my own. I am turning to you all as I am sure that someone else has dealt with something similar. I am trying to create difference scores, but all I have to go off of is a screenshot of an SPSS dataset as they say they can't share the whole set with me. I do recognize that I am in a SAS forum, but it's mostly about the logic behind the problem. I also recognize that we do not generally like pictures but rather the data itself. Again, I apologize but am working with what I have.
I have edited the picture for clarity. I'm trying to create a new variable that takes a participant's value for Stimulation at time 0 from the alcohol session minus the value for Stimulation at time 0 from the placebo session (i.e., AmPstim; Alcohol minus Placebo for stimulation; two cells highlighted red in picture above). However, because each value is on a different row, it won't let me create a difference score. My suggestion was to switch to wide format to create the difference score and then switch back, but I was wondering if anyone knew a way to do this with syntax while keeping it in long format (SPSS would be preferred, but I can pretty easily translate SAS to SPSS). Alternatively, I would love any lessons on how others would approach this type of problem in the future, or any explanation of why one way would be better, as I cannot wrap my head around the logic to program this.
Thank you in advance and I apologize for not adhering to all of the rules.
I may be missing it in your message, but what do you want as output?
Where would that go in the screenshot?
In SAS I would merge the data with itself rather than reshape it into a single row and then do the subtraction. You can do something similar in SPSS: merge, do the subtraction to create the new variable, and keep only the results.
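A minimal sketch of that self-merge in SAS, for illustration only. The dataset name `have` is a placeholder, and the variable names (`study_id`, `assessment_r`, `order`, `Astimulation_tot`, `Pstimulation_tot`) are the ones the original poster describes further down the thread. It assumes `order` is a character variable holding exactly 'alcohol' or 'placebo' (if it is numerically coded, compare against the codes instead) and that there is one row per participant, session, and assessment.

```sas
/* Sketch only: HAVE and the coding of ORDER are assumptions, see above */
proc sort data=have out=sorted;
  by study_id assessment_r;
run;

/* Merge the alcohol rows with the placebo rows of the same data set */
data diffs;
  merge sorted(keep=study_id assessment_r order Astimulation_tot
               where=(order='alcohol'))
        sorted(keep=study_id assessment_r order Pstimulation_tot
               where=(order='placebo'));
  by study_id assessment_r;
  AmPStim = Astimulation_tot - Pstimulation_tot;  /* alcohol minus placebo */
  keep study_id assessment_r AmPStim;
run;

/* Optional: attach the difference to every long-format row */
data want;
  merge sorted diffs;
  by study_id assessment_r;
run;
```

`diffs` holds one row per participant and assessment, which is the "only keeping the results" version. The last step is a one-to-many merge, so the same `AmPStim` value lands on both the alcohol row and the placebo row and the data stays in long format.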
From values in the data set, how do we determine which value is the "value for Stimulation at time 0 from the alcohol session" and which is "Stimulation at time 0 from the placebo session" (not order in a picture but a rule, set of variable values or similar)?
You do not have any variable labeled "time" so this is somewhat important.
How do we identify "participant"? What role does drinking_session play? (I suspect that this has some correspondence to time 0 but you didn't tell us that).
Second, what does the desired output look like?
This is some pseudo code that places a difference on what I think is the row with the placebo value.
data want;
   set have;
   retain alc;
   by study_id drinking_session;
   if first.study_id then alc = youralcoholvariable;
   if first.drinking_session and drinking_session = 2 then
      difference = alc - yourplacebovariable;
run;
This assumes the data is sorted by study_id and drinking_session, and that the values of the variables of interest appear on the first record of each drinking session.
It has been a long time since I did anything with SPSS and do not remember an equivalent to the SAS First and Last processing. Good luck.
I apologize. I was evidently not very clear judging by the confusion regarding this. Still, thank you for your help!
To answer some of your questions in case someone else comes along and wants this valuable information,
@ballardw wrote: From values in the data set how do we determine which value is the "value for Stimulation at time 0 from the alcohol session" and "Stimulation at time 0 from the placebo session" (not order in a picture but a rule, set of variable values or similar)
These should be the variables highlighted in the red boxes. The output that I desire is Astimulation_tot (when order= alcohol) - Pstimulation_tot (when order= placebo) for each of the assessment_r that they share for each participant.
@ballardw wrote: You do not have any variable labeled "time" so this is somewhat important.
How do we identify "participant"? What role does drinking_session play? (I suspect that this has some correspondence to time 0 but you didn't tell us that).
I had the same comment: "time" is really important here. I believe the "time" variable is "assessment_r", as it ranges from 0 to 11 and repeats this pattern when order=placebo and order=alcohol.
The participant is based on the "study_id" which is unique for each participant. Drinking_session is only ever 1 or 2. Some participants did alcohol first and some did placebo. Again, this is how I interpreted it as I got roughly the same description you got, but I do have the ability to ask him some questions.
@ballardw wrote: Second is what does the desired output look like.
This is some pseudo code that places a difference on what I think is the row with the placebo value.
data want;
   set have;
   retain alc;
   by study_id drinking_session;
   if first.study_id then alc = youralcoholvariable;
   if first.drinking_session and drinking_session = 2 then
      difference = alc - yourplacebovariable;
run;
This assumes the data is sorted by study_id and drinking_session, and that the values of the variables of interest appear on the first record of each drinking session.
It has been a long time since I did anything with SPSS and do not remember an equivalent to the SAS First and Last processing. Good luck.
The desired output would be a new variable called "AmPStim" that would be the difference scores explained above. For instance, the first few would be 8-0 = 8; 54 - 3 = 51; 36-2= 34; 22-0= 22.
The data is sorted by study_id.
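With the variable names clarified, the retain idea above could be adapted roughly as follows. This is only a sketch under stated assumptions: `order` is taken to be a character variable holding exactly 'alcohol' or 'placebo', and the toy data is one made-up participant whose four value pairs are the examples quoted above, with `assessment_r` values 0 to 3 assumed. Because 'alcohol' sorts before 'placebo', sorting by `order` within each assessment puts the alcohol row first regardless of which session the participant actually did first.

```sas
/* Toy data: hypothetical participant 1; assessment_r values are assumed */
data have;
  input study_id drinking_session order :$8. assessment_r
        Astimulation_tot Pstimulation_tot;
  datalines;
1 1 placebo 0 . 0
1 1 placebo 1 . 3
1 1 placebo 2 . 2
1 1 placebo 3 . 0
1 2 alcohol 0 8 .
1 2 alcohol 1 54 .
1 2 alcohol 2 36 .
1 2 alcohol 3 22 .
;
run;

/* Put the alcohol row directly before the placebo row for each assessment */
proc sort data=have out=sorted;
  by study_id assessment_r order;
run;

data want;
  set sorted;
  by study_id assessment_r;
  retain alc;
  if first.assessment_r then alc = .;   /* reset for each participant/time */
  if order = 'alcohol' then alc = Astimulation_tot;
  else if order = 'placebo' then AmPStim = alc - Pstimulation_tot;
  drop alc;
run;

/* Restore the original long-format ordering */
proc sort data=want;
  by study_id drinking_session assessment_r;
run;

proc print data=want noobs;
run;
```

On this toy data, AmPStim is 8, 51, 34, and 22 on the placebo rows and missing on the alcohol rows. If the difference is needed on both rows of each pair, merging the differences back by study_id and assessment_r is probably the simpler route.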
I appreciate you taking the time to write all this out for me! I can look up the first and last processing for SPSS as it has been a while for me as well.