Hello Everyone!
In my below datasets, I need to compare between two unique columns (jp1~jp2, eye1~eye2 etc) and calculate value based on condition as if both contain value of 1, then display as 1 else zero.
Once done, sum all the score from derived columns.
In my actual datasets, I have two hundred columns with different name (not end with suffix 1 or 2)
and I need to follow below approach,
Is there any alternative way to program it.
Thank you
data testing;
input pid visit jp1 jp2 eye1 eye2 ear1 ear2 nose1 nose2;
cards;
101 2 1 1 1 1 1 0 0 1
102 3 0 0 1 0 1 1 1 1
102 5 0 0 1 0 1 1 0 0
102 6 1 1 1 1 1 1 1 1
103 4 0 0 0 0 0 0 1 1
104 7 1 1 1 1 1 1 1 0
105 3 . 0 0 . . . . .
;
run;
data comp;
set testing;
by pid visit;
if jp1=1 and jp2 =1 then jp_score=1; else f_score=0;
if eye1=1 and eye2 =1 then eye_score=1 ;else eye_score=0;
if ear1=1 and ear2 =1 then ear_score=1; else ear_score=0;
if nose1=1 and nose2 =1 then nos_score=1; else nos_score=0;
sum_score=sum(f_score,eye_score,ear_score,nos_score);
run;
I have two hundred columns with different name (not end with suffix 1 or 2).
This is a complication. If it had not been for the above, ARRAY would work well. However, what are the real variable names? Are the two variables to compare always next to each other with relatively similar names, or are the names not similar so that one such variable is named (for example) COPN and the variable it is supposed to be compared with named FROGZ12 (which is not at all similar to the variable name COPN) and is not next to COPN? We need details here.
@abraham1 I need to know if the variables to compare are consecutive in the data set, or can they be widely separated. Please answer.
Hello @abraham1,
As long as your sample dataset TESTING with its pairs of variables next to each other is representative for your real data, you could transpose the data from wide to long and finally back:
/* Change the data structure from wide to long */
proc transpose data=testing out=long;
by pid visit;
run;
/* Compute the índividual scores */
data complong(drop=_: col1);
set long;
by pid visit;
if first.visit then suffix=0;
_m=mod(_n_-1,2);
suffix+_m;
score = col1=1 & lag(col1)=1;
if _m;
run;
/* Create wide output dataset and compute sum score */
proc transpose data=complong out=compwide(drop=_:) prefix=score;
by pid visit;
var score;
id suffix;
run;
data want;
set compwide;
sum_score=sum(of score:);
run;
The individual scores are now named SCORE1, SCORE2, etc. for simplicity. If needed, you can rename or label them based on the names of the original variables.
(Note the inconsistency "jp_score" vs. "f_score" in your COMP dataset.)
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.