I am sorry if this is a straight forward question but I cannot figure out how to solve this.
I have two datasets, one of values where each row is a set of values (and other columns but they are excluded for the sake of example) and one of sequences which I would like to sum for each observation. In the example data below I would like to have 4 new columns in the values data seq1_sum, seq2_sum, seq3_sum, and seq4_sum. These sums would be the sums over each of the sequences respectively. For example the seq4 column would be the sum across each row for score3, score4, and score7.
sequences:
values:
Use PROC SCORE.
https://documentation.sas.com/doc/en/statug/latest/statug_score_syntax01.htm
You might have to restructure your "scoring" dataset to get it to work.
If you want help provide actual datasets. And the answer you are looking for.
You will probably want to reduce the problem to just a few variables and a few observations of data.
Here is some fake data to more or less match what you have in your screenshots - run this and use PROC PRINT to look at these if you like:
data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;
data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.6);
end;
output;
end;
drop r i;
run;
I can't say I really understand your request - especially whether you're trying to sum things vertically or horizontally. My assumption based on what you said is that you want to sum horizontally... :
data want;
set
seqs (in=A)
values
;
array T {&nseqs} $20 _temporary_;
array sc {*} score1-score8;
array sq {*} seq1-seq4;
if A then T[_N_]=seq;
else do;
call missing(of sq[*]);
do i=1 to dim(sq);
do s=1 to countW(T[i],',');
sq[i]+sc[scan(T[i],s,',')*1];
end;
end;
output;
end;
keep score1-score8 seq1-seq4;
run;
proc print data=want; run;
Next time, you 'd better post your data by using a data step code,not just a picuture. Nobody would like to type it for you . This time I used quickbluefish 's code to generate your sample data.
data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;
data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.6);
end;
output;
end;
drop r i;
run;
filename x temp;
data _null_;
set seqs;
file x;
x=cats(sequences,'_sum=sum(',prxchange('s/(\d+)/score\1/',-1,seq),');');
put x;
run;
data want;
set values;
%include x/source;
run;
What you apparently want is code that looks like this:
data want;
set values;
seq1 = sum(score1 ,score2 ,score3 ,score4 ,score5 ,score6 ,score7 ,score8 );
seq2 = sum(score1 ,score2 ,score3 ,score4 );
seq3 = sum(score1 ,score3 ,score6 ,score8 );
seq4 = sum(score3 ,score4 ,score7 );
run;
You can use a DATA _NULL_ step reading dataset SEQ to produce that code in a temporary file, which can then be INCLUDED in a subsequent DATA WANT step, as in:
filename sumcode temp;
data _null_;
set seq;
array sdummies {*} s1-s8 ;
file sumcode;
put seqnames '= sum(' @;
do i=1 to dim(sdummies);
if sdummies{i}=1 then put 'score' i ',' @;
end;
put + (-1) ');';
run;
data want;
set values;
%include sumcode / source2 ;
run;
The primary tool used here is the trailing @ used in the PUT statements, which holds the output column pointer in place, waiting for the next PUT statement. The last PUT has the pointer control '+ (-1)' which moves the pointer back one character, in order to replace the troublesome unnecessary comma appended to the last variable name with a closing paren and a semi-colon.
This program is untested in the absence of sample data in the form of working DATA steps.
I see other responders have utilized the character variable SEQ, instead of the S1 through S8 variables that I used. If those variables are not available, then change the DATA _NULL_ step to:
filename sumcode temp;
data _null_;
set seq;
file sumcode;
put seqnames '= sum(' @;
do i=1 to countw(seq);
addend=cats('score',scan(seq,i)) ;
if i<countw(seq) then addend=cats(addend,',');
put addend @;
end;
put ');';
run;
Use PROC SCORE.
https://documentation.sas.com/doc/en/statug/latest/statug_score_syntax01.htm
You might have to restructure your "scoring" dataset to get it to work.
If you want help provide actual datasets. And the answer you are looking for.
You will probably want to reduce the problem to just a few variables and a few observations of data.
I was motivated by Tom's PROC SCORE.
You could use IML code to get the desired output which is most simple and efficient.
data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;
data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
do i=1 to dim(s);
s[i]=(rand('uniform')<0.6);
end;
output;
end;
drop r i;
run;
proc iml;
use seqs(keep=s:);
read all var _num_ into s;
close;
use values(keep=score:);
read all var _num_ into score;
close;
want=score*t(s);
create want from want[c=('sum_seq1':'sum_seq'+char(nrow(s)))];
append from want;
close;
quit;
Make sure your scoring dataset has _NAME_ and _TYPE_ variables.
data seq ;
_type_='SCORE';
input description:$20. _name_ $ score1-score8;
cards;
1,2,3,4,5,6,7,8 seq1 1 1 1 1 1 1 1 1
1,2,3,4 seq2 1 1 1 1 0 0 0 0
1,3,6,8 seq3 1 0 1 0 0 1 0 1
3,4,7 seq4 0 0 1 1 0 0 1 0
;
Now let's make up a few observations of data:
data values;
input id score1-score8;
cards;
1 1 2 3 4 5 6 7 8
2 0 1 0 1 0 1 0 1
3 1 1 1 1 1 1 1 1
;
Then score it
proc score data=values score=seq out=want;
var score1-score8;
run;
Results:
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.