BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Coreyw184
Calcite | Level 5

I am sorry if this is a straight forward question but I cannot figure out how to solve this.

I have two datasets, one of values where each row is a set of values (and other columns but they are excluded for the sake of example) and one of sequences which I would like to sum for each observation. In the example data below I would like to have 4 new columns in the values data seq1_sum, seq2_sum, seq3_sum, and seq4_sum. These sums would be the sums over each of the sequences respectively. For example the seq4 column would be the sum across each row for score3, score4, and score7.

sequences:

Coreyw184_0-1741648086035.png

values:

Coreyw184_1-1741648167609.png

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Use PROC SCORE.

https://documentation.sas.com/doc/en/statug/latest/statug_score_syntax01.htm

 

You might have to restructure your "scoring" dataset to get it to work.

 

If you want help provide actual datasets. And the answer you are looking for.

You will probably want to reduce the problem to just a few variables and a few observations of data.

 

View solution in original post

7 REPLIES 7
quickbluefish
Barite | Level 11

Here is some fake data to more or less match what you have in your screenshots - run this and use PROC PRINT to look at these if you like:

data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
	s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;

data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
	do i=1 to dim(s);
		s[i]=(rand('uniform')<0.6);
	end;
	output;
end;
drop r i;
run;

I can't say I really understand your request - especially whether you're trying to sum things vertically or horizontally.  My assumption based on what you said is that you want to sum horizontally... :

data want;
set
	seqs (in=A)
	values
	;
array T {&nseqs} $20 _temporary_;
array sc {*} score1-score8;
array sq {*} seq1-seq4;
if A then T[_N_]=seq;
else do;
	call missing(of sq[*]);
	do i=1 to dim(sq);
		do s=1 to countW(T[i],',');
			sq[i]+sc[scan(T[i],s,',')*1];
		end;
	end;
	output;
end;
keep score1-score8 seq1-seq4;
run;

proc print data=want; run;

 

Ksharp
Super User

Next time, you 'd better post your data by using a data step code,not just a picuture. Nobody would like to type it for you . This time I used quickbluefish 's code to generate your sample data.

 

data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
	s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;

data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
	do i=1 to dim(s);
		s[i]=(rand('uniform')<0.6);
	end;
	output;
end;
drop r i;
run;

filename x temp;
data _null_;
 set seqs;
 file x;
 x=cats(sequences,'_sum=sum(',prxchange('s/(\d+)/score\1/',-1,seq),');');
 put x;
run;
data want;
 set values;
 %include x/source;
run;
mkeintz
PROC Star

What you apparently want is code that looks like this:

 

data want;
  set values;
  seq1 = sum(score1 ,score2 ,score3 ,score4 ,score5 ,score6 ,score7 ,score8 );
  seq2 = sum(score1 ,score2 ,score3 ,score4 );
  seq3 = sum(score1 ,score3 ,score6 ,score8 );
  seq4 = sum(score3 ,score4 ,score7 );
run;

You can use a DATA _NULL_ step reading dataset SEQ to produce that code in a temporary file, which can then be INCLUDED in a subsequent DATA WANT step, as in:

 

filename sumcode temp;
data _null_;
  set seq;

  array sdummies {*} s1-s8 ;

  file sumcode;
  put seqnames '= sum(' @;
  do i=1 to dim(sdummies);
    if sdummies{i}=1 then put 'score' i ',' @;
  end;
  put + (-1) ');';
run;

data want;
  set values;
  %include sumcode / source2 ;
run;

 

 

The primary tool used here is the trailing @ used in the PUT statements, which holds the output column pointer in place, waiting for the next PUT statement.  The last PUT has the pointer control '+ (-1)' which moves the pointer back one character, in order to replace the troublesome unnecessary comma appended to the last variable name with a closing paren and a semi-colon.

 

This program is untested in the absence of sample data in the form of working DATA steps.

 

I see other responders have utilized the character variable SEQ, instead of the S1 through S8 variables that I used.  If those variables are not available, then change the DATA _NULL_ step to:

 

filename sumcode temp;
data _null_;
  set seq;
  file sumcode;

  put seqnames '= sum(' @;
  do i=1 to countw(seq);
    addend=cats('score',scan(seq,i)) ;
    if i<countw(seq) then addend=cats(addend,',');
    put addend @;
  end;
  put ');';
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Use PROC SCORE.

https://documentation.sas.com/doc/en/statug/latest/statug_score_syntax01.htm

 

You might have to restructure your "scoring" dataset to get it to work.

 

If you want help provide actual datasets. And the answer you are looking for.

You will probably want to reduce the problem to just a few variables and a few observations of data.

 

Coreyw184
Calcite | Level 5
PROC SCORE was exactly what I needed, thank you!
Ksharp
Super User

I was motivated by Tom's PROC SCORE.

You could use IML code to get the desired output which is most simple and efficient.

 

data seqs;
length seq $20 sequences $6 s1-s8 3;
array s {*} s1-s8;
do i=1 to dim(s);
	s[i]=(rand('uniform')<0.3);
end;
drop i;
call symputx("nseqs",_N_-1);
infile cards dsd truncover firstobs=1 dlm='|';
input seq sequences;
cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;

data values;
length score1-score8 3;
array s {*} score1-score8;
do r=1 to 20;
	do i=1 to dim(s);
		s[i]=(rand('uniform')<0.6);
	end;
	output;
end;
drop r i;
run;




proc iml;
use seqs(keep=s:);
read all var _num_ into s;
close;
use values(keep=score:);
read all var _num_ into score;
close;
want=score*t(s);
create want from want[c=('sum_seq1':'sum_seq'+char(nrow(s)))];
append from want;
close;
quit;

Tom
Super User Tom
Super User

Make sure your scoring dataset has _NAME_ and _TYPE_ variables.

 

data seq ;
  _type_='SCORE';
  input description:$20. _name_ $ score1-score8;
cards;
1,2,3,4,5,6,7,8 seq1 1 1 1 1 1 1 1 1
1,2,3,4         seq2 1 1 1 1 0 0 0 0
1,3,6,8         seq3 1 0 1 0 0 1 0 1
3,4,7           seq4 0 0 1 1 0 0 1 0
;

Now let's make up a few observations of data:

data values;
  input id score1-score8;
cards;
1 1 2 3 4 5 6 7 8
2 0 1 0 1 0 1 0 1
3 1 1 1 1 1 1 1 1
;

Then score it

proc score data=values score=seq out=want;
  var score1-score8;
run;

Results:

Tom_0-1741699536988.png

 

 

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1184 views
  • 4 likes
  • 5 in conversation