BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tonybesas
Obsidian | Level 7

I have this primary dataset that looks like this, with thousands of lines and hundreds of columns:

 

file   baby back ball bear beat begin call
text1  3    4    0    1    0    1     0
text2  2    0    0    3    3    1     2
text3  0    0    0    1    0    5     1
text4  1    1    2    1    1    4     1

And I also have this secondary dataset that looks like this, which spells out the variables in the primary dataset that should be added up:

 

word pole
baby f3neg
back f1pos
ball f1pos
bear f2pos
beat f2neg
begin f2neg
call f3pos

 

What I'd like to do is, for each value of 'file' in the primary database, to sum up the values for the words in the primary database according to the distribution of the words in the secondary database, like this:

 

F1=(sum of all f1pos words if any) - (sum of all f1neg words if any)
F2=(sum of all f2pos words if any) - (sum of all f2neg words if any)
F3=(sum of all f3pos words if any) - (sum of all f3neg words if any)

 

For example:

 

F1 for text1 = ( 4 for back + 0 for ball ) - ( 0 because no words are marked as f1neg in the secondary database ) = 4
F1 for text2 = ( 0 for back + 0 for ball ) - ( 0 because no words are marked as f1neg in the secondary database ) = 0
F1 for text3 = ( 0 for back + 0 for ball ) - ( 0 because no words are marked as f1neg in the secondary database ) = 0
F1 for text4 = ( 1 for back + 2 for ball ) - ( 0 because no words are marked as f1neg in the secondary database ) = 3

 

F2 for text1 = ( 1 for bear ) - ( 0 for beat + 1 for begin ) = 0
F2 for text2 = ( 3 for bear ) - ( 3 for beat + 1 for begin ) = -1
F2 for text3 = ( 1 for bear ) - ( 0 for beat + 5 for begin ) = -4
F2 for text4 = ( 1 for bear ) - ( 1 for beat + 4 for begin ) = -4

 

and so on for F3.

 

and then add this back to the primary database:

 

file   baby back ball bear beat begin call  F1   F2   F3
text1  3    4    0    1    0    1     0     4     0   -3
text2  2    0    0    3    3    1     2     0    -1    0 
text3  0    0    0    1    0    5     1     0    -4    1
text4  1    1    2    1    1    4     1     3    -4    0

How could this be done in SAS?

 

thank you ahead

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you have SAS/STAT licensed then use PROC SCORE.  Your metadata needs to be in the format of a scoring dataset.

Obs    _type_    _name_    back    ball    bear    beat    begin    baby    call

 1     SCORE       f1        1       1       .       .        .       .       .
 2     SCORE       f2        .       .       1      -1       -1       .       .
 3     SCORE       f3        .       .       .       .        .      -1       1

Then you can use it with PROC SCORE.

proc score data=one score=score out=want;
run;

You can convert your existing layout with just a little coding.

data two;
  input source :$32. target :$32. factor ;
datalines;
baby f3 -1
back f1 1
ball f1 1
bear f2 1
beat f2 -1
begin f2 -1
call f3 1
;

proc sort data=two;
  by target ;
run;
proc transpose data=two out=score ;
  by target ;
  id source;
  var factor;
run;
data score;
  _type_='SCORE';
  set score;
  drop _name_;
  rename target=_name_;
run;

View solution in original post

18 REPLIES 18
PeterClemmensen
Tourmaline | Level 20

@tonybesas Hi and welcome to the SAS Community 🙂

 

Here is one way

 

data one;
input file $ baby back ball bear beat begin call;
datalines;
text1 3 4 0 1 0 1 0
text2 2 0 0 3 3 1 2
text3 0 0 0 1 0 5 1
text4 1 1 2 1 1 4 1
;

data two;
input word $ pole $;
datalines;
baby f3neg
back f1pos
ball f1pos
bear f2pos
beat f2neg
begin f2neg
call f3pos
;

data want(drop=i v word pole);
    if _N_=1 then do;
        declare hash h(dataset:'two');
        h.definekey('word', 'pole');
        h.definedone();
    end;

    set one;
    if 0 then set two;
    array _ {*} _numeric_;
    
    do i=1 to dim(_);
        v=vname(_[i]);
        if h.check(key:v, key:'f1pos')=0 then F1=sum(F1,    _[i]);
        if h.check(key:v, key:'f1neg')=0 then F1=sum(F1, -1*_[i]);
        if h.check(key:v, key:'f2pos')=0 then F2=sum(F2,    _[i]);
        if h.check(key:v, key:'f2neg')=0 then F2=sum(F2, -1*_[i]);
        if h.check(key:v, key:'f3pos')=0 then F3=sum(F3,    _[i]);
        if h.check(key:v, key:'f3neg')=0 then F3=sum(F3, -1*_[i]);
    end;
run;

 

Result

 

file  baby back ball bear beat begin call F1  F2  F3
text1 3    4    0    1    0    1     0    4   0   -3
text2 2    0    0    3    3    1     2    0   -1  0
text3 0    0    0    1    0    5     1    0   -4  1
text4 1    1    2    1    1    4     1    3   -4  0

 

tonybesas
Obsidian | Level 7

This works great, really fast! Thank you for taking the time to do this, draycut!

PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

%macro dothis;
proc sql noprint;
    select distinct word into :f1pos separated by ',' from secondary_data_set
        where lowcase(pole)='f1pos';
    select distinct word into :f1neg separated by ',' from secondary_data_set
        where lowcase(pole)='f1neg';
    select distinct word into :f2pos separated by ',' from secondary_data_set
        where lowcase(pole)='f2pos';
    select distinct word into :f2neg separated by ',' from secondary_data_set
        where lowcase(pole)='f2neg';
    select distinct word into :f3pos separated by ',' from secondary_data_set
        where lowcase(pole)='f3pos';
    select distinct word into :f3neg separated by ',' from secondary_data_set
        where lowcase(pole)='f3neg';
quit;

data want;
    set have;
    f1=%if &f1pos^= %then sum(&f1pos); -
       %if &f1neg^= %then sum(&f1neg); ;
    /* you do the typing for F2 and F3, I'm lazy */
run;
%mend;

%dothis
--
Paige Miller
tonybesas
Obsidian | Level 7

Thank you for contributing this code! it works really well!

novinosrin
Tourmaline | Level 20

Hi @tonybesas   First off, very interesting question from the perspective of 

 1. How to build an algorithm that will scale

 2. How to structure the flow

 3. How to look up, rotate, look up and again rotate or in other words going from vertical to horizontal and vice versa.

 

The following is simple data-driven and yet will scale. The positive and negatives are the only categorical component that matters in a series and the series can be any number. 

 


data one;
input file $ baby back ball bear beat begin call;
datalines;
text1 3 4 0 1 0 1 0
text2 2 0 0 3 3 1 2
text3 0 0 0 1 0 5 1
text4 1 1 2 1 1 4 1
;

data two;
input word $ pole $;
datalines;
baby f3neg
back f1pos
ball f1pos
bear f2pos
beat f2neg
begin f2neg
call f3pos
;
proc transpose data=one out=t;
by file;
var baby--call;
run;
proc sql;
create table t2 as
select file,substr(pole,1,2) as series, substr(pole,3) as sign,sum(col1) as s
from t a left join two b
on a._name_=b.word
group by file,series,sign
order by file,series,sign;
quit;

data t3;
set t2;
by file series;
dif=dif(s);
if first.series then dif=s;
if last.series;
run;

proc transpose data=t3(keep=file series dif) out=want(drop=_name_);
by file ;
id series;
var dif;
run;

data final_want;
merge one want;
by file;
run;
novinosrin
Tourmaline | Level 20

data one;
input file $ baby back ball bear beat begin call;
datalines;
text1 3 4 0 1 0 1 0
text2 2 0 0 3 3 1 2
text3 0 0 0 1 0 5 1
text4 1 1 2 1 1 4 1
;

data two;
input word $ pole $;
datalines;
baby f3neg
back f1pos
ball f1pos
bear f2pos
beat f2neg
begin f2neg
call f3pos
;

data temp;
 if _n_=1 then do;
  if 0 then set one two;
  dcl hash H (dataset:'two');
  h.definekey  ("word");
  h.definedata ("pole");
  h.definedone ();
  dcl hash H1 (ordered:'y');
  h1.definekey  ("pole");
  h1.definedata ("pole","s");
  h1.definedone ();
  dcl hiter hi('h1');
 end;
 set one;
 array t baby--call;
 do over t;
  rc=h.find(key:vname(t));
  if h1.find() ne 0 then s=t;
  else s=sum(s,t);
  h1.replace();
 end;
 do while(hi.next()=0);
  series=substr(pole,1,2);
  output;
 end;
 h1.clear();
 keep file series  s;
run;

proc sql noprint;
select distinct series into :series separated by ' '
from w1;
quit;

data want;
 if _n_=1 then do;
 if 0 then set one ;
   dcl hash H (dataset:'one') ;
   h.definekey  ("file") ;
   h.definedata (all:'y') ;
   h.definedone () ; 
   array t(*) &series;
 end;
 do _n_=1 by 1 until(last.file);
  do until(last.series);
   set temp;
   by file series;
   dif=dif(s);
   if first.series then dif=s;
  end;
  h.find();
  t(_n_)=dif;
 end;
 drop series dif s;
run;
tonybesas
Obsidian | Level 7

Great code, took care of the job! thank you, novinosrin,

tonybesas
Obsidian | Level 7

Sorry, I meant to reply to your first code, novinosrin, but somehow this ended up beneath your second code. It turns out, the second code generates an error message, pointing to the ampersand:

array t(*) &series;
ERROR 22-322: Syntax error, expecting one of the following: a name, a numeric constant, $, (, ;, VARCHAR, _ALL_, _CHARACTER_,
_CHAR_, _NUMERIC_, _TEMPORARY_.

 

tonybesas
Obsidian | Level 7

Great code, took care of the job! thank you, novinosrin,

Tom
Super User Tom
Super User

I would format you metadata table to be easier to use for generating the code.

 

data two;
  input source :$32. target :$32. sign :$1.;
datalines;
baby f3 -
back f1 +
ball f1 +
bear f2 +
beat f2 -
begin f2 -
call f3 +
;

proc sort data=two;
  by target source ;
run;

Then you can use that metadata to generate the statements you need.

 

 

filename code temp;
data _null_;
  file code ;
  set two ;
  by target ;
  if first.target then put target '=sum(' @;
  else put ',' @;
  put sign source @ ;
  if last.target then put ');' ;
run;

Resulting text file:

f1 =sum(+ back ,+ ball );
f2 =sum(+ bear ,- beat ,- begin );
f3 =sum(- baby ,+ call );

You can then use that code against your input data.

38   data want;
39     set one ;
40     %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file /Volumes/scratch/SAS_work3B6800005EBE_amrdrml464.pfizer.com/#LN00020.
41  +f1 =sum(+ back ,+ ball );
42  +f2 =sum(+ bear ,- beat ,- begin );
43  +f3 =sum(- baby ,+ call );
NOTE: %INCLUDE (level 1) ending.
44   run;

Result:

Obs    file     baby    back    ball    bear    beat    begin    call    f1    f2    f3

 1     text1      3       4       0       1       0       1        0      4     0    -3
 2     text2      2       0       0       3       3       1        2      0    -1     0
 3     text3      0       0       0       1       0       5        1      0    -4     1
 4     text4      1       1       2       1       1       4        1      3    -4     0

 

tonybesas
Obsidian | Level 7

thank you, Tom, this worked really well! 

Tom
Super User Tom
Super User

If you have SAS/STAT licensed then use PROC SCORE.  Your metadata needs to be in the format of a scoring dataset.

Obs    _type_    _name_    back    ball    bear    beat    begin    baby    call

 1     SCORE       f1        1       1       .       .        .       .       .
 2     SCORE       f2        .       .       1      -1       -1       .       .
 3     SCORE       f3        .       .       .       .        .      -1       1

Then you can use it with PROC SCORE.

proc score data=one score=score out=want;
run;

You can convert your existing layout with just a little coding.

data two;
  input source :$32. target :$32. factor ;
datalines;
baby f3 -1
back f1 1
ball f1 1
bear f2 1
beat f2 -1
begin f2 -1
call f3 1
;

proc sort data=two;
  by target ;
run;
proc transpose data=two out=score ;
  by target ;
  id source;
  var factor;
run;
data score;
  _type_='SCORE';
  set score;
  drop _name_;
  rename target=_name_;
run;
tonybesas
Obsidian | Level 7

Yes, I have PROC SCORE in my installation, and this works really well! Thank you, Tom, for your contribution!

hashman
Ammonite | Level 13

@tonybesas:

Something like below should work - under the assumption that the order of WORD values in TWO matches that of the like-named variables in ONE:

data one ;                                                                                                                                                                                                                                                      
  input file $ baby back ball bear beat begin call ;                                                                                                                                                                                                            
  lines ;                                                                                                                                                                                                                                                       
text1 3 4 0 1 0 1 0                                                                                                                                                                                                                                             
text2 2 0 0 3 3 1 2                                                                                                                                                                                                                                             
text3 0 0 0 1 0 5 1                                                                                                                                                                                                                                             
text4 1 1 2 1 1 4 1                                                                                                                                                                                                                                             
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data two ;                                                                                                                                                                                                                                                      
  input word $ pole $ ;                                                                                                                                                                                                                                         
  lines ;                                                                                                                                                                                                                                                       
baby  f3neg                                                                                                                                                                                                                                                     
back  f1pos                                                                                                                                                                                                                                                     
ball  f1pos                                                                                                                                                                                                                                                     
bear  f2pos                                                                                                                                                                                                                                                     
beat  f2neg                                                                                                                                                                                                                                                     
begin f2neg                                                                                                                                                                                                                                                     
call  f3pos                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want ;                                                                                                                                                                                                                                                     
  array x [2,7] _temporary_ ;                                                                                                                                                                                                                                   
  if _n_ = 1 then do _i_ = 1 by 1 until (z) ;                                                                                                                                                                                                                   
    set two (keep = pole) end = z ;                                                                                                                                                                                                                             
    x[1,_i_] = input (char (pole, 2), 1.) ;                                                                                                                                                                                                                     
    x[2,_i_] = choosen (findc ("np", char (pole,3)), -1, 1) ;                                                                                                                                                                                                   
  end ;                                                                                                                                                                                                                                                         
  set one ;                                                                                                                                                                                                                                                     
  array v baby--call ;                                                                                                                                                                                                                                          
  array F [3] ;                                                                                                                                                                                                                                                 
  do over v ;                                                                                                                                                                                                                                                   
    _n_ = x[1,_i_] ;                                                                                                                                                                                                                                            
    f[_n_] = sum (f[_n_], v * x[2,_i_]) ;                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
run ;                             

Or, alternatively, the array X can be replaced by a hash table, which has the advantage of not making the assumption mentioned above, as the attributes for each variable (which F and which sign) are searched by the variable name in the table X:

data want (drop = _:) ;                                                                                                                                                                                                                                         
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    dcl hash x () ;                                                                                                                                                                                                                                             
    x.definekey ("_w") ;                                                                                                                                                                                                                                        
    x.definedata ("_n_", "_iorc_") ;                                                                                                                                                                                                                            
    x.definedone () ;                                                                                                                                                                                                                                           
    do _n_ = 1 by 1 until (z) ;                                                                                                                                                                                                                                 
      set two (rename=word=_w) end = z ;                                                                                                                                                                                                                        
      x.add (key:_w, data:input(char(pole,2),1.), data:choosen(findc("np",char (pole,3)),-1,1)) ;                                                                                                                                                               
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  set one ;                                                                                                                                                                                                                                                     
  array v baby--call ;                                                                                                                                                                                                                                          
  array F [3] ;                                                                                                                                                                                                                                                 
  do over v ;                                                                                                                                                                                                                                                   
    x.find (key:vname(v)) ;                                                                                                                                                                                                                                     
    f[_n_] = sum (f[_n_], v * _iorc_) ;                                                                                                                                                                                                                         
  end ;                                                                                                                                                                                                                                                         
run ;

Kind regards

Paul D. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1827 views
  • 7 likes
  • 6 in conversation