Hello there,
I've got a question I've yet to find an answer to. Let's say I have a table with columns containing a 1 or 0, something like this:
Value1 | Value2 | Value3 | Value4 |
---|---|---|---|
1 | 0 | 0 | 0 |
0 | 1 | 1 | 0 |
0 | 0 | 1 | 1 |
1 | 0 | 1 | 0 |
1 | 1 | 1 | 1 |
And another table that contains lookup values:
Value | Return |
---|---|
1 | 5 |
2 | 18 |
3 | 2 |
4 | 1 |
What I want to do is add a field which is the sum of the associated value in the lookup table if the indicator is a 1. So something like this:
Value1 | Value2 | Value3 | Value4 | Total |
---|---|---|---|---|
1 | 0 | 0 | 0 | 5 |
0 | 1 | 1 | 0 | 20 |
0 | 0 | 1 | 1 | 3 |
1 | 0 | 1 | 0 | 7 |
1 | 1 | 1 | 1 | 26 |
Thanks in advance
Hash seems pretty handy for this kind of lookup:
data source;
input value1 value2 value3 value4;
cards;
1 0 0 0
0 1 1 0
0 0 1 1
1 0 1 0
1 1 1 1
;;;;
run;
data lookup;
input value return;
cards;
1 5
2 18
3 2
4 1
;;;;
data want;
if _n_=1 then do;
declare hash h(dataset:'lookup');
h.definekey('value');
h.definedata('return');
h.definedone();
call missing (value,return);
end;
set source;
array _v value1-value4;
do over _v;
if _v=1 then do;
rc=h.find(key:_i_);
total=sum(total,return);
end;
end;
drop value return rc;
run;
Haikuo
Are your tables that small?
No. In reality there are 200+ indicator variables with close to a million rows.
How big is the lookup table?
the lookup table has the same number of rows as the indicator variables, so 200+ rows.
You need to know the number of rows vars ahead of time, but you can easily put those into macro variables.
This solutions loads the lookup table into a temporary array and then filters through it. A Hash solution may even be more efficient.
data source;
input value1 value2 value3 value4;
cards;
1 0 0 0
0 1 1 0
0 0 1 1
1 0 1 0
1 1 1 1
;
run;
data lookup;
input value return;
cards;
1 5
2 18
3 2
4 1
;
run;
data want;
array lookup(4) _temporary_;
do i=1 to 4;
set lookup;
lookup(i)=return;
end;
do j=1 to 5;
array obs(4) value1-value4;
set source;
running_total=0;
do i=1 to 4;
running_total=running_total+obs(i)*lookup(i);
end;
output;
drop i j value return;
end;
run;
Given your examples can you explain how you got the value of total for each row?
For the first row, just the value of Value1 is equal to 1. Therefore you look up the rows in the lookup table where Value=1, whose value of the Result column is 5.
For the second row, the value of Value2 and Value3 are equal to 1. Therefore you look up the rows in the lookup table where Value=2 and Value=3 and add the values of the Result column for those rows which are 18 and 2, sum is 20.
For the third row, the value of Value3 and Value4 are equal to 1. Therefore you look up the rows in the lookup table where Value=3 and Value=4 and add the values of the Result column for those rows which are 2 and 1, sum is 3.
And so on...
Why not just transpose the data ? So table one becomes:
RowId Value YesNo
1 1 1
1 2 0
1 3 0
1 4 0
1 5 0
2 1 0
2 2 1
....
Then you can just join with the lookup values an sum up RETURN to get TOTAL.
I considered a transpose solution too, but the problem is that the indicator variable isn't always the same name as the lookup table. So there could be some values where the column name is ValueABC and I would do a translation to look up the value where the Value is 'XYZ' in the lookup table. I should have specified this, but even though a transpose solution would work with the example I provided it wouldn't unfortunately work for the actual problem I am trying to solve.
You could consider PROC SCORE. You again need to process the look up table and name the "VALUE" to match the name in the master. But you don't have to know how many.
This is also basic matrix math, so IML will probably handle it very easily.
Lots of options
I'm sure there are a lot of options since I can't imagine this being uncommon. I just need to try understand your proposed solution with my minimal experience of the SAS language, but I'm hoping I can let you know soon if I worked something out.
Below is a fairly simple way. The first line creates a macro variable called &n. and sets it to the number of rows that you have, thus you would just have to change it from 4 to 200 (or however number of values that you actually have). _n_ is an automatic counter that keeps track of the number of the row you are actually reading. However, within an interation _n_ can be reused and it won't forget its correct value when it gets to the next iteration.
data table1;
input Value1-Value4;
cards;
1 0 0 0
0 1 1 0
0 0 1 1
1 0 1 0
1 1 1 1
;
run;
data table2;
input Value Return;
cards;
1 5
2 18
3 2
4 1
;
run;
%let n=4;
data want (drop=Value Return);
array recodes(&n.)_temporary_;
array values(*) value1-value&n.;
if _n_ eq 1 then do;
do _n_=1 to &n.;
set table2;
recodes(value)=Return;
end;
end;
set table1;
do _n_=1 to dim(values);
total=sum(total,values(_n_)*recodes(_n_));
end;
run;
You will need to get a solution to matching the 0/1 variables to the proper weight value from your lookup table.
The right answer probably depends on why the names are not matching now. If it is just because users are entering names manually perhaps you can use the order of the variables in the dataset to drive the match?
Call your first dataset HAVE and your value lookup table LOOKUP. Then this code should create new dataset WANT that is a copy of HAVE with the new TOTAL variable appended on the right.
data want ;
set have ;
array all _numeric_;
do _i_ = 1 to dim(all) ;
if all(_i_) then do;
set lookup (keep=return) point=_i_ ;
total = sum(total,return);
end;
end;
drop return ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.