Hello
Lets say I have a data set with Long structure with all possible combinations of 9 categorical variables.
Lets say that I have another data set with information of coefficients for each category (for each variable)
My target- I want to add to data set possible_combinations_t the following columns:
X1_Coef
X2_Coef
X3_Coef
X4_Coef
X5_Coef
X6_Coef
X7_Coef
X8_Coef
X9_Coef
data possible_combinations_t;
do x1 = 1 to 3;
do x2 = 1 to 2;
do x3 = 1 to 2;
do x4 = 1 to 3;
do x5 = 1 to 2;
do x6 = 1 to 3;
do x7 = 1 to 4;
do x8 = 1 to 2;
do x9 = 1 to 3;
output;
end;
end;
end;
end;
end;
end;
end;
end;
end;
run;
I don't think you provided enough information to answer the question.
If you want to merge two datasets we need to see what the datasets look like. Do they have a common variable that be used in a BY statement? If not you might need to use PROC SQL so you can craft logic to find the matching observations.
Your use of the term COEFFIENT makes it sound like perhaps you want score data using a formula? Did you look into using PROC SCORE?
Make a dataset with _TYPE_ variable set to SCORE. Use the same variable names ( X1 to X9) to hold the coefficients. Add a _NAME_ variable to hold the name you want for the NEW variable with predicted score.
data score;
retain _name_ 'MyModel' _type_ 'SCORE' ;
input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;
So this example SCORE dataset is the formula:
MyModel = X1 +2*X2 +3*X3 +4*x4+5*x5+6*X6+7*X7+8*X8+9*X9;
Then use PROC SCORE to score your dataset.
proc score data=possible_combinations_t score=score out=results;
run;
First 10 results:
Can you please show how score data set is looking like?
Does it have only one row??
Sorry I dont understand
You created a data set called score with columns X1-X9 (see input statement)
then you used retain statement and wrote there columns _name_ myModel _type_ score.
I dont see these columns in your daat set
data score;
retain _name_ 'MyModel' _type_ 'SCORE' ;
input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;
@Ronein wrote:
Sorry I dont understand
You created a data set called score with columns X1-X9 (see input statement)
then you used retain statement and wrote there columns _name_ myModel _type_ score.
I dont see these columns in your daat set
data score; retain _name_ 'MyModel' _type_ 'SCORE' ; input x1-x9; cards; 1 2 3 4 5 6 7 8 9 ;
Did you trying running PROC PRINT on the SCORE dataset?
Sorry
Here are 2 data sets I want to merge
data possible_combinations_t;
do x1 = 1 to 3;
do x2 = 1 to 2;
do x3 = 1 to 2;
do x4 = 1 to 3;
do x5 = 1 to 2;
do x6 = 1 to 3;
do x7 = 1 to 4;
do x8 = 1 to 2;
do x9 = 1 to 3;
output;
end;
end;
end;
end;
end;
end;
end;
end;
end;
run;
Data coef_t;
input var $ value coefficient;
cards;
X1 1 -0.1
X1 2 0.25
X1 3 0
X2 1 0.35
X2 2 0
X3 1 0.75
X3 2 0
X4 1 0.95
X4 2 1.45
X4 3 0
X5 1 -1.2
X5 2 0
X6 1 -1.1
X6 2 2.2
X6 3 0
X7 1 -0.7
X7 2 0.8
X7 3 1.6
X7 4 0
X8 1 -1.1
X8 2 0
X9 1 0.6
X9 2 0.9
X9 3 0
INTER 1 -2.5
;
Run;
You are on this forum for so long that you could at least to try provide some "my own experiments" solutions in your questions.
Try this:
data possible_combinations_t;
do x1 = 1 to 3;
do x2 = 1 to 2;
do x3 = 1 to 2;
do x4 = 1 to 3;
do x5 = 1 to 2;
do x6 = 1 to 3;
do x7 = 1 to 4;
do x8 = 1 to 2;
do x9 = 1 to 3;
output;
end;
end;
end;
end;
end;
end;
end;
end;
end;
run;
Data coef_t;
input var $ value coefficient;
cards;
X1 1 -0.1
X1 2 0.25
X1 3 0
X2 1 0.35
X2 2 0
X3 1 0.75
X3 2 0
X4 1 0.95
X4 2 1.45
X4 3 0
X5 1 -1.2
X5 2 0
X6 1 -1.1
X6 2 2.2
X6 3 0
X7 1 -0.7
X7 2 0.8
X7 3 1.6
X7 4 0
X8 1 -1.1
X8 2 0
X9 1 0.6
X9 2 0.9
X9 3 0
INTER 1 -2.5
;
Run;
data want;
if 1=_N_ then
do;
if 0 then set coef_t;
declare hash H(dataset:"coef_t");
H.defineKey("var", "value");
H.defineData("coefficient");
H.defineDone();
drop i var value coefficient;
end;
set possible_combinations_t;
/* X1_Coef ... x9_Coef - stupid SAS manes,
that's why I'm caling them XCoef_1 = XCoef_9
*/
array XCoef_[9];
array x[9];
do i = 1 to dim(x);
if 0=H.find(key: strip(upcase(vname(x[i]))), key:x[i])
then XCoef_[i] = coefficient;
end;
/* INTER=-2.5; */ /* <-- uncomment if you need INTER */
run;
Bart
Not clear where you are going with this, but to combine those two datasets I would convert the first one into a "tall" dataset so it has VAR and VALUE variables like the second one does. Might help to remember which observations the values came from.
data tall;
row+1;
set possible_combinations_t ;
length var $8 value 8;
array x[9] ;
do index=1 to dim(x);
var=upcase(vname(x[index]));
value=x[index];
output;
end;
drop x: ;
run;
Now you can sort and merge by VAR and VALUE. Or let PROC SQL do it for you.
proc sql ;
create table want as
select *
from tall natural join coef_t
;
quit;
Results:
...
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.