Statistical Procedures

Ronein · Posted 01-22-2025 01:40 PM

Hello

Lets say I have a data set with Long structure with all possible combinations of 9 categorical variables.

Lets say that I have another data set with information of coefficients for each category (for each variable)

My target- I want to add to data set possible_combinations_t the following columns:

X1_Coef

X2_Coef

X3_Coef

X4_Coef

X5_Coef

X6_Coef

X7_Coef

X8_Coef

X9_Coef



data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

Tom · Posted 01-22-2025 01:49 PM

I don't think you provided enough information to answer the question.

If you want to merge two datasets we need to see what the datasets look like. Do they have a common variable that be used in a BY statement? If not you might need to use PROC SQL so you can craft logic to find the matching observations.

Your use of the term COEFFIENT makes it sound like perhaps you want score data using a formula? Did you look into using PROC SCORE?

Tom · Posted 01-22-2025 07:41 PM

Make a dataset with _TYPE_ variable set to SCORE. Use the same variable names ( X1 to X9) to hold the coefficients. Add a _NAME_ variable to hold the name you want for the NEW variable with predicted score.

data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

So this example SCORE dataset is the formula:

MyModel = X1 +2*X2 +3*X3 +4*x4+5*x5+6*X6+7*X7+8*X8+9*X9;

Then use PROC SCORE to score your dataset.

proc score data=possible_combinations_t score=score out=results;
run;

First 10 results:

Ronein · Posted 01-23-2025 02:57 AM

Can you please show how score data set is looking like?

Does it have only one row??

Ronein · Posted 01-23-2025 03:03 AM

Sorry I dont understand

You created a data set called score with columns X1-X9 (see input statement)

then you used retain statement and wrote there columns _name_ myModel _type_ score.

I dont see these columns in your daat set

data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

Tom · Posted 01-23-2025 10:46 AM

@Ronein wrote:

Sorry I dont understand

You created a data set called score with columns X1-X9 (see input statement)

then you used retain statement and wrote there columns _name_ myModel _type_ score.

I dont see these columns in your daat set
data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

Did you trying running PROC PRINT on the SCORE dataset?

Ronein · Posted 01-23-2025 02:56 AM

Sorry

Here are 2 data sets I want to merge

data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

Data coef_t;
input var $  value coefficient;
cards;
X1 1	 -0.1
X1 2	 0.25
X1 3 	0
X2 1	 0.35
X2 2	 0
X3 1	 0.75
X3 2	 0
X4 1	 0.95
X4 2	 1.45
X4 3 	0
X5 1	 -1.2
X5 2	 0
X6 1	 -1.1
X6 2	 2.2
X6 3	 0
X7 1	 -0.7
X7 2	 0.8
X7 3	 1.6
X7 4 	0
X8 1	 -1.1
X8 2  0
X9 1	 0.6
X9 2	 0.9
X9 3 	0
INTER 1 -2.5
;
Run;

yabwon · Posted 01-23-2025 04:08 AM

You are on this forum for so long that you could at least to try provide some "my own experiments" solutions in your questions.

Try this:

data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

Data coef_t;
input var $  value coefficient;
cards;
X1 1	 -0.1
X1 2	 0.25
X1 3 	0
X2 1	 0.35
X2 2	 0
X3 1	 0.75
X3 2	 0
X4 1	 0.95
X4 2	 1.45
X4 3 	0
X5 1	 -1.2
X5 2	 0
X6 1	 -1.1
X6 2	 2.2
X6 3	 0
X7 1	 -0.7
X7 2	 0.8
X7 3	 1.6
X7 4 	0
X8 1	 -1.1
X8 2  0
X9 1	 0.6
X9 2	 0.9
X9 3 	0
INTER 1 -2.5
;
Run;

data want;
  if 1=_N_ then
    do;
      if 0 then set coef_t;
      declare hash H(dataset:"coef_t");
      H.defineKey("var", "value");
      H.defineData("coefficient");
      H.defineDone();
      drop i var value coefficient;
    end;

  set possible_combinations_t;

  /* X1_Coef ... x9_Coef  - stupid SAS manes, 
  that's why I'm caling them XCoef_1 = XCoef_9 
  */
  array XCoef_[9];
  array x[9];

  do i = 1 to dim(x);
    if 0=H.find(key: strip(upcase(vname(x[i]))), key:x[i]) 
      then XCoef_[i] = coefficient; 
  end;

  /* INTER=-2.5; */ /* <-- uncomment if you need INTER */
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Tom · Posted 01-23-2025 10:45 AM

Not clear where you are going with this, but to combine those two datasets I would convert the first one into a "tall" dataset so it has VAR and VALUE variables like the second one does. Might help to remember which observations the values came from.

data tall;
  row+1;
  set possible_combinations_t ;
  length var $8 value 8;
  array x[9] ;
  do index=1 to dim(x);
     var=upcase(vname(x[index]));
     value=x[index];
     output;
  end;
  drop x: ;
run;

Now you can sort and merge by VAR and VALUE. Or let PROC SQL do it for you.

proc sql ;
create table want as 
select * 
from tall natural join coef_t
;
quit;

Results:

...

Statistical Procedures

Merge

Re: Merge

Re: Merge

Re: Merge

Re: Merge

Re: Merge

Re: Merge

Re: Merge

Re: Merge

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...