Statistical Procedures

Programming the statistical procedures from SAS
BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

Lets say I have a data set with Long structure with all possible combinations of 9 categorical variables.

Lets say that I have another data set with information of coefficients for each category (for each variable)

My target- I want to add to data set possible_combinations_t the following columns:

X1_Coef

X2_Coef

X3_Coef

X4_Coef

X5_Coef

X6_Coef

X7_Coef

X8_Coef

X9_Coef 

 



data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

 

 

8 REPLIES 8
Tom
Super User Tom
Super User

I don't think you provided enough information to answer the question.

If you want to merge two datasets we need to see what the datasets look like.  Do they have a common variable that be used in a BY statement?  If not you might need to use PROC SQL so you can craft logic to find the matching observations.

 

Your use of the term COEFFIENT makes it sound like perhaps you want score data using a formula?  Did you look into using PROC SCORE?

Tom
Super User Tom
Super User

Make a dataset with _TYPE_ variable set to SCORE.  Use the same variable names ( X1 to X9) to hold  the coefficients.  Add a _NAME_ variable to hold the name you want for the NEW variable with predicted score.

data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

So this example SCORE dataset is the formula:

MyModel = X1 +2*X2 +3*X3 +4*x4+5*x5+6*X6+7*X7+8*X8+9*X9;

Then use PROC SCORE to score your dataset.

proc score data=possible_combinations_t score=score out=results;
run;

First 10 results:

Tom_0-1737592813133.png

 

Ronein
Meteorite | Level 14

Can  you please show how score data set is looking like?

Does it have only one row??

 

 

 

Ronein
Meteorite | Level 14

Sorry I dont understand

You created a data set called score with columns X1-X9 (see input statement)

then you used retain statement and wrote there columns _name_ myModel _type_ score.

I dont see these columns in your daat set

data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

 

Tom
Super User Tom
Super User

@Ronein wrote:

Sorry I dont understand

You created a data set called score with columns X1-X9 (see input statement)

then you used retain statement and wrote there columns _name_ myModel _type_ score.

I dont see these columns in your daat set

data score;
  retain _name_ 'MyModel' _type_ 'SCORE' ;
  input x1-x9;
cards;
1 2 3 4 5 6 7 8 9
;

 


Did you trying running PROC PRINT on the SCORE dataset?

Ronein
Meteorite | Level 14

Sorry

Here are 2  data sets I want to merge

data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

Data coef_t;
input var $  value coefficient;
cards;
X1 1	 -0.1
X1 2	 0.25
X1 3 	0
X2 1	 0.35
X2 2	 0
X3 1	 0.75
X3 2	 0
X4 1	 0.95
X4 2	 1.45
X4 3 	0
X5 1	 -1.2
X5 2	 0
X6 1	 -1.1
X6 2	 2.2
X6 3	 0
X7 1	 -0.7
X7 2	 0.8
X7 3	 1.6
X7 4 	0
X8 1	 -1.1
X8 2  0
X9 1	 0.6
X9 2	 0.9
X9 3 	0
INTER 1 -2.5
;
Run;
yabwon
Onyx | Level 15

You are on this forum for so long that you could at least to try provide some "my own experiments" solutions in your questions. 

 

Try this:

data possible_combinations_t;
   do x1 = 1 to 3;
   do x2 = 1 to 2;
   do x3 = 1 to 2;
   do x4 = 1 to 3;
   do x5 = 1 to 2;
   do x6 = 1 to 3;
   do x7 = 1 to 4;
   do x8 = 1 to 2;
   do x9 = 1 to 3;
     output;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
run;

Data coef_t;
input var $  value coefficient;
cards;
X1 1	 -0.1
X1 2	 0.25
X1 3 	0
X2 1	 0.35
X2 2	 0
X3 1	 0.75
X3 2	 0
X4 1	 0.95
X4 2	 1.45
X4 3 	0
X5 1	 -1.2
X5 2	 0
X6 1	 -1.1
X6 2	 2.2
X6 3	 0
X7 1	 -0.7
X7 2	 0.8
X7 3	 1.6
X7 4 	0
X8 1	 -1.1
X8 2  0
X9 1	 0.6
X9 2	 0.9
X9 3 	0
INTER 1 -2.5
;
Run;

data want;
  if 1=_N_ then
    do;
      if 0 then set coef_t;
      declare hash H(dataset:"coef_t");
      H.defineKey("var", "value");
      H.defineData("coefficient");
      H.defineDone();
      drop i var value coefficient;
    end;

  set possible_combinations_t;

  /* X1_Coef ... x9_Coef  - stupid SAS manes, 
  that's why I'm caling them XCoef_1 = XCoef_9 
  */
  array XCoef_[9];
  array x[9];

  do i = 1 to dim(x);
    if 0=H.find(key: strip(upcase(vname(x[i]))), key:x[i]) 
      then XCoef_[i] = coefficient; 
  end;

  /* INTER=-2.5; */ /* <-- uncomment if you need INTER */
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Not clear where you are going with this, but to combine those two datasets I would convert the first one into a "tall" dataset so it has VAR and VALUE variables like the second one does.  Might help to remember which observations the values came from.

data tall;
  row+1;
  set possible_combinations_t ;
  length var $8 value 8;
  array x[9] ;
  do index=1 to dim(x);
     var=upcase(vname(x[index]));
     value=x[index];
     output;
  end;
  drop x: ;
run;

Now you can sort and merge by VAR and VALUE.  Or let PROC SQL do it for you.

proc sql ;
create table want as 
select * 
from tall natural join coef_t
;
quit;

Results:

Tom_0-1737647145828.png

...

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 932 views
  • 1 like
  • 3 in conversation