Rhodochrosite | Level 12

## Produce subscripts

@IanWakeling ... continuing with the subscript theme:

I need to create an index matrix dynamically.

``X2=X[(1:5)||(20:30),];``

Of the matrices above, the last one determines how many elements get selected. It is applied to the first matrix, whose rows 1 and 2 hold the start and end indices.

Here I want to select:

``X2=X[(1:18) || (49:(49+59)) || ...];``
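
For reference, here is a minimal sketch of the fixed case. The 30 x 2 matrix X and its values are made up for illustration only. Horizontally concatenating two index ranges gives a single row vector of subscripts:

```sas
proc iml;
/* X is a hypothetical 30 x 2 matrix; column 1 holds the row number */
X = T(1:30) || T(101:130);

idx = (1:5) || (20:30);   /* 1 x 16 row vector of row numbers */
X2  = X[idx, ];           /* rows 1-5 and 20-30 of X */

print (nrow(X2))[label="rows selected"];
quit;
```

This should report 16 rows (5 + 11). The open question is how to build `idx` when the number of ranges and their lengths are only known at run time.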

Rhodochrosite | Level 12

## Re: Produce subscripts

I can solve it with a DO loop, but is there a more elegant way to achieve this?

H3 is the first matrix and H5 the last.

``````H6=DO(H3[1,1],H5[1],1);

DO I=2 TO NCOL(H5);
H7=DO(H3[1,I],H3[1,I] + H5[I] -1,1);
H6=H6 || H7;
END;

H6=H6`;

PRINT H6;``````
Barite | Level 11

## Re: Produce subscripts

I can't think of any elegant way to avoid the loop. If H3 and H5 have a lot of columns, it is generally not a good idea to 'grow' a matrix by concatenation within a loop, since each iteration creates a new matrix. The following avoids this issue by first allocating H6 large enough to hold everything.

``````H6 = j(sum(H5), 1);
e = 1;
do i = 1 to ncol(H3);
r = 0:(H5[i] - 1);
H6[ r + e ] = r + H3[1, i];
e = e + H5[i];
end;``````
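
To make the preallocation idea concrete, here is the same loop as a self-contained sketch. The values of H3 and H5 are assumptions chosen for illustration, and the `if` guard is an addition that is not in the code above: in IML the index operator counts down when the end is smaller than the start, so a count of zero would give `0:(-1)` = {0 -1} rather than an empty vector.

```sas
proc iml;
/* Assumed inputs, for illustration only */
H3 = {1 49 120};   /* start row of each block */
H5 = {18 60 5};    /* number of rows to take from each block */

H6 = j(sum(H5), 1, .);          /* allocate the full result once */
e = 1;                          /* next free position in H6 */
do i = 1 to ncol(H3);
   if H5[i] > 0 then do;        /* guard against a zero count (see note above) */
      r = 0:(H5[i] - 1);
      H6[r + e] = r + H3[1, i];
      e = e + H5[i];
   end;
end;

print (H6[{1 18 19 78 79 83}])[label="block edges"];
quit;
```

With these inputs the printed block edges should be 1, 18, 49, 108, 120 and 124, i.e. rows 1:18, 49:108 and 120:124.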
SAS Super FREQ

## Re: Produce subscripts

I don't understand the question. Can you provide a program that shows what you have and what you want?

Rhodochrosite | Level 12

## Re: Produce subscripts

What I asked for is a small piece of the following problem I am encountering:

When searching for the best regression model, there is always a discussion about what the scope of the data should be. I am NOT talking here about splitting the universe into training and validation data sets.

Here I allude to the fact that the data owner could recommend taking out certain observations before running the regression.

So I apply the corresponding filters and end up having the data set that feeds into the regression model.

This time I saved neither the final data set nor the filter. I only have the universe data set and the regression model estimates.

The task is to select observations from the universe in such a way that a regression on the selection approximates those estimates.

I've rebuilt my approach with the SASHELP.CARS data set.

Credit goes to @Ksharp for his many fantastic contributions on genetic algorithms, especially https://communities.sas.com/t5/SAS-IML-File-Exchange/Genetic-Algorithm-For-Integer-Programming-Footb...

And to @Rick_SAS for his Sweep operator trick.

Frankly, I have not really mastered the crossover and mutation options in the GA block, but I am happy with the solution I found.

My code goes like this:

1. I start from SASHELP.CARS.
2. I run a regression and write the residuals to an output data set.
3. I sort the output by the residuals.
4. I create ranks according to the residuals.
5. I subset the data set based on the residual ranks. (This mimics the data owner's decision to leave out observations that were produced under special conditions.)
6. I run the regression whose results and estimates I delivered (and whose input data set got lost and has to be reconstructed).
7. I save the estimates and later enter them in PROC IML as the objective values against which the GA evolves.
8. I go back to the complete universe data set and run PROC TRANSREG to create dummy coding for the class variable in the regression.
9. I load this dummy matrix into IML.
10. I build the GA around the sweep operator; the objective function returns the deviation of the estimates from the objective values.
11. The problem is encoded as fixed-length real, so I translate a value such as 0.5 in the solution vector into the index 1:50 if the class has 100 members.
12. To improve the algorithm, I use a second part of the solution vector to move those 50 observations within the class range. In this case I might select observations 21:70 of the Type "Truck" for the training data set.
13. Because I sorted the data set by class variable and residual, 1:50 for the Type "Truck" out of 100 available means that I choose the best 50 observations in terms of residuals. The sorting adds logic to the GA and gives better control over the slope.
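
Steps 11 and 12 can be sketched in isolation as follows. The names `first`, `cnt`, `frac`, `pos`, `k` and `shift` are hypothetical, and the numbers are assumptions for illustration; the formulas mirror the ones used inside the objective function in the program below.

```sas
proc iml;
/* Hypothetical class block: rows 201-300 of the sorted universe (100 members) */
first = 201;
cnt   = 100;
frac  = 0.5;   /* "range" gene: share of the class to keep (assumed value) */
pos   = 0.4;   /* "position" gene: where the window sits in the class (assumed value) */

k     = ceil(cnt * frac);          /* window length */
shift = floor((cnt - k) * pos);    /* offset from the top of the class */
idx   = (first:(first + k - 1)) + shift;

print k shift (idx[1])[label="from"] (idx[k])[label="to"];
quit;
```

With these values the window should have k = 50 rows, shifted by 20, so it covers rows 221 to 270. A position gene of 0 keeps the first 50 rows of the class and a position gene of 1 keeps the last 50.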

``````
PROC GLM DATA=SASHELP.CARS PLOTS=ALL;
WHERE (UPCASE(Type) NE "HYBRID") * (Cylinders NE .);
CLASS TYPE  ;
MODEL MPG_Highway =  Cylinders Horsepower TYPE  / SOLUTION;
OUTPUT OUT=PRED_CARS PREDICTED=PRED RESIDUAL=RES;
RUN;

PROC SORT DATA=PRED_CARS;
BY Type DESCENDING RES;
RUN;

PROC RANK DATA=WORK.PRED_CARS GROUPS=4 OUT=PRED_CARS1;
BY Type;
VAR RES;
RANKS R_RES;
RUN;

PROC FREQ DATA=WORK.PRED_CARS1;
TABLE TYPE * R_RES;
RUN;

DATA PRED_CARS2;
SET WORK.PRED_CARS1;
WHERE  ((UPCASE(TYPE)="TRUCK")*(R_RES NE 0) +
(UPCASE(TYPE)="SPORTS")*(R_RES NE 3) +
(UPCASE(TYPE)="SEDAN")*(R_RES LT 2) +
(UPCASE(TYPE) NOT IN ("TRUCK" "SPORTS" "SEDAN")) );
RUN;

PROC FREQ DATA=WORK.PRED_CARS2;
TABLE TYPE * R_RES;
RUN;

ODS OUTPUT PARAMETERESTIMATES=PARMS;
PROC GLM DATA=WORK.PRED_CARS2 PLOTS=ALL;
CLASS TYPE (REF="SUV") ;
MODEL MPG_Highway =  Cylinders Horsepower TYPE  / SOLUTION;
RUN;

PROC TRANSREG DATA=	SASHELP.CARS DESIGN;
WHERE (UPCASE(Type) NE "HYBRID") * (Cylinders NE .);
MODEL IDENTITY(LENGTH)
CLASS(TYPE / ZERO=NONE );
ID Cylinders Horsepower MPG_Highway Type;
OUTPUT OUT=UNIVERSE(DROP=LENGTH _:) ;
RUN;

DATA UNIVERSE;
FORMAT INTERCEPT 3.0 &_TRGIND;
SET WORK.UNIVERSE;
INTERCEPT=1;
RUN;

PROC SORT DATA=WORK.UNIVERSE;
BY Type;
RUN;

DATA WORK.UNIVERSE;
SET WORK.UNIVERSE;
BY TYPE;
IF FIRST.TYPE THEN I+1;
N=I;
DROP I;
RUN;

PROC SQL;
SELECT "'"||STRIP(NAME)||"'" INTO :VARS SEPARATED BY " "
FROM DICTIONARY.COLUMNS
WHERE LIBNAME="WORK" AND MEMNAME="UNIVERSE";
QUIT;

%PUT &VARS;

PROC IML;
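USE WORK.UNIVERSE NOBS NOBS;
READ ALL VAR _NUM_ INTO VALS
[C=varNUMERIC];
READ ALL VAR {N} INTO N;       /* group number per row, needed for START_END below */
READ ALL VAR {TYPE} INTO TYPE; /* character class variable, needed for MUESTRA_CH after the GA */
CLOSE UNIVERSE;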

/*the objective values*/
OBJ_B1=-1.1609;
OBJ_B2=-0.022394;
OBJ_B0=33.4; /*this is i.e. the objective value for the intercept*/
OBJ_B3=4.99;

TEST=VALS[,{'INTERCEPT'  'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];

S=SWEEP(TEST`*TEST,1:NCOL(TEST)-1);

PRINT S;

START_END=LOC(N^=T({.}||REMOVE(N,NOBS))) //
LOC(N^=T(REMOVE(N,1)||{.}));

H2=START_END[2,]-START_END[1,]+1;

H3=START_END // H2 // CUSUM(H2);

ST_END=H3;

PRINT ST_END;

IDXH=DO(1,NCOL(ST_END)*2,2);

IDX_X=1:NCOL(ST_END)*2;

RANGE=IDX_X[,1:NCOL(ST_END)];
POS=IDX_X[,NCOL(ST_END)+1:NCOL(ST_END)*2];

IDER=(J(1, NCOL(ST_END),0.3) // J(1, NCOL(ST_END),1)) || (J(1, NCOL(ST_END),0) // J(1, NCOL(ST_END),1));

start football(x) global(VALS, ST_END,OBJ_B1,OBJ_B2, OBJ_B0, OBJ_B3, RANGE, POS,
MUESTRA, BETA1, BETA2, BETA0, BETA3, IDX );

RANGER=X[,RANGE];
POSR=X[,POS];

X2=CEIL(ST_END[3,]#RANGER);

IDX=DO(ST_END[1,1],X2[1],1) + FLOOR((ST_END[3,1]-X2[1])*POSR[1]);

DO I=2 TO NCOL(RANGE);
IDX1=DO(ST_END[1,I],ST_END[1,I] + X2[I] -1,1) + FLOOR((ST_END[3,I]-X2[I])*POSR[I]);
IDX=IDX || IDX1;
END;

IDX=IDX`;

MUESTRA=VALS[IDX,{'INTERCEPT'  'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];

S=SWEEP(MUESTRA`*MUESTRA,1:NCOL(MUESTRA)-1);

BETA1=S[NROW(S)-2,NCOL(S)];
BETA2=S[NROW(S)-1,NCOL(S)];
BETA0=S[1,NCOL(S)];
BETA3=S[NROW(S)-4,NCOL(S)];

POINTS = ABS((BETA1-OBJ_B1)/OBJ_B1) + ABS((BETA2-OBJ_B2)/OBJ_B2) + ABS((BETA0-OBJ_B0)/OBJ_B0)
+ ABS((BETA3-OBJ_B3)/OBJ_B3);

if NROW(MUESTRA) < 250 | ABS(BETA0 - OBJ_B0) > 1  then points=1;

return (points);

finish;

ENCOD=NCOL(ST_END)*2;

id=gasetup(1,ENCOD,123);

call gasetobj(id,0,"football");

call gasetcro(id,1.0,2);

call gasetmut(id,0.2,2,1);

call gasetsel(id,100,1,0.95);

call gainit(id,1000,IDER);

print id;

niter = 40;

summary = j(niter,3);

mattrib summary [c = {"Min Points", "A?", "B?"} l=""];

do i = 1 to niter;

call garegen(id);

call gagetval(value, id);

summary[i,1] = value[1];

summary[i,2] = value[2];

summary[i,3] = value[3];

end;

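call gagetmem(mem, value, id, 1);

/* re-evaluate the best member so that the globals IDX, MUESTRA and BETA0-BETA3
   belong to it and not to whichever member the GA evaluated last */
BEST_POINTS=football(mem);

PRINT MEM BETA1 BETA2 BETA0 BETA3;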

iteration = t(1:niter);

print iteration summary;

call gaend(id);

MUESTRA_CH=TYPE[IDX,];

CREATE MUESTRA_CH FROM MUESTRA_CH [COLNAME='TYPE'];
APPEND FROM MUESTRA_CH;
CLOSE;

CREATE MUESTRA FROM MUESTRA [COLNAME= {'INTERCEPT'  'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];
APPEND FROM MUESTRA;
CLOSE;

quit;

DATA MUESTRA_ALL;
MERGE MUESTRA MUESTRA_CH;
RUN;

ODS OUTPUT PARAMETERESTIMATES=PARMS;
PROC GLM DATA=WORK.MUESTRA PLOTS=ALL;
MODEL MPG_Highway =  Cylinders Horsepower &_TRGIND / SOLUTION;
RUN;
``````
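
As a side note on the sweep operator trick used in the objective function: sweeping the cross-product matrix of the design columns plus the response on every column except the response leaves the parameter estimates in the last column. The sketch below uses made-up data (y = 2 + 3x exactly) so the expected estimates are known in advance; the names `A`, `b` and `sse` are only illustrative.

```sas
proc iml;
/* Made-up data: y = 2 + 3*x exactly */
x = T(1:5);
y = 2 + 3*x;
A = j(nrow(x), 1, 1) || x || y;        /* intercept, regressor, response */

S = sweep(T(A) * A, 1:(ncol(A) - 1));  /* sweep on every column except the response */

b   = S[1:(ncol(A) - 1), ncol(A)];     /* last column: parameter estimates */
sse = S[ncol(A), ncol(A)];             /* lower-right cell: residual sum of squares */
print b sse;
quit;
```

Here `b` should come out as 2 and 3 with `sse` numerically zero. In the program above the same idea is applied to MUESTRA, with MPG_Highway as the last column.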