Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-25-2018 08:29 AM
(662 views)

@IanWakeling ... continuing with the subscript theme:

I need to create a idx matrix dynamically.

`X2=X[(1:5)||(20:30),];`

In the above matrices, the last one determines how many elements get selected. This gets applied to the first matrix where row1 and row2 are the start end index.

Here I want to select:

X2=X[(1:18) || (49:(49+59) || ...];

4 REPLIES 4

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I can solve it with a do loop, but it there a more elegant way to achieve this:

H3 is the first matrix and H5 the last.

```
H6=DO(H3[1,1],H5[1],1);
DO I=2 TO NCOL(H5);
H7=DO(H3[1,I],H3[1,I] + H5[I] -1,1);
H6=H6 || H7;
END;
H6=H6`;
PRINT H6;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I can't think of any elegant way to avoid the loop. If H3 and H5 have a lot of columns, then it is generally not a good idea to 'grow' a matrix by concatenation within a loop, since it involves creating a new matrix for each iteration. The following avoid this issue by 1st declaring H6 to be large enough to hold everything.

```
H6 = j(sum(H5), 1);
e = 1;
do i = 1 to ncol(H3);
r = 0:(H5[i] - 1);
H6[ r + e ] = r + H3[1, i];
e = e + H5[i];
end;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What I asked for is a small piece of the following problem I am encountering:

During the task of finding the best regression model, there's always a discussion going on what should be the scope of the data. I am NOT talking here about splitting the universe into training and validation data sets.

Here I allude to the fact that the data owner could recommend taking out certain observations before running the regression.

So I apply the corresponding filters and end up having the data set that feeds into the regression model.

This time I didn't save neither the final data set nor the filter. I only have the universe data set and the regression model estimates.

The task is to shuffle from the universe in such a manner that I approximate the estimates.

I've rebuilt my approach with the SASHELP.CARS data set.

My credit goes to @Ksharp for his many fantastic contributions in the regard of the Genetic Algorithms, especially the https://communities.sas.com/t5/SAS-IML-File-Exchange/Genetic-Algorithm-For-Integer-Programming-Footb....

And to @Rick_SAS for his Sweep operator trick.

Frankly spoken I do not really control the crossover and mutuation options in the GA block. But I am happy with the solution I found.

My code goes like this:

- I departure from SASHELP.CARS
- I run a regression and make an output with the residuals
- I sort the output by the residuals
- I create ranks according to the residuals
- I subset the data set by playing with the residuals. (This mimics the data owners' perspective to leave some observations out due to special conditions under which they could have been produced).
- I run the regression whose results and estimates I delivered (and whose associated data set got lost and has to be reconstructed)
- I save the estimates and write them later as objective values to the PROC IML where the GA evolves
- I go back to the complete universe data set and run a proc transpose to create a dummy coding for the class variables in the regression
- I load this dummy matrix into IML
- I build the GA making use of the sweep operator and returning the deviation of the estimates from the objective values
- As I do the problem encoding as fixed-length real, I want to translate a value of 0.5 (i.e.) from the solution vector to an IDX 1:50 if the class has 100 members
- To improve the algorithm I use a "2." solution vector to move the 50 observations within the range. So perhaps in this case I select observations 20:70 from the Type "Truck" into the training data set.
- As I had sorted the data set by its class variable and residual, 1:50 for the Type "Truck" from 100 available would mean that I choose the best 50 observations in terms of residuals. The sorting adds logic to the GA and allows for better controlling the slope.

```
PROC GLM DATA=SASHELP.CARS PLOTS=ALL;
WHERE (UPCASE(Type) NE "HYBRID") * (Cylinders NE .);
CLASS TYPE ;
MODEL MPG_Highway = Cylinders Horsepower TYPE / SOLUTION;
OUTPUT OUT=PRED_CARS PREDICTED=PRED RESIDUAL=RES;
RUN;
PROC SORT DATA=PRED_CARS;
BY Type DESCENDING RES;
RUN;
PROC RANK DATA=WORK.PRED_CARS GROUPS=4 OUT=PRED_CARS1;
BY Type;
VAR RES;
RANKS R_RES;
RUN;
PROC FREQ DATA=WORK.PRED_CARS1;
TABLE TYPE * R_RES;
RUN;
DATA PRED_CARS2;
SET WORK.PRED_CARS1;
WHERE ((UPCASE(TYPE)="TRUCK")*(R_RES NE 0) +
(UPCASE(TYPE)="SPORTS")*(R_RES NE 3) +
(UPCASE(TYPE)="SEDAN")*(R_RES LT 2) +
(UPCASE(TYPE) NOT IN ("TRUCK" "SPORTS" "SEDAN")) );
RUN;
PROC FREQ DATA=WORK.PRED_CARS2;
TABLE TYPE * R_RES;
RUN;
ODS OUTPUT PARAMETERESTIMATES=PARMS;
PROC GLM DATA=WORK.PRED_CARS2 PLOTS=ALL;
CLASS TYPE (REF="SUV") ;
MODEL MPG_Highway = Cylinders Horsepower TYPE / SOLUTION;
RUN;
PROC TRANSREG DATA= SASHELP.CARS DESIGN;
WHERE (UPCASE(Type) NE "HYBRID") * (Cylinders NE .);
MODEL IDENTITY(LENGTH)
CLASS(TYPE / ZERO=NONE );
ID Cylinders Horsepower MPG_Highway Type;
OUTPUT OUT=UNIVERSE(DROP=LENGTH _:) ;
RUN;
DATA UNIVERSE;
FORMAT INTERCEPT 3.0 &_TRGIND;
SET WORK.UNIVERSE;
INTERCEPT=1;
RUN;
PROC SORT DATA=WORK.UNIVERSE;
BY Type;
RUN;
DATA WORK.UNIVERSE;
SET WORK.UNIVERSE;
BY TYPE;
IF FIRST.TYPE THEN I+1;
N=I;
DROP I;
RUN;
PROC SQL;
SELECT "'"||STRIP(NAME)||"'" INTO :VARS SEPARATED BY " "
FROM DICTIONARY.COLUMNS
WHERE LIBNAME="WORK" AND MEMNAME="UNIVERSE";
QUIT;
%PUT &VARS;
PROC IML;
USE WORK.UNIVERSE NOBS NOBS;
READ ALL VAR {N};
READ ALL VAR _NUM_ INTO VALS
[C=varNUMERIC];
READ ALL VAR {"TYPE"};
CLOSE UNIVERSE;
/*the objective values*/
OBJ_B1=-1.1609;
OBJ_B2=-0.022394;
OBJ_B0=33.4; /*this is i.e. the objective value for the intercept*/
OBJ_B3=4.99;
TEST=VALS[,{'INTERCEPT' 'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];
S=SWEEP(TEST`*TEST,1:NCOL(TEST)-1);
PRINT S;
START_END=LOC(N^=T({.}||REMOVE(N,NOBS))) //
LOC(N^=T(REMOVE(N,1)||{.}));
H2=START_END[2,]-START_END[1,]+1;
H3=START_END // H2 // CUSUM(H2);
ST_END=H3;
PRINT ST_END;
IDXH=DO(1,NCOL(ST_END)*2,2);
IDX_X=1:NCOL(ST_END)*2;
RANGE=IDX_X[,1:NCOL(ST_END)];
POS=IDX_X[,NCOL(ST_END)+1:NCOL(ST_END)*2];
IDER=(J(1, NCOL(ST_END),0.3) // J(1, NCOL(ST_END),1)) || (J(1, NCOL(ST_END),0) // J(1, NCOL(ST_END),1));
start football(x) global(VALS, ST_END,OBJ_B1,OBJ_B2, OBJ_B0, OBJ_B3, RANGE, POS,
MUESTRA, BETA1, BETA2, BETA0, BETA3, IDX );
RANGER=X[,RANGE];
POSR=X[,POS];
X2=CEIL(ST_END[3,]#RANGER);
IDX=DO(ST_END[1,1],X2[1],1) + FLOOR((ST_END[3,1]-X2[1])*POSR[1]);
DO I=2 TO NCOL(RANGE);
IDX1=DO(ST_END[1,I],ST_END[1,I] + X2[I] -1,1) + FLOOR((ST_END[3,I]-X2[I])*POSR[I]);
IDX=IDX || IDX1;
END;
IDX=IDX`;
MUESTRA=VALS[IDX,{'INTERCEPT' 'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];
S=SWEEP(MUESTRA`*MUESTRA,1:NCOL(MUESTRA)-1);
BETA1=S[NROW(S)-2,NCOL(S)];
BETA2=S[NROW(S)-1,NCOL(S)];
BETA0=S[1,NCOL(S)];
BETA3=S[NROW(S)-4,NCOL(S)];
POINTS = ABS((BETA1-OBJ_B1)/OBJ_B1) + ABS((BETA2-OBJ_B2)/OBJ_B2) + ABS((BETA0-OBJ_B0)/OBJ_B0)
+ ABS((BETA3-OBJ_B3)/OBJ_B3);
if NROW(MUESTRA) < 250 | ABS(BETA0 - OBJ_B0) > 1 then points=1;
return (points);
finish;
ENCOD=NCOL(ST_END)*2;
id=gasetup(1,ENCOD,123);
call gasetobj(id,0,"football");
call gasetcro(id,1.0,2);
call gasetmut(id,0.2,2,1);
call gasetsel(id,100,1,0.95);
call gainit(id,1000,IDER);
print id;
niter = 40;
summary = j(niter,3);
mattrib summary [c = {"Min Points", "A?", "B?"} l=""];
do i = 1 to niter;
call garegen(id);
call gagetval(value, id);
summary[i,1] = value[1];
summary[i,2] = value[2];
summary[i,3] = value[3];
end;
call gagetmem(mem, value, id, 1);
PRINT MEM BETA1 BETA2 BETA0 BETA3;
iteration = t(1:niter);
print iteration summary;
call gaend(id);
MUESTRA_CH=TYPE[IDX,];
CREATE MUESTRA_CH FROM MUESTRA_CH [COLNAME='TYPE'];
APPEND FROM MUESTRA_CH;
CLOSE;
CREATE MUESTRA FROM MUESTRA [COLNAME= {'INTERCEPT' 'TypeSedan' 'TypeSports'
'TypeTruck' 'TypeWagon' 'TypeSUV' 'Cylinders' 'Horsepower' 'MPG_Highway'}];
APPEND FROM MUESTRA;
CLOSE;
quit;
DATA MUESTRA_ALL;
MERGE MUESTRA MUESTRA_CH;
RUN;
ODS OUTPUT PARAMETERESTIMATES=PARMS;
PROC GLM DATA=WORK.MUESTRA PLOTS=ALL;
MODEL MPG_Highway = Cylinders Horsepower &_TRGIND / SOLUTION;
RUN;
```

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.