11-25-2016 12:56 PM

I'm going to select N subsets from a matrix in a DO loop.

N varies; let's say N = 5.

Any tips on how to name the subsets: SUB1, SUB2, SUB3, SUB4, SUB5 ?

Solution

11-28-2016
10:26 AM

11-25-2016 10:59 PM

11-25-2016 12:59 PM

What do you specifically mean by a subset from a matrix?

11-28-2016 10:23 AM

RE: SUBSET

The full matrix (say, ALLDAT) is numerical stock data by stock & date: maybe 250K rows and 75 columns.

E.g., 2500 rows by 75 columns of data for "AAPL", one row for each date // 2500 rows of data for "AXP" // ...

A separate 250K X 1 stock tickers array (say, ALLTKR) tells me which rows of ALLDAT correspond to which stocks. So the 1st 2500 rows are "AAPL" data, the next 2500 rows are "AXP" data, etc.

I'm testing a stock trading strategy on a subset of ALLDAT. Let's say the subset is the 30 stocks currently in the Dow Jones Industrials Average (DJIA).

I'd specify the 30 tickers of the DJIA: {"AAPL" "AXP" "BA" "CAT" "CSCO" "CVX" "DD" "DIS" "GE" "GS" "HD" "IBM" "INTC" "JNJ" "JPM" "KO" "MCD" "MMM" "MRK" "MSFT" "NKE" "PFE" "PG" "TRV" "UNH" "UTX" "V" "VZ" "WMT" "XOM"}

Then I'd like to extract the rows of ALLDAT for each of those tickers into a matrix named as the ticker: extract 2500 "AAPL" rows into a matrix named AAPL, 2500 "AXP" rows into AXP, etc.

Within a DO loop over the dates I would perform calculations, compare, and select stocks for a portfolio.

I could probably skip creating the 30 submatrices by using pointers in ALLTKR to reference rows of ALLDAT, but for debugging purposes having separate matrices may be very helpful.

11-28-2016 10:56 AM

A useful approach is to read each stock, analyze each stock, and then go on to the next stock. After reading all the data and computing statistics for each stock, you can do additional analysis that compares different stocks.

For example, the following loop reads one stock at a time into X and computes various statistics for that stock. After the loop is over, I have a matrix called RESULTS that contains all the information that I need to build my portfolio. The program uses these techniques:

- How to read big data in blocks
- How to use the WHERE clause to read subsets of a data set
- How to assign matrix attributes so you can access rows or columns by names

```
proc iml;
StockNames = {"IBM" "Intel" "Microsoft"}; /*{"AAPL" "AXP" ... "WMT" "XOM"}*/
results = j(3, ncol(StockNames), .);
mattrib results rowname={"Mean" "Stddev" "MaxVol"}
                colname=StockNames;
use Sashelp.stocks;
/* http://blogs.sas.com/content/iml/2013/01/21/reading-big-data.html */
do ID = 1 to ncol(StockNames);
   /* read in data for each stock */
   /* http://blogs.sas.com/content/iml/2016/04/04/where-clause-in-sasiml.html */
   read all var _NUM_ into X[colname=varNames]
        where(Stock=(StockNames[ID]));
   /* http://blogs.sas.com/content/iml/2012/10/01/access-rows-or-columns-of-a-matrix-by-names.html */
   results["Mean", ID] = mean(X[,"Close"]);
   results["Stddev", ID] = std(X[,"Close"]);
   results["MaxVol", ID] = max(X[,"Volume"]);
end;
close;
print results;
```
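
If separately named matrices are still wanted for debugging, as the question describes, indirect assignment is one possibility. The following is only a sketch: it assumes the CALL VALSET routine and the VALUE function behave as documented for SAS/IML, and the tiny ALLTKR and ALLDAT matrices are made-up stand-ins for the real data.

```
proc iml;
/* Hypothetical stand-ins for the ticker vector and data matrix in the question */
ALLTKR  = {"AAPL", "AAPL", "AXP", "AXP"};
ALLDAT  = {10 100,
           11 110,
           50 500,
           52 520};
tickers = {"AAPL" "AXP"};

do i = 1 to ncol(tickers);
   idx = loc(ALLTKR = tickers[i]);      /* rows that belong to this ticker */
   if ncol(idx) > 0 then do;
      X = ALLDAT[idx, ];
      name = "SUB" + strip(char(i));    /* builds "SUB1", "SUB2", ... */
      call valset(name, X);             /* assigns X to the matrix with that name */
   end;
end;

S2 = value("SUB2");                     /* retrieve a subset by its name */
print S2;
quit;
```

Passing strip(tickers[i]) as the name instead of "SUB" + strip(char(i)) would name each matrix after its ticker, although short tickers such as "V" could collide with other matrix names in the program.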

11-25-2016 06:42 PM

I rarely name the subsets. In a loop you can extract the first subset, do what you want with it (for example, compute its mean) and store the result. Then you extract the next subset and store its results, etc, until all subsets have been handled.
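
As a minimal sketch of that pattern for data that are already in memory: the loop below reuses one matrix X for every subset and stores only the summary statistics. The small ALLTKR and ALLDAT matrices and the choice of statistics are assumptions made for illustration.

```
proc iml;
/* Hypothetical stand-ins for the ticker vector and data matrix in the question */
ALLTKR  = {"AAPL", "AAPL", "AAPL", "AXP", "AXP", "AXP"};
ALLDAT  = {10 100,
           11 110,
           12 120,
           50 500,
           52 520,
           54 540};
tickers = {"AAPL" "AXP"};

/* pre-allocate one column of results per ticker */
results = j(2, ncol(tickers), .);
mattrib results rowname={"Mean" "Max"} colname=tickers;

do i = 1 to ncol(tickers);
   idx = loc(ALLTKR = tickers[i]);      /* rows that belong to this ticker */
   if ncol(idx) > 0 then do;
      X = ALLDAT[idx, ];                /* current subset; overwritten on each pass */
      results["Mean", i] = mean(X[,1]); /* mean of the first column */
      results["Max",  i] = max(X[,2]);  /* maximum of the second column */
   end;
end;
print results;
quit;
```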

For a reminder that you should pre-allocate the results matrix for efficiency, see
