
LSTM with dltrain

Torben2 — Tue, 24 Aug 2021 10:05:34 GMT

Hello all,

I have two questions about dltrain. I am using dltrain to predict time series with an LSTM.

There are two options for which I would like to have more information:
- stagnation
- nthreads

Stagnation:
SAS help says the following: "specifies the number of iterations completed without improvement before stopping the optimization early. When the validTable parameter is specified, the validation scores are monitored for stagnation."

Does iterations mean epochs? If not, what are iterations?

What exactly is meant by validation scores? Loss or error or both together?

I have tried a few settings, including stagnation = 1. However, even with an increase in validation errors, the training (including validation table) did not stop before the end of the specified maxEpochs.

nthreads:
Can the runtime of the training be reduced by specifying threads? What is a reasonable number of threads and how can I determine it for my system?

Thanks a lot!

Many greetings
Torben

Re: LSTM with dltrain

zhongxiuliu — Fri, 27 Aug 2021 13:02:32 GMT

1. An epoch is a group of iterations: one full pass in which all of the data has been used to update the weights.

In stochastic gradient descent, each iteration calculates derivatives and updates the weights using only a small sample of the data. This way, even with a huge data set, the weights can be updated quickly. (For more information, search for "stochastic gradient descent".)

If we have 500 observations and sample 50 in each iteration, it takes 10 iterations for all of the data to be used once.

These 10 iterations make up one epoch.
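The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the epoch/iteration relationship; the function name `iterations_per_epoch` is made up for this example and is not part of dltrain.

```python
import math


def iterations_per_epoch(n_observations: int, batch_size: int) -> int:
    """Number of weight updates needed to use every observation once."""
    if n_observations <= 0 or batch_size <= 0:
        raise ValueError("both arguments must be positive")
    # A final, smaller batch still counts as one iteration, hence the ceiling.
    return math.ceil(n_observations / batch_size)


# The example from the post: 500 observations, 50 sampled per iteration.
print(iterations_per_epoch(500, 50))  # 10
```

With maxEpochs = 20 and these sizes, that would be 20 * 10 = 200 iterations in total.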

2. Validation score: the objective function, or loss (error + regularization).

3. nthreads: this is how many GPU devices you use for the calculation; it is not related to the algorithm. SAS Viya uses parallel computing, so it is like choosing how many computers do the calculation. A larger number means faster training, but it cannot exceed the number of GPUs your IT department makes available to you.

4. stagnation: your understanding is correct. I suspect training keeps going because the objective function value is still decreasing, even though the error has stopped decreasing.
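To illustrate point 4, here is a minimal Python sketch of a generic stagnation rule. It is an assumption about how such a check might work, not dltrain's actual implementation; the function `stopping_iteration` and both score lists are invented for illustration.

```python
def stopping_iteration(scores, stagnation):
    """Return the 1-based step at which training would stop early,
    or None if the rule never triggers.

    Hypothetical rule: stop once `stagnation` consecutive steps
    have passed without a new best (lowest) score.
    """
    best = float("inf")
    since_improvement = 0
    for step, score in enumerate(scores, start=1):
        if score < best:
            best = score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= stagnation:
                return step
    return None


# Illustrative values only: the error starts rising after step 2,
# while the loss (error + regularization) keeps falling.
validation_error = [0.40, 0.35, 0.36, 0.37, 0.38]
validation_loss = [0.90, 0.80, 0.75, 0.72, 0.70]

print(stopping_iteration(validation_error, 1))  # 3
print(stopping_iteration(validation_loss, 1))   # None
```

If the monitored score is the loss rather than the error, the rule never fires in this example, which matches the behavior described in the question.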

Re: LSTM with dltrain

Torben2 — Wed, 01 Sep 2021 12:08:50 GMT

Thank you very much for your answers. You have helped me a lot.

I am not sure I understood the second point correctly. Does it mean that you can choose between the objective function OR the loss as the validation score?
How should I understand the expression in parentheses (error + regularization)?

Thanks in advance.

Torben

Re: LSTM with dltrain

zhongxiuliu — Wed, 01 Sep 2021 13:14:33 GMT

I meant that "objective function" and "loss function" are often used to describe the same thing 🙂 However, they are different from the error function.

Both are the error function (the error) plus a regularization term (e.g., the absolute or squared values of the weights; some people call these L1 and L2, others Lasso and Ridge).

The reason is this: if we minimize only the error, we can easily end up with a model with very large weights, which makes the activation function's slope very steep (a small change in x causes a big change in y). Such a model would overfit and be unstable and sensitive to noise.

Minimizing both the error and the weights makes the neural network less sensitive to noise in the data and more generalizable.
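As a rough sketch of "error + regularization", the Python below adds an absolute-value penalty and a squared penalty on the weights to a mean squared error. The function names, the choice of mean squared error, and the penalty strengths are assumptions made for illustration; they are not taken from dltrain.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def objective(y_true, y_pred, weights, l1=0.0, l2=0.0):
    """Error plus weight penalties (absolute-value and squared terms)."""
    error = mean_squared_error(y_true, y_pred)
    abs_penalty = l1 * sum(abs(w) for w in weights)
    sq_penalty = l2 * sum(w * w for w in weights)
    return error + abs_penalty + sq_penalty


# Two hypothetical models with the same (zero) error but different weight sizes.
y_true = [1.0, 2.0]
y_pred = [1.0, 2.0]
small_weights = [0.5, -0.5]
large_weights = [10.0, -10.0]

small = objective(y_true, y_pred, small_weights, l2=0.01)
large = objective(y_true, y_pred, large_weights, l2=0.01)
print(small < large)  # True
```

Both models fit the data equally well, but the one with large weights has the higher objective, so an optimizer minimizing the objective prefers the smaller weights.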