Glossary entry (derived from question below)
Spanish term or phrase:
exponencialmente decreciente de los gradientes cuadrados pasados
English translation:
an exponentially decreasing average of squared past gradients
- The asker opted for community grading. The question was closed on 2020-07-26 15:54:12 based on peer agreement (or, if there were too few peer comments, asker preference).
Jul 22, 2020 16:05
Spanish term
exponencialmente decreciente de los gradientes cuadrados pasados
Spanish to English
Other
Mathematics & Statistics
Context: También almacena un promedio exponencialmente decreciente de los gradientes cuadrados pasados similar a RMSprop.
Proposed translations
(English)
4 | an exponentially decreasing average of squared past gradients | Francois Boye |
4 | exponentially decaying average of past squared gradientss | Helena Chavarria |
Change log
Jul 22, 2020 16:31: philgoddard changed "Field (write-in)" from "General knowledge" to "(none)"
Proposed translations
8 hrs
Selected
an exponentially decreasing average of squared past gradients
A decreasing average is an average that decreases over time; the decrease is exponential if it follows an exponential function.
https://en.wikipedia.org/wiki/Gradient
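As a rough illustration of the idea (not from the source; the smoothing factor beta below is an assumed illustrative value), such an average gives each past squared gradient a weight that shrinks exponentially with age:

# Minimal sketch of an exponentially decaying/decreasing running average
# of squared gradients. At each step, earlier contributions are multiplied
# by beta again, so their weight decreases exponentially over time.
def update_average(avg, grad, beta=0.9):   # beta is an assumed value
    return beta * avg + (1 - beta) * grad ** 2

avg = 0.0
for g in [1.0, 0.5, 0.2]:   # example gradient values
    avg = update_average(avg, g)
print(avg)   # most of the weight is on the most recent squared gradient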
4 KudoZ points awarded for this answer.
Comment: "Selected automatically based on peer agreement."
17 mins
exponentially decaying average of past squared gradientss
I've found this, though I have no idea what it means!
Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients (s) like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients (v), similar to momentum. Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface.
https://towardsdatascience.com/optimisation-algorithm-adapti...
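A minimal sketch of the two averages described above; the hyperparameter values (lr, beta1, beta2, eps) are typical defaults assumed for illustration, not taken from the source:

import numpy as np

# One Adam update step: m is the decaying average of past gradients,
# v is the decaying average of past squared gradients.
def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # exponentially decaying average of past gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # exponentially decaying average of past squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero-initialized averages
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v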
--------------------------------------------------
Note added at 17 mins (2020-07-22 16:23:26 GMT)
--------------------------------------------------
Oops! 'Gradients' should only have one 's'.
--------------------------------------------------
Note added at 2 hrs (2020-07-22 18:59:14 GMT)
--------------------------------------------------
The naive way to do the windowed accumulation of squared gradients is simply by accumulating the last w squared gradients. However, storing and updating the w previous squared gradients is not efficient, especially when the number of parameters to be updated is very large, which in deep learning can reach millions. Instead, the author of Adadelta implements the accumulation as an exponentially decaying average of the squared gradients, which is denoted by 𝔼[g²]. This local accumulation at timestep 𝑡 is computed by
https://medium.com/konvergen/continuing-on-adaptive-method-a...
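The quoted passage breaks off before the formula; the exponentially decaying accumulation it refers to (with decay constant ρ, as in the Adadelta paper) has the standard form

𝔼[g²]_t = ρ · 𝔼[g²]_{t−1} + (1 − ρ) · g_t²

so older squared gradients are progressively down-weighted rather than stored individually.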
4.6 Adam
Adaptive Moment Estimation (Adam) [10] is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum:
https://arxiv.org/pdf/1609.04747.pdf
RMSprop
Root Mean Squared Propagation (RMSprop) is very close to Adagrad, except that it does not accumulate the sum of the squared gradients but instead an exponentially decaying average. This decaying average is realized by combining the Momentum algorithm and the Adagrad algorithm, with a new term.
https://mlfromscratch.com/optimizers-explained/#/
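A minimal sketch of that RMSprop update, assuming the commonly used decay value of 0.9 and a small eps for numerical stability (values assumed, not from the source):

import numpy as np

# RMSprop step: the squared-gradient accumulator is an exponentially
# decaying average rather than Adagrad's ever-growing sum.
def rmsprop_step(theta, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2    # decaying average of squared gradients
    theta = theta - lr * grad / (np.sqrt(avg_sq) + eps)  # per-parameter adaptive step size
    return theta, avg_sq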
Adam
Adam stands for Adaptive Moment Estimation. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum.
https://www.kaggle.com/residentmario/keras-optimizers
Peer comment(s):
agree | philgoddard: These terms may look difficult to a non-statistician, and I'm not one, but they're fairly easy to guess and Google. (7 mins)
Reply: Cheers, Phil :-)
disagree | Francois Boye: a gradient does not decay; instead it increases or decreases. (1 hr)
Reply: As I have mentioned, I'm definitely no expert. I suggest you contact the authors of the papers/articles I've used to illustrate my answer. Thank you for your much-appreciated opinion.