
Глубинное обучение (группа)

2018 July 06

kk

k k in Глубинное обучение (группа)
But here the problem is that even in the stateless mode of the LSTM the internal state is kept within a batch, so if we have sequences seq1, seq2, ..., seqN that are randomly shuffled, and each sequence can belong to an arbitrary person, the LSTM treats seq1 as the history of seq2 and so on, which is not correct.

AB

Arcady Balandin in Глубинное обучение (группа)
there must be a "clear" option in the lib you're using

kk

k k in Глубинное обучение (группа)
Arcady Balandin
there must be a "clear" option in the lib you're using
Would you please explain more?

AB

Arcady Balandin in Глубинное обучение (группа)
consider batch = 1 sample. Then you can reset the state of the LSTM after training it on each sample.
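A rough sketch of that suggestion in Keras (the shapes and dummy data are invented for illustration, not taken from the thread):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, n_features = 40, 8                    # invented shapes, just for the sketch
X = np.random.randn(20, timesteps, n_features)   # 20 dummy sequences
y = np.random.randint(0, 2, size=(20, 1))

model = Sequential()
# stateful=True means Keras never clears the state on its own; it is cleared explicitly below
model.add(LSTM(32, stateful=True, batch_input_shape=(1, timesteps, n_features)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

for i in range(len(X)):
    model.train_on_batch(X[i:i + 1], y[i:i + 1])  # batch of exactly one sequence
    model.reset_states()                          # start the next sequence from a clean state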

kk

k k in Глубинное обучение (группа)
Arcady Balandin
consider batch = 1 sample. Then you can reset the state of the LSTM after training it on each sample.
Yeah, you're right, but batch size 1 did not give a good result, so I enlarged the batch size. That was one suggestion; do you have any other idea in terms of a cross-validation policy between or within people?

AB

Arcady Balandin in Глубинное обучение (группа)
a completely different method, used e.g. in machine translation/text classification, is to put a "termination symbol" at the end of each text sample.
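A tiny sketch of that idea for token sequences (the sample token ids and the EOS_ID value are made up for illustration):

# toy tokenized samples; EOS_ID is a reserved id that real tokens never use
samples = [[12, 7, 3], [5, 5, 9, 2], [8]]
EOS_ID = 0

# append the termination symbol to every sample, then chain everything into one stream;
# the network is expected to learn that EOS_ID marks the boundary between samples
stream = []
for s in samples:
    stream.extend(s + [EOS_ID])

print(stream)  # [12, 7, 3, 0, 5, 5, 9, 2, 0, 8, 0]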

kk

k k in Глубинное обучение (группа)
Since temporal dependency exists within each person's data, a simple shuffling won't work, so I tried walk-forward validation, ... but it's still not robust enough.
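One way to sketch the two cross-validation policies mentioned here ("between people" vs "within people") with scikit-learn; the array shapes and the per-person grouping below are invented placeholders for the windowed data:

import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

# placeholders: one row per window, person_ids records which person each window came from
X = np.random.randn(100, 40 * 8)
y = np.random.randint(0, 2, size=100)
person_ids = np.repeat(np.arange(10), 10)        # 10 people, 10 windows each

# "between people": leave whole persons out, so no person appears in both train and test
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=person_ids):
    pass  # fit on X[train_idx], evaluate on X[test_idx]

# "within people": walk-forward in time, always training on the past only
# (in practice this would be applied to one person's windows at a time)
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    pass  # fit on X[train_idx], evaluate on X[test_idx]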

kk

k k in Глубинное обучение (группа)
Arcady Balandin
a completely different method, used e.g. in machine translation/text classification, is to put a "termination symbol" at the end of each text sample.
So if I use such a termination symbol in my data, how does the LSTM understand that it has to reset the model state within a batch?

AB

Arcady Balandin in Глубинное обучение (группа)
k k
Yeah, you're right, but batch size 1 did not give a good result, so I enlarged the batch size. That was one suggestion; do you have any other idea in terms of a cross-validation policy between or within people?
even within batches of many samples the network is activated with your data. After activation on each sample the state must be reset. How one does it depends on your library.

kk

k k in Глубинное обучение (группа)
I'm using Keras with the TF backend.

AB

Arcady Balandin in Глубинное обучение (группа)
k k
So if I use such a termination symbol in my data, how does the LSTM understand that it has to reset the model state within a batch?
this is a separate approach. The network is supposed to sooner or later understand what the termination symbol means.

AB

Arcady Balandin in Глубинное обучение (группа)
k k
I'm using Keras with the TF backend.
strange. Doesn't "Stateful" equal False in tf by default?

kk

k k in Глубинное обучение (группа)
Arcady Balandin
even within batches of many samples the network is activated with your data. After activation on each sample the state must be reset. How one does it depends on your library.
Yeah, I tried to have a callback within a batch to reset the model after each sample, but it seems that Keras does not provide callbacks within a batch or per sample, as far as I know. So I'm open to any other solution for cross-validation and generalization; do you have any suggestions?
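For what it's worth, Keras does expose per-batch callbacks (on_batch_end), just not per-sample hooks inside a batch. A hedged workaround sketch, assuming a stateful model and a hypothetical generator that yields one person's sequences per batch, so that a between-batch reset becomes a between-person reset:

from keras.callbacks import LambdaCallback

# assumptions (not from the thread): `model` is built with stateful=True and a fixed
# batch size, `person_batch_generator()` yields (X, y) with one person per batch,
# and `n_persons` is the number of people in the training set
reset_state_cb = LambdaCallback(on_batch_end=lambda batch, logs: model.reset_states())

model.fit_generator(person_batch_generator(), steps_per_epoch=n_persons,
                    epochs=10, callbacks=[reset_state_cb])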

AB

Arcady Balandin in Глубинное обучение (группа)
apart from using another lib, no :)

kk

k k in Глубинное обучение (группа)
Arcady Balandin
strange. Doesn't "Stateful" equal False in tf by default?
Yes, it works in both cases, stateful or stateless, but even in stateless mode it keeps the state within one batch and resets it automatically for the next batch. The problem is if you have samples from different people in the same batch.
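If mixing people inside a batch is the concern, one option is to build the batches so they never cross a person boundary. A minimal sketch, assuming the sequences are already grouped into a hypothetical dict keyed by person:

def person_batch_generator(sequences_by_person):
    """Yield (X, y) batches that never mix people.

    sequences_by_person is a hypothetical dict {person_id: (X, y)}, where X has
    shape (n_sequences, timesteps, n_features) for that person.
    """
    while True:  # Keras generators are expected to loop over the data indefinitely
        for person_id, (X, y) in sequences_by_person.items():
            yield X, y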

YB

Yuri Baburov in Глубинное обучение (группа)
k k
Yes, it works in both cases, stateful or stateless, but even in stateless mode it keeps the state within one batch and resets it automatically for the next batch. The problem is if you have samples from different people in the same batch.
Things to check:
1) Maybe your metric isn't good enough for learning a useful representation, so the network doesn't generalize well? Maybe there isn't enough data from different people in the dataset?

2) Maybe there is a lot of noise in the data, or the labels are very noisy?

YB

Yuri Baburov in Глубинное обучение (группа)
There should be no problem with having data from multiple people in a batch.

YB

Yuri Baburov in Глубинное обучение (группа)
Also, regarding learning stateful behaviour -- you can try to subsample the data: instead of providing every 40 timesteps, give only 10 or 5 or 1. Or instead of measuring every second, downsample the data 5x and average 5 labels into 1 (*corrected). This might prevent the kind of overfitting/generalization failure that comes from learning local behaviour instead of global behaviour.
Also, it's important to remember that there are cases and tasks where an LSTM layer doesn't help much, because there is not much reliable correlation in the global behaviour.
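A small numpy sketch of the 5x downsampling idea (the shapes are made up; the length just has to be divisible by the factor):

import numpy as np

factor = 5
signal = np.random.randn(100, 8)   # 100 timesteps x 8 channels, invented shape
labels = np.random.rand(100)       # one label per timestep

# average every 5 consecutive timesteps into one, and do the same for the labels
signal_ds = signal.reshape(-1, factor, signal.shape[1]).mean(axis=1)  # -> (20, 8)
labels_ds = labels.reshape(-1, factor).mean(axis=1)                   # -> (20,)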

YB

Yuri Baburov in Глубинное обучение (группа)
Imagine that you did the same as you do for EEG, but for the sounds that people make:
Once per second, measure the sound they pronounce, and attempt to find a correlation between the sounds made 1 second ago, 2 seconds ago, 3 seconds ago and now. On that timeframe, the only thing that stays the same is the person's timbre, and it's different for each person -- exactly what you found in your dataset!

kk

k k in Глубинное обучение (группа)
Yuri Baburov
Imagine that you did the same as you do for EEG, but for the sounds that people make:
Once per second, measure the sound they pronounce, and attempt to find a correlation between the sounds made 1 second ago, 2 seconds ago, 3 seconds ago and now. On that timeframe, the only thing that stays the same is the person's timbre, and it's different for each person -- exactly what you found in your dataset!
So you mean that a small time frame does not have enough data to characterize the desired pattern in this case, and it should be a longer time frame? Or did I misunderstand?