Telegram chat of the natural_language group

Vadim Fomin in Natural Language Processing

10:11 #1

VF

wisam

Hello friends, I have a question about embedding with BERT.
I want to use transfer learning (fine-tuning) to embed my dataset
and get an embedding as the output. How can I achieve that?
Note that I'm working unsupervised, without labels.
Thank you

Hi, check out this example from the transformers repo: https://github.com/huggingface/transformers/tree/master/examples/language-modeling
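
Once a model has been fine-tuned on unlabeled text, one common way to get a single embedding per text is to average the last-layer token vectors while skipping padding (mean pooling). A minimal sketch in plain Python, assuming the token vectors have already been extracted from the model; the function name `mean_pool` and the toy numbers are made up for illustration and are not part of the linked example:

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (non-padding tokens)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


# Hypothetical per-token outputs for one text: two real tokens and one padding token.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]  # 0 marks padding, which must not affect the embedding

sentence_embedding = mean_pool(token_vectors, attention_mask)
print(sentence_embedding)
```

Taking the vector of the first ([CLS]) token is another frequently used option; which one works better depends on the data and is worth measuring.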

10:12 #2

w

wisam in Natural Language Processing

Thanks

10:38 #3

M

This is the image of the Transformer model from Jay Alammar's blog.
Where do these two vectors (Kencdec and Vencdec) come from?

10:41 #4

M

This is the image of the Transformer model from Jay Alammar's blog.
Where do these two vectors (Kencdec and Vencdec) come from?

The orange and blue matrices

Alexander in Natural Language Processing

10:56 #5

A

What if I take a BERT model (which I used to classify Sentiment Analysis tasks) and try to classify writers' poems or prose with it? That is, can it be used to generalize the style of a written text without being strongly tied to the content or to specific words? Or has something else been devised for this?

Denis Kirjanov in Natural Language Processing

11:53 #6

DK

Alexander

What if I take a BERT model (which I used to classify Sentiment Analysis tasks) and try to classify writers' poems or prose with it? That is, can it be used to generalize the style of a written text without being strongly tied to the content or to specific words? Or has something else been devised for this?

What do you mean by a BERT model here? Did you fine-tune it, train it from scratch, or take a ready-made one? Or do you mean a specific classifier on top of some BERT vectors?

Alexander in Natural Language Processing

11:57 #7

A

I'm just getting started and took the ready-made bert-base-cased model. It seems to work for spam detection and sentiment analysis. So I wondered whether I could try it for classifying prose/poetry, or whether it's better to use something else.

Denis Kirjanov in Natural Language Processing

12:08 #8

DK

Alexander

I'm just getting started and took the ready-made bert-base-cased model. It seems to work for spam detection and sentiment analysis. So I wondered whether I could try it for classifying prose/poetry, or whether it's better to use something else.

What exactly works? BERT gives you vectors; if they work for spam and sentiment, that means you also built some model on top of them to detect spam and sentiment.

And the answer to your question is simple: try different vectorization and classification algorithms and measure the quality on your task.

12:12 #9

ID

This is the image of the Transformer model from Jay Alammar's blog.
Where do these two vectors (Kencdec and Vencdec) come from?

This is the encoder output, projected by the K and V matrices of the encoder-decoder attention.

Nikolay Shmyrev in Natural Language Processing

12:14 #10

NS

Сергей Устьянцев

Hi everyone

Is there anything interesting on? Post the stream here

Yuri Baburov in Natural Language Processing

12:19 #11

YB

Nikolay Shmyrev

Is there anything interesting on? Post the stream here

There are two parallel sessions running there today, so see for yourself which one interests you more.
http://www.dialog-21.ru/programme2020/

Oleg Mosalov in Natural Language Processing

12:27 #12

OM

Alexander

What if I take a BERT model (which I used to classify Sentiment Analysis tasks) and try to classify writers' poems or prose with it? That is, can it be used to generalize the style of a written text without being strongly tied to the content or to specific words? Or has something else been devised for this?

You can take the numeric representation of the text that BERT produces and apply any existing classification or clustering algorithm to it. It is hardly possible to predict in advance what you will get and how well it will match your expectations. Analyze the results and then either fine-tune your favorite model, cross-validate several models, or do both, and so on.
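
A rough sketch of the "cross-validate several models" part, in plain Python. It assumes the texts have already been turned into fixed-size vectors; the 2-d vectors, the labels and both toy classifiers below are invented for illustration and stand in for real BERT vectors and real models:

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(train_x, train_y, x):
    """Predict the label whose mean training vector is closest to x."""
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def one_nearest_neighbour(train_x, train_y, x):
    """Predict the label of the single closest training vector."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]


def cross_val_accuracy(classify, xs, ys, folds=4):
    """k-fold cross-validation: every item is tested exactly once."""
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if classify(train_x, train_y, xs[i]) == ys[i]:
                correct += 1
    return correct / len(xs)


# Hypothetical text vectors (stand-ins for BERT outputs) and their labels.
vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8],
           [1.0, 0.0], [0.0, 1.0], [0.7, 0.3], [0.3, 0.7]]
labels = ["prose", "poetry", "prose", "poetry",
          "prose", "poetry", "prose", "poetry"]

for name, model in [("nearest centroid", nearest_centroid),
                    ("1-nearest neighbour", one_nearest_neighbour)]:
    print(name, cross_val_accuracy(model, vectors, labels))
```

On real data the two scores will differ, and that comparison on held-out folds is the measurement to base the choice on.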

Alexander in Natural Language Processing

12:30 #13

A

Oleg Mosalov

You can take the numeric representation of the text that BERT produces and apply any existing classification or clustering algorithm to it. It is hardly possible to predict in advance what you will get and how well it will match your expectations. Analyze the results and then either fine-tune your favorite model, cross-validate several models, or do both, and so on.

Thank you!

14:34 #14

M

Ivan Dolgov

This is the encoder output, projected by the K and V matrices of the encoder-decoder attention.

Is this the output of the last encoder?
K = R * Wk
V = R * Wv
Here R is the output of the second-to-last encoder.
Then there must also be something like
Q = R * Wq

15:39 #15

M

Is this the output of the last encoder?
K = R * Wk
V = R * Wv
Here R is the output of the second-to-last encoder.
Then there must also be something like
Q = R * Wq

Correct me if I am wrong.

15:39 #16

ID

Is this the output of the last encoder?
K = R * Wk
V = R * Wv
Here R is the output of the second-to-last encoder.
Then there must also be something like
Q = R * Wq

If R is the output of the last encoder, then you're right about K and V, but Q is computed from the decoder inputs. There are two types of attention in the decoder: self-attention and encoder-decoder attention. The first type uses information only from the target sequence (just like in the encoder, but for the target). The second type looks for dependencies between the input and the target sequence (as in the classic attention mechanism), so it takes the Query from the target and the Key and Value from the input.
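
The split described here (Query from the decoder side, Key and Value from the encoder output) can be sketched in plain Python. This is a single-head toy version under simplifying assumptions: no masking, no multiple heads, and identity projection matrices. All names and numbers are made up for illustration:

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states, encoder_output, w_q, w_k, w_v):
    """Encoder-decoder attention: Q from the target, K and V from the input."""
    q = matmul(decoder_states, w_q)        # one query per target position
    k_encdec = matmul(encoder_output, w_k)  # one key per input position
    v_encdec = matmul(encoder_output, w_v)  # one value per input position
    d_k = len(k_encdec[0])
    k_t = [list(col) for col in zip(*k_encdec)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v_encdec), weights


# Hypothetical shapes: 3 input tokens, 2 target tokens, model size 2.
encoder_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # R, output of the last encoder
decoder_states = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned Wq, Wk, Wv

context, weights = cross_attention(decoder_states, encoder_output,
                                   identity, identity, identity)
print(len(context), "context vectors, each attending over", len(weights[0]), "input tokens")
```

Each target position ends up with one weight per input token, which is why K and V have the input length while Q has the target length.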

15:53 #17

M

Ivan Dolgov

If R is the output of the last encoder, then you're right about K and V, but Q is computed from the decoder inputs. There are two types of attention in the decoder: self-attention and encoder-decoder attention. The first type uses information only from the target sequence (just like in the encoder, but for the target). The second type looks for dependencies between the input and the target sequence (as in the classic attention mechanism), so it takes the Query from the target and the Key and Value from the input.

I don't want to touch the decoder yet.
Even in the encoder, during the self-attention phase, we have three matrices: Query, Key and Value, calculated as I stated above.
In the first encoder they are calculated as follows:
K = X * Wk
V = X * Wv
Q = X * Wq
where X is the embeddings.
For every encoder after the first, we replace X with R (the output of the previous encoder).
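
Written out per layer, the scheme above looks roughly like this. The layer index \ell, the symbol Z and the scaling by \sqrt{d_k} are notation added here for illustration; multi-head splitting is left out:

```latex
\begin{aligned}
R_0 &= X \\
Q_\ell &= R_{\ell-1} W^Q_\ell, \qquad
K_\ell = R_{\ell-1} W^K_\ell, \qquad
V_\ell = R_{\ell-1} W^V_\ell \\
Z_\ell &= \operatorname{softmax}\!\left(\frac{Q_\ell K_\ell^{\top}}{\sqrt{d_k}}\right) V_\ell
\end{aligned}
```

Each encoder layer has its own weight matrices, and R_\ell is obtained from Z_\ell after the residual connections, layer normalization and feed-forward sublayer, which are omitted here.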

16:09 #18

ID

I don't want to touch the decoder yet.
Even in the encoder, during the self-attention phase, we have three matrices: Query, Key and Value, calculated as I stated above.
In the first encoder they are calculated as follows:
K = X * Wk
V = X * Wv
Q = X * Wq
where X is the embeddings.
For every encoder after the first, we replace X with R (the output of the previous encoder).

Oh, yes, you're right here. The initial question was about Kencdec and Vencdec, so I thought you were already talking about the decoder.

16:12 #19

ID

Kencdec and Vencdec are calculated from the output of the last encoder. I think that's all you need to know while trying to understand what happens in the encoder.