Hello, welcome toPeanut Shell Foreign Trade Network B2B Free Information Publishing Platform!
18951535724
  • Word2vec principle (iii) model based on negative sampling

       2026-05-10 NetworkingName930
    Key Point:In the last article, we talked about the word2vec model based on hierarchical softmax, and here we look at another way to solve the word2vec model: negative sampling。1. Shortcomings and improvements of hierarchical softmaxBefore talking about the word2vec model based on negative sampling, let's look at the shortcomings of hierarchical softmax. Indeed, the use of hoffman trees instead of traditional neural networks can increase the efficien

    In the last article, we talked about the word2vec model based on hierarchical softmax, and here we look at another way to solve the word2vec model: negative sampling。

    1. Shortcomings and improvements of hierarchical softmax

    Before talking about the word2vec model based on negative sampling, let's look at the shortcomings of hierarchical softmax. Indeed, the use of hoffman trees instead of traditional neural networks can increase the efficiency of model training. But if the central word in our training samplew is a very remote word, it has to go down in the hoffman tree for a long time. Can't you make the model simpler without a homman tree like this

    Negative sampling is the way to solve the word2vec model, which abandons the hoffman tree and uses the negative sampling method to solve it. Let's look at negative sampling's solution。

    2. Summary of models based on negative sampling

    Since its name is negative sampling, sampling is certainly used. There are many methods of sampling, such as the famous mcmc. The negative sampling here is not as complicated as the mcmc。

    For example, we have a training sample, the central word is w, and there are 2c words in the surrounding contextNtext(w)\). Because of this central word, it does and coNtext(w) exists, so it is a real case. By sampling negative sampling, we get neg and w different central words. (w i, i=1, 2, .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Ntext(w)\ and (w i\) constitute a negative example of neg that does not exist. Using this formal and neg negative example, we go back to the binary logic and get a negative sample of model parameters corresponding to each word \\(w i\) (\theta {) and the word vector of each word。

    As can be seen from the description above, the negative sampling, because of the lack of hoffmann tree, can train the model every time it takes a negative example by sampling a different neg central word, so the whole process is simpler than hierarchical softmax。

    But two questions still need to be understood:

    1) what if a binary logical regression takes place through a positive and neg negative

    2) how to conduct negative sampling

    We'll discuss question 1 in section iii, and question 2 in section iv.

    3. Model gradient based on negative sampling

    Word2vector principle

    Word2vector principle

    4. Negative sampling method

    Now let's see how we do negative sampling and get a neg negative. The word2vec sampling method is not complex, and if the vocabulary is v, then we split a segment of 1 into v, one word for each corresponding vocabulary. Of course, the length of the line is different for each word, the length of the line for the high-frequency word and the length for the low-frequency word. The length of the line in each wordw is determined by the following pattern:

    Word2vector principle

    Before we sample, we divide this line, which is 1 length, into an m equivalent, here in m>v, which ensures that each word corresponds to the line, into the corresponding piece. And each one of the ms falls on the line of a word. When we sample, we just have to sample the neg position from the m position, and the word that goes to each location is our negative。

    Word2vector principle

    In word2vec, m takes the default value \\(10^8)。

    5. Cbow model based on negative sampling

    With the above-mentioned negative sampling negative sampling method and the logical regression solver model parameter, we can sum up the cbow model algorithm process based on negative sampling. The gradient iterative process uses random gradients to increase:

    Input: text training sample based on cbow, term vector dimensions mccunt, cbow context size 2c, steps long, negative sample numbers neg

    Output: model parameters for each word in the vocabulary \\ (θ), all word vectors \\ (x w\)

    1. Random initialization of all model parameters, all word vectorsw

    For each training sample \\(co)Ntext (w 0), w 0), negative sampling of neg central words \ (w i, i=1,2, ... Eg\)

    3. An iterative process of gradients for each sample of the training concentration ((co)Ntext (w 0), w 0, w 1, ... W {neg}\) does the following:

    Word2vector principle

    6. Skip-gram model based on negative sampling

    With the foundation of the previous section cbow and the basis of the previous version of the skip-gram model based on hierarchical softmax, we can also summarize the skip-gram algorithm process based on negative sampling. The gradient iterative process uses random gradients to increase:

    Input: text training sample based on skip-gram, mccunt, 2c context size of skip-gram, steps long, nog of negative sample。

    Output: model parameters for each word in the glossary, all word vectors xw

    1. Random initialization of all model parameters, all word vectorsw

    For each training sample \\(co)Ntext (w 0), w 0), negative sampling of neg central words \ (w i, i=1,2, ... Eg\)

    3. An iterative process of gradients for each sample of the training concentration ((co)Ntext (w 0), w 0, w1, ... W neg}\) does the following:

    Word2vector principle

    References:

    Word2vec (iii) model based on negative sampling

    Word2vec details the mathematical principles

     
    ReportFavorite 0Tip 0Comment 0
    >Related Comments
    No comments yet, be the first to comment
    >SimilarEncyclopedia
    Featured Images
    RecommendedEncyclopedia