Graph embeddings are methods used to represent nodes, edges, or subgraphs as low-dimensional vectors. This is important because vectors can be processed far more efficiently than arbitrary objects. The vectors produced by a graph embedding technique capture the most important features of a complex graph structure. In this thesis, we focus on graph embeddings for the nodes of a graph, i.e., each node is represented as a low-dimensional vector.
Converting objects to multi-dimensional vectors is a methodology that has long been used in data management and mining. For example, we can take a thick book and represent its word sequences and sentences as a graph. We can also take all the distinct words of that book and treat them as its features. The result is a simpler representation that carries less information than the whole book: the words extracted from the book are parts, or features, of the book. In other words, we transform something complicated into a more straightforward form with reduced dimensions. Graph embeddings are a specific type of embedding that transforms a graph, or parts of it, into fixed-length vectors or tensors.
In Figure 1, we extract from the graph the adjacency matrix, an 𝑛 × 𝑛 table over the nodes: if a node is linked to another node, we assign 1; otherwise, 0. Real-world networks (graphs) have very sparse adjacency matrices. The embedding then translates this into a smaller matrix in a low-dimensional space, as in Figure 1 [5]. Likewise, linear algebra offers useful tools for reducing dimensionality, such as the SVD (Figure 2).
Figure 1: Graph structure of the Zachary Karate Club social network.
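To make the construction concrete, the following minimal Python sketch builds the adjacency matrix of a small undirected graph from an edge list; the edge list and node count are invented for the example, not taken from the Karate Club graph.

```python
import numpy as np

# A minimal sketch: build the n x n adjacency matrix of an undirected graph
# from an illustrative edge list. Real-world graphs are sparse, so most
# entries stay 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # invented toy edge list
n = 4                                       # number of nodes

A = np.zeros((n, n), dtype=int)
for u, v in edges:
    A[u, v] = 1                             # node u is linked to node v
    A[v, u] = 1                             # undirected: mirror the entry

print(A)
# [[0 1 1 0]
#  [1 0 1 0]
#  [1 1 0 1]
#  [0 0 1 0]]
```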
We take the graph and learn its important features, distilling it into vectors that machines can
manipulate.
Graph embeddings represent the essential features of an input graph in a compact, low-dimensional
format. These vectors can be used as features for machine learning tasks, for direct comparison,
or as a representation inside a deep learning model.
2.1 Word embeddings
How similar are two words? Can we measure the distance between two words? Is there a mathematical
method to represent words? How similar are two strings?
Let us take the word CAT = [10100000000000000001000000].
We can investigate the word "cat" at the bit level. The bits identify the word "cat" but tell us
nothing about the word itself, and we cannot use this information because we do not know the meaning
of the word.
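As a small illustration of this point, the sketch below (with an invented toy vocabulary) encodes words as one-hot bit vectors and shows that "cat" and "dog" look completely unrelated under such an encoding, even though their meanings are close.

```python
import numpy as np

# Illustrative one-hot encoding over a hypothetical vocabulary: the bits
# identify each word but encode nothing about its meaning.
vocab = ["cat", "dog", "car", "tree"]
one_hot = {w: np.eye(len(vocab), dtype=int)[i] for i, w in enumerate(vocab)}

cat, dog = one_hot["cat"], one_hot["dog"]
print(cat)             # [1 0 0 0]
print(dog)             # [0 1 0 0]
print(int(cat @ dog))  # 0 -> zero similarity, despite the related meanings
```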
Another approach is to build grammar and syntactic rules for these words, but this does not work at a
large scale. Instead, we have to build a model that predicts the sequence of words, or that finds which
words are close to one another and measures the distance between them.
Another commonly used approach is to count the words in a document (TF-IDF). With this approach, however,
we know nothing about the words themselves, nor about their relationships with other words.
We can build a context window over several documents and record the sequences of the words. The
context and meaning of the words are the objective: the order in which words appear together helps
illuminate their meaning. Word ordering matters, and the context around a word helps us understand
its meaning. In the end, we can predict which word follows another. We need many documents to capture
the context, and a larger table for each word. On the other hand, a big sparse matrix is difficult to
handle; therefore, we need to reduce its dimensionality.
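A minimal sketch of this idea, assuming a tiny invented corpus and a symmetric context window, might look as follows; a real co-occurrence matrix would be far larger and sparser.

```python
from collections import defaultdict

# Count how often words co-occur inside a fixed-size context window.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
window = 2  # words to the left and right of the target word

cooc = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                cooc[target][words[j]] += 1

print(dict(cooc["cat"]))   # {'the': 1, 'sat': 1, 'on': 1}
```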
Figure 2 gives an SVD example. Although SVD is a very powerful technique, it requires a significant
amount of memory and computational power.
Figure 2: Linear algebra SVD decomposition.
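One possible way to apply this reduction is a truncated SVD, sketched below with NumPy on a tiny invented matrix; the matrix values and the choice of k are purely illustrative.

```python
import numpy as np

# Truncated SVD as a dimensionality-reduction step on a (here tiny, in
# practice huge and sparse) co-occurrence matrix.
X = np.array([
    [2., 1., 0., 0.],
    [1., 2., 1., 0.],
    [0., 1., 2., 1.],
    [0., 0., 1., 2.],
])

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                              # keep only the top-k singular values
embeddings = U[:, :k] * S[:k]      # each row: a k-dimensional representation

print(embeddings.shape)            # (4, 2): 4 items, 2 dimensions instead of 4
```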
Eventually, for every word, we can use the context window to predict nearby words. One such model
is the skip-gram model, which, given a word, predicts the words that appear in its context.
Several models are based on the skip-gram model, such as word2vec.
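As an illustration, the skip-gram variant of word2vec can be trained with the gensim library (a library choice made here for the example, not prescribed by the text); the corpus and hyperparameters below are toy values.

```python
from gensim.models import Word2Vec

# Train a skip-gram word2vec model (sg=1) on a tiny tokenized corpus.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embeddings
    window=2,         # context window size
    sg=1,             # 1 = skip-gram, 0 = CBOW
    min_count=1,      # keep every word in this tiny corpus
)

print(model.wv["cat"].shape)   # (50,) -> the embedding of "cat"
```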
The skip-gram model learns a vector representation for each word that maximizes the probability of the
words observed in its context. In Figure 3, we have the input vector; then we have a hidden layer of
linear neurons.
Figure 3: The skip-gram model learns a vector representation for each word.
We want to be able to predict, for every word in the corpus, the probability that it is the next word.
However, we do not actually need the output; we need the hidden layer. The hidden layer is a weight
matrix created while the input vector is transformed to produce the output. The weights are learned
for every word through gradient descent and back-propagation. The hidden layer is a weight matrix with
one row per word and one column per neuron. These rows are the embeddings: we train the skip-gram model
to learn the context and the meaning of every word, and the hidden layer produced by this process
corresponds to the word embeddings. Figure 4 [5] presents an example of word embeddings.
Figure 4: Word embeddings as a condensed representation.
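The following sketch, with invented sizes and random weights, illustrates why the hidden layer is the embedding table: multiplying a one-hot input vector by the hidden-layer weight matrix simply selects that word's row.

```python
import numpy as np

# The hidden layer is a weight matrix W with one row per word and one column
# per neuron. A one-hot input times W selects the word's row: its embedding.
vocab_size, embedding_dim = 5, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embedding_dim))   # stands in for the learned weights

word_index = 2                                     # index of some word in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

embedding = one_hot @ W                            # same as W[word_index]
print(np.allclose(embedding, W[word_index]))       # True
```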
Every word is associated with a vector from the hidden layer, and those vectors are the word embeddings.
The word embeddings can stand in for the original words because the vectors preserve the context in which
the words appear. Relationships between embeddings mirror relationships between words: for example, in
Figure 4 the vectors for the words man and woman stand in the same relationship to each other as the
words themselves do in their full context. Finally, for every pair of words, we can calculate the
distance between them to find relations. These calculations are efficient because they are applied in a
low-dimensional space compared to the initial space. We can also plot the distances between words to
examine patterns and find relations.
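For instance, such distances can be computed with cosine similarity, as in the sketch below; the three vectors are invented purely for illustration.

```python
import numpy as np

# Comparing word embeddings with cosine similarity, which is cheap in the
# low-dimensional embedding space.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

man   = np.array([0.9, 0.1, 0.4])    # hypothetical embeddings
woman = np.array([0.8, 0.2, 0.5])
car   = np.array([-0.3, 0.9, 0.0])

print(cosine(man, woman))   # close to 1: similar contexts
print(cosine(man, car))     # much smaller: unrelated contexts
```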
2.2 Graph embeddings
We can treat the graph as a text corpus and the nodes as words, and adopt word embeddings as the
graph embeddings; in this way, we learn about the meaning of the nodes. We can then replace the
adjacency matrix with low-dimensional vectors. This is the idea behind DeepWalk [5]: each node is like
a word, and the neighborhood around the node is the context window. We can use the skip-gram model so
that, given a node and a context window, we can predict which nodes come next. Thus, the aim is to
predict neighboring nodes given a node. Figure 5 shows an example.
Figure 5: A visualization of DeepWalk.
We sample nodes with random walks from the input graph. From every node in the graph, we take n
fixed-length random walks, which play the role of the sentences in word embeddings. Then we create the
context windows, or neighborhood windows. Here we create the neighborhoods with random walks, but
there are many alternatives.
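A hedged sketch of this pipeline is given below: random walks are sampled from a tiny invented graph and passed, as if they were sentences, to a gensim skip-gram model. The graph, walk length, and parameters are illustrative, not those of the original DeepWalk paper.

```python
import random
from gensim.models import Word2Vec

# Sample fixed-length random walks from each node, treat walks as "sentences"
# and node ids as "words", then train a skip-gram model on them.
graph = {                       # adjacency lists of an invented toy graph
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def random_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

walks = [random_walk(node, length=6) for node in graph for _ in range(10)]

model = Word2Vec(walks, vector_size=16, window=2, sg=1, min_count=1)
print(model.wv["A"])            # the embedding learned for node A
```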
Embeddings are the hidden layer weights from the skip-gram model; Figure 6 [7] shows the neurons and
the embeddings.




