Symanto Brain Glossary

Active Learning

A type of machine learning in which the model itself selects the data it is trained on, prioritising the unlabelled instances it considers most useful (for example, those it is least certain about) for annotation. In this way, it reduces the amount of labelled data needed.
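A minimal sketch of one common selection strategy, least-confidence (uncertainty) sampling, assuming a binary classifier that returns a probability for the positive class. The function name `select_for_annotation` and the probabilities are hypothetical; this is not a description of how Symanto Brain selects instances.

```python
from typing import Callable, List


def select_for_annotation(pool: List[str], predict_proba: Callable[[str], float], k: int) -> List[str]:
    """Least-confidence sampling for a binary classifier.

    predict_proba returns the model's probability that an instance belongs
    to the positive class; values near 0.5 mean the model is unsure.
    """
    # Rank by distance from 0.5: the smallest distance is the most uncertain instance.
    ranked = sorted(pool, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:k]


# Hypothetical model outputs for four unlabelled texts.
fake_probs = {"text A": 0.95, "text B": 0.60, "text C": 0.05, "text D": 0.45}
to_label = select_for_annotation(list(fake_probs), fake_probs.get, k=2)
print(to_label)  # ['text D', 'text B']
```

In a full active learning loop, the selected instances would be labelled by a human annotator, added to the training set, and the model retrained before the next selection round.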

Annotated corpora

Apart from the raw text, a corpus can also be enriched with additional linguistic information, called 'annotation'. This information can be of different kinds, such as prosodic, semantic or historical annotation.
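In code, an annotated corpus can be represented as a list of documents, each carrying its raw text plus one or more annotation layers. The layer name "semantic", the topic values and the helper `texts_with` below are hypothetical illustrations, not a format used by Symanto Brain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedDocument:
    text: str
    # Maps an annotation layer name (hypothetical, e.g. "semantic") to its values.
    annotations: Dict[str, List[str]] = field(default_factory=dict)


corpus = [
    AnnotatedDocument("The parcel arrived two days late.", {"semantic": ["delivery"]}),
    AnnotatedDocument("The support agent was very helpful.", {"semantic": ["service"]}),
]


def texts_with(corpus: List[AnnotatedDocument], layer: str, value: str) -> List[str]:
    """Return the raw texts whose given annotation layer contains value."""
    return [doc.text for doc in corpus if value in doc.annotations.get(layer, [])]


print(texts_with(corpus, "semantic", "delivery"))
```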

Deep Learning Model

A deep learning model is a model based on a neural network architecture with multiple layers, typically trained on a set of labelled data.
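To make "multiple layers" concrete, the sketch below runs a forward pass through a small fully connected network: each layer computes a weighted sum plus a bias, and hidden layers apply a ReLU non-linearity. The weights are made up for illustration; in a trained model they would be learned from labelled data, a step omitted here.

```python
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def dense(v: Vector, weights: List[List[float]], bias: List[float]) -> Vector:
    # One output per weight row: dot(row, v) + bias.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def forward(x: Vector, layers: List[Layer]) -> Vector:
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:  # non-linearity on hidden layers only
            h = relu(h)
    return h


# Hypothetical 2 -> 3 -> 1 network with hand-picked weights.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [0.0]),
]
print(forward([1.0, -2.0], layers))  # [1.0]
```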

F1 score

An evaluation metric of a model's predictive performance that combines its precision and recall scores into a single value, their harmonic mean. By contrast, the accuracy metric computes how often a model made a correct prediction across the entire dataset, which makes it a reliable metric only if the dataset is class-balanced; that is, each class of the dataset has the same number of samples. The F1 score is therefore more informative than accuracy on imbalanced datasets.
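For a binary task, F1 = 2 * P * R / (P + R), where P is precision and R is recall. The sketch below computes it from scratch and contrasts it with accuracy on a deliberately imbalanced toy dataset; the function names are our own, not a specific library's API.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy data: 9 negatives, 1 positive; the "model" always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy(y_true, y_pred))  # 0.9
print(f1_score(y_true, y_pred))  # 0.0
```

A model that always predicts the majority class looks strong on accuracy here but scores 0 on F1, which is why F1 is preferred when classes are imbalanced.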

Few-shot learning

Few-shot learning refers to a model's ability to classify new data when only a limited number of training instances (e.g., 10 to 100) have been provided. In other words, the model improves its performance after being exposed to only a small amount of labelled data.
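One simple way to make few-shot classification concrete is a nearest-centroid (prototype) classifier: average the embeddings of the few labelled examples for each label, then assign new data to the label with the closest average. The 2-D vectors below are made-up stand-ins for text embeddings; this is an illustrative technique, not a description of how Symanto Brain performs few-shot learning.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    # Component-wise mean of the support vectors for one label.
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def few_shot_classify(x: Vector, support: Dict[str, List[Vector]]) -> str:
    prototypes = {label: centroid(vs) for label, vs in support.items()}
    # Choose the label whose prototype is nearest in Euclidean distance.
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))


# Three labelled examples per label: the "few shots" (hypothetical embeddings).
support = {
    "positive": [[0.9, 0.8], [1.1, 1.0], [1.0, 1.2]],
    "negative": [[-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0]],
}
print(few_shot_classify([0.7, 0.9], support))  # positive
```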

Label (Class)

A predefined category that can be assigned to a piece of open-ended text.

Model training

Model training is the process of feeding an ML algorithm with labelled data so that it learns the patterns that characterise each of the labels (classes) involved.
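As an illustration of the training process in general, the sketch below trains a classic perceptron: it repeatedly passes over labelled examples and adjusts its weights whenever it misclassifies one. The perceptron is swapped in here as a simple, well-known algorithm; it is not how Symanto Brain trains its models, and the data is invented.

```python
from typing import List, Tuple


def train_perceptron(X: List[List[float]], y: List[int], epochs: int = 10) -> Tuple[List[float], float]:
    """Classic perceptron; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified (or on the boundary): nudge towards yi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                errors += 1
        if errors == 0:  # every training example is now classified correctly
            break
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```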

Neural Language Model

A language model based on neural networks. It exploits the ability of neural networks to learn distributed representations, which reduces the number of training examples needed to learn highly complex functions.
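A minimal sketch of the idea behind a neural language model, assuming a simplified architecture: each word is a dense vector (a distributed representation), the context is summarised by averaging its word vectors, and a softmax over dot products with output vectors gives a probability for each candidate next word. The tiny vocabulary and vectors are made up and untrained; real models learn them from data and use much larger networks.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(
    context: List[str], input_emb: Dict[str, Vector], output_emb: Dict[str, Vector]
) -> Dict[str, float]:
    dim = len(next(iter(input_emb.values())))
    # Distributed representation of the context: mean of its word vectors.
    h = [sum(input_emb[w][i] for w in context) / len(context) for i in range(dim)]
    vocab = list(output_emb)
    scores = [sum(a * b for a, b in zip(h, output_emb[v])) for v in vocab]
    return dict(zip(vocab, softmax(scores)))


# Hypothetical, untrained embeddings.
input_emb = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
output_emb = {"sat": [0.0, 2.0], "the": [1.0, -1.0], "ran": [0.5, 0.5]}
probs = next_word_distribution(["the", "cat"], input_emb, output_emb)
print(max(probs, key=probs.get))  # sat
```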

Pattern

Patterns are descriptive sentences that help the model understand and better detect your label.

Prompt Ranker

A tool for unsupervised prompt ranking. Prompts (also called hypotheses or label descriptions) are the key components that make Symanto Brain work. Without annotated data (which is often the case when using Symanto Brain), finding a good prompt is hard, and PromptRanker helps you do that. Upload a dataset to be classified and manually write a list of different prompts for each label in the label space; PromptRanker then returns a score for each prompt that estimates its trustworthiness. The higher the score, the better the prompt.

Semantic matching

Semantic matching is a technique to determine whether two or more elements have a similar meaning.
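A common way to implement semantic matching is to represent each element as a vector (an embedding) and compare vectors with cosine similarity; two elements are treated as a match when their similarity reaches a threshold. The vectors and the 0.8 threshold below are illustrative assumptions, not values used by Symanto Brain.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_semantic_match(u: Vector, v: Vector, threshold: float = 0.8) -> bool:
    # Treat the elements as having a similar meaning if their vectors point the same way.
    return cosine_similarity(u, v) >= threshold


# Hypothetical embeddings of three short texts.
refund_request = [1.0, 1.0]
money_back = [1.0, 0.9]
weather_report = [1.0, -1.0]
print(is_semantic_match(refund_request, money_back))      # True
print(is_semantic_match(refund_request, weather_report))  # False
```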

Siamese network

A Siamese neural network is an artificial neural network that contains two or more identical subnetworks sharing the same weights, so that in effect only one set of parameters is trained. Each subnetwork maps its input to a feature vector, and the similarity of the inputs is then found by comparing these feature vectors.
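A minimal sketch of the inference side of a Siamese setup, under simplifying assumptions: a single hypothetical encoder (one tanh layer with made-up, untrained weights) is applied to both inputs, which is what weight sharing means in practice, and the two feature vectors are compared with cosine similarity. Training, for example with a contrastive loss, is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(x: Vector, weights: Matrix) -> Vector:
    """Hypothetical one-layer encoder: tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(x1: Vector, x2: Vector, shared_weights: Matrix) -> float:
    # Both "twin" subnetworks use the same weights, so they compute the same function.
    f1 = encode(x1, shared_weights)
    f2 = encode(x2, shared_weights)
    return cosine_similarity(f1, f2)


# Made-up weights standing in for a trained encoder.
SHARED_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
print(siamese_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], SHARED_WEIGHTS))  # ~1.0
```

Because the weights are shared, the similarity is symmetric and identical inputs always map to identical feature vectors.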

Text classification

Text classification is the process of assigning one or more predefined classes (labels) to a piece of text in order to organize, structure and filter text data by any parameter of interest.
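As an illustration of text classification in general (not of Symanto Brain's own approach), a multinomial Naive Bayes classifier with add-one smoothing can be sketched in a few lines. The training sentences and labels are invented for the example, and tokenisation is a naive whitespace split.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def train_nb(docs: List[str], labels: List[str]) -> Model:
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab: Set[str] = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_nb(doc: str, model: Model) -> str:
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label in label_counts:
        # Log prior plus log likelihood of each token, with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in doc.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["great product love it", "love this great service",
        "terrible product hate it", "awful service hate this"]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(predict_nb("love great", model))  # positive
```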

Zero-shot

Zero-shot refers to a model's ability to handle classes it has never seen during training; in other words, it allows us to assign an appropriate label to a piece of text without having provided any training examples for that label.
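One way to sketch zero-shot classification is to treat it as semantic matching between the text and a natural-language description of each label, choosing the most similar label; no labelled training examples are involved. The `embed` function below is a hypothetical and deliberately crude stand-in (bag-of-words counts) for a neural sentence encoder, and the label descriptions are made up.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a neural sentence encoder: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def zero_shot_classify(text: str, label_descriptions: Dict[str, str]) -> str:
    text_vec = embed(text)
    # Pick the label whose description is most similar to the text.
    return max(label_descriptions, key=lambda label: cosine(text_vec, embed(label_descriptions[label])))


label_descriptions = {
    "sports": "this text is about sports football and matches",
    "finance": "this text is about money stocks and markets",
}
print(zero_shot_classify("football fans watched the game", label_descriptions))  # sports
```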