nlp - w3toppers.com

OpenAI GPT-3 API: How do I make sure answers are from a customized (fine-tuning) dataset?

Semantic search example: The following is an example of semantic search based on embeddings using the OpenAI API. Wrong goal: "the OpenAI API should answer from the fine-tuning dataset if the prompt is similar to one from the fine-tuning dataset." This is completely wrong logic. Forget about fine-tuning. As stated in the official OpenAI documentation: Fine-tuning …
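Here is a minimal sketch of the ranking step in embeddings-based semantic search, in plain Python. The embedding vectors are hypothetical, hard-coded placeholders. In practice they would come from an embeddings model (for example, OpenAI's embeddings endpoint), but no API call is shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding, documents, top_k=1):
    """Rank (text, embedding) pairs by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in documents]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


# Hypothetical 3-dimensional embeddings; real embedding models return far longer vectors.
docs = [
    ("Our office opens at 9 a.m.", [0.9, 0.1, 0.0]),
    ("Refunds are processed within 5 days.", [0.1, 0.9, 0.2]),
]
query = [0.2, 0.8, 0.1]  # hypothetical embedding of "How long does a refund take?"
print(semantic_search(query, docs))  # ['Refunds are processed within 5 days.']
```

One common pattern is to paste the best-matching text into the completion prompt as context, so the model answers from your own data rather than from whatever it absorbed during fine-tuning.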

How to use Bert for long text classification?

You have basically three options: (1) Cut the longer texts off and use only the first 512 tokens. The original BERT implementation (and probably the others as well) truncates longer sequences automatically. For most cases, this option is sufficient. (2) Split your text into multiple subtexts, classify each of them, and combine the …
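Here is a minimal sketch of the second option (split, classify each piece, combine), assuming you already have token ids and a classifier. `dummy_classifier` is a hypothetical placeholder for a fine-tuned BERT model. Averaging the scores is only one way to combine them (majority voting is another). Note that BERT's 512-token limit includes the special [CLS] and [SEP] tokens, so with a real model you would use about 510 content tokens per window.

```python
from typing import Callable, Sequence


def chunk_tokens(tokens: Sequence[int], max_len: int = 512, stride: int = 256) -> list[list[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    max_len - stride tokens; the last window always reaches the end.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in (0, max_len]")
    if len(tokens) <= max_len:
        return [list(tokens)]
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


def classify_long_text(
    tokens: Sequence[int],
    classify_chunk: Callable[[list[int]], list[float]],
    max_len: int = 512,
    stride: int = 256,
) -> list[float]:
    """Classify every window and average the per-class scores across windows."""
    scores = [classify_chunk(chunk) for chunk in chunk_tokens(tokens, max_len, stride)]
    num_classes = len(scores[0])
    return [sum(s[i] for s in scores) / len(scores) for i in range(num_classes)]


def dummy_classifier(chunk: list[int]) -> list[float]:
    """Hypothetical stand-in for a fine-tuned BERT classifier: returns [p_neg, p_pos]."""
    positive = sum(1 for t in chunk if t % 2 == 0) / len(chunk)
    return [1 - positive, positive]


tokens = list(range(1200))  # pretend token ids for a long document
print(len(chunk_tokens(tokens)))  # 4
print(classify_long_text(tokens, dummy_classifier))  # [0.5, 0.5]
```

The overlap means a sentence cut at one window boundary still appears whole in the neighbouring window.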

What is the difference between lemmatization vs stemming?

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope …
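To make the difference concrete, here is a toy contrast in Python. Both functions are hypothetical illustrations. The "stemmer" blindly chops suffixes and is not the Porter algorithm. The "lemmatizer" is a four-entry lookup table standing in for a real vocabulary such as WordNet, and it ignores part of speech entirely.

```python
# Toy "stemmer": chop the first matching suffix, with no check that the result is a word.
# This is a deliberately crude illustration, not the Porter algorithm.
SUFFIXES = ("ing", "es", "ed", "s")


def crude_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


# Toy "lemmatizer": a tiny hand-written vocabulary standing in for a real lexicon.
# A real lemmatizer also needs the part of speech ("saw" as a noun stays "saw").
LEMMAS = {"studies": "study", "running": "run", "saw": "see", "better": "good"}


def toy_lemmatize(word: str) -> str:
    return LEMMAS.get(word, word)


for w in ["studies", "running", "saw", "better"]:
    print(f"{w:8} stem={crude_stem(w):7} lemma={toy_lemmatize(w)}")
```

The stemmer produces non-words like "studi" and "runn" and cannot touch irregular forms. The lookup-based lemmatizer returns dictionary forms, but only for words its vocabulary covers.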

Anyone know of some good Word Sense Disambiguation software? [closed]

My list is not exhaustive, and Googling for more will surely turn up further options. For software, here is a short list; remember to cite the relevant sources!
GWSD: Unsupervised Graph-based Word Sense Disambiguation, http://lit.csci.unt.edu/~rada/downloads/GWSD/GWSD.1.0.tar.gz
SenseLearner: All-Words Word Sense Disambiguation Tool, http://lit.csci.unt.edu/~rada/downloads/senselearner/SenseLearner2.0.tar.gz
KYOTO UKB graph-based WSD, http://ixa2.si.ehu.es/ukb/
pyWSD: Python Implementation of Simple WSD algorithms, https://github.com/alvations/pywsd
…

How do I open multiple instances of Visual Studio Code?

Ctrl + Shift + N opens a new window. Pressing Ctrl + K, releasing the keys, and then pressing O opens the current tab in a new window. You can then use the menu File → Open Folder to have two instances of Visual Studio Code with a different folder in each window. ⌘ + Shift + …

How do I do word Stemming or Lemmatization?

If you know Python, the Natural Language Toolkit (NLTK) has a very powerful lemmatizer that makes use of WordNet. Note that if you are using this lemmatizer for the first time, you must download the corpus before using it. This can be done with:
>>> import nltk
>>> nltk.download('wordnet')
You only have to do …

How do you implement a “Did you mean”? [duplicate]

Actually, what Google does is very much non-trivial and, at first, counter-intuitive. They don't check against a dictionary. Instead, they use statistics to identify "similar" queries that returned more results than your query. The exact algorithm is, of course, not known. There are different sub-problems to solve here, …
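Since the real algorithm is not public, this is only a minimal sketch of the idea described above. It assumes you have a log of past queries with counts (`QUERY_COUNTS` is made-up data). The function suggests the most frequent logged query that lies within a small Levenshtein edit distance of the user's query and is more popular than it.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(query: str, query_counts: dict[str, int], max_distance: int = 2):
    """Return the most frequent similar query that beats the user's own query, or None."""
    own_count = query_counts.get(query, 0)
    candidates = [
        (count, q)
        for q, count in query_counts.items()
        if q != query and count > own_count and edit_distance(query, q) <= max_distance
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical query-log counts.
QUERY_COUNTS = {
    "spelling checker": 12000,
    "spilling checker": 200,
    "speling checker": 50,
    "spelling chequer": 30,
}
print(did_you_mean("speling checker", QUERY_COUNTS))   # spelling checker
print(did_you_mean("spelling checker", QUERY_COUNTS))  # None
```

Note that no dictionary is consulted: "spelling checker" wins purely because many more users typed it.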

Stemmers vs Lemmatizers

Q1: “[..] are English stemmers any useful at all today? Since we have a plethora of lemmatization tools for English” Yes. Stemmers are much simpler, smaller and usually faster than lemmatizers, and for many applications their results are good enough. Using a lemmatizer for that is a waste of resources. Consider, for example, dimensionality reduction …
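Here is a quick illustration of the dimensionality-reduction point. It counts the distinct vocabulary items (bag-of-words features) in a tiny made-up corpus before and after stemming. `strip_suffix` is a hypothetical one-rule stemmer, not Porter or Snowball.

```python
import re


def strip_suffix(word: str) -> str:
    """Hypothetical one-rule stemmer: drop a trailing -ing, -ed, -es or -s
    from words longer than three letters."""
    if len(word) <= 3:
        return word
    return re.sub(r"(?:ing|ed|es|s)$", "", word)


corpus = ["the cat chased the cats", "cats chasing a cat", "a dog chases cats"]
tokens = [w for doc in corpus for w in doc.split()]

raw_vocab = set(tokens)
stemmed_vocab = {strip_suffix(w) for w in tokens}
print(len(raw_vocab), len(stemmed_vocab))  # 8 5
```

Even this crude rule merges "cat/cats" and "chased/chasing/chases", shrinking the feature space from 8 to 5 dimensions. For a classifier or search index, a non-word stem like "chas" does no harm.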

Detecting syllables in a word

Read about the TeX approach to this problem for the purposes of hyphenation. In particular, see Frank Liang's thesis, Word Hy-phen-a-tion by Com-put-er. His algorithm is very accurate, and it also includes a small exceptions dictionary for cases where the algorithm does not work.
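Liang's method matches letter patterns that carry digits between the letters. At each inter-letter position the highest digit from any matching pattern wins, and an odd value marks an allowed hyphen. The sketch below is a simplified illustration. It assumes a tiny hand-picked pattern set (real TeX pattern files contain thousands of patterns) and a one-entry exceptions dictionary. `left_min=2` and `right_min=3` mirror TeX's default minimum fragment lengths.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a pattern such as 'hy3ph' into its letters ('hyph')
    and the values between them ([0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, exceptions, left_min=2, right_min=3):
    """Return the word with '-' at each permitted hyphenation point."""
    key = word.lower()
    if key in exceptions:  # the exceptions dictionary overrides the patterns
        return exceptions[key]
    work = "." + key + "."  # dots mark the word boundaries
    points = [0] * (len(work) + 1)
    for letters, values in patterns:
        for i in range(len(work) - len(letters) + 1):
            if work[i:i + len(letters)] == letters:
                for j, v in enumerate(values):
                    points[i + j] = max(points[i + j], v)
    # points[k] is the value just before work[k]; word[m] sits at work[m + 1],
    # so the break between word[m - 1] and word[m] is governed by points[m + 1].
    pieces, start = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return "-".join(pieces)


# Illustrative patterns only, not a complete English pattern set.
PATTERNS = [parse_pattern(p) for p in "1na n2at 1tio 2io o2n hy3ph he2n hena4 hen5at".split()]
EXCEPTIONS = {"computer": "com-put-er"}

print(hyphenate("hyphenation", PATTERNS, EXCEPTIONS))  # hy-phen-ation
print(hyphenate("computer", PATTERNS, EXCEPTIONS))     # com-put-er
```

With this tiny pattern set, the sketch yields hy-phen-ation. Allowing the a-tion break shown in Liang's title would take more patterns or an exception entry.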

What are some simple NLP projects that a CS undergrad can try implementing? [closed]

There are plenty of them. Here is a list of different NLP problems:
- spam detection
- text genre categorization (news, fiction, science paper)
- finding similar texts (for example, searching for similar articles)
- finding something about the author (genre, native or non-native speaker)
- creating an automatic grader for students' work
- checking text for plagiarism
- creating an application that looks for grammatical errors
…