Questions tagged [nlp]

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NLP tasks

  • Text pre-processing
  • Coreference resolution
  • Dependency parsing
  • Document summarization
  • Named entity recognition (NER)
  • Information extraction (IE)
  • Language modeling
  • Part-of-speech (POS) tagging
  • Morphological analysis and wordform generation
  • Phrase-structure (constituency) parsing
  • Machine translation (MT)
  • Question answering (QA)
  • Sentiment analysis
  • Semantic parsing
  • Text categorization
  • Textual entailment detection
  • Topic modeling
  • Word Sense Disambiguation (WSD)

Beginner books on Natural Language Processing

2740 questions
12 votes
4 answers

How to process natural language queries?

I'm curious about natural language querying. Stanford has what looks to be a strong set of software for processing natural language. I've also seen the Apache OpenNLP library, and the General Architecture for Text Engineering. There are an…
Steve Kallestad
12 votes
6 answers

How to get the number of syllables in a word?

I have already gone through this post which uses nltk's cmudict for counting the number of syllables in a word: from nltk.corpus import cmudict d = cmudict.dict() def nsyl(word): return [len(list(y for y in x if y[-1].isdigit())) for x in…
Dawny33
8 votes
2 answers

Pros/Cons of stop word removal?

What are the pros and cons of removing stop words from text in the context of a text classification problem? I'm wondering what the best approach is (i.e. to remove or not to remove). I've read somewhere (but can't locate the reference) that it may be…
Jimmy Collins
6 votes
3 answers

Stemmer/lemmatizer for the Polish language

I'm looking for a stemmer/lemmatizer for the Polish language, preferably in Python. What would you recommend? I have a list of ingredients in a recipe. Plural forms are inflected differently, depending on the counter, e.g.: for tomatoes 5 pomidorów 2…
dzieciou
6 votes
3 answers

Fine-tuning an LLM or prompt engineering?

For some types of chatbot, such as a customized chatbot, might it be better to fine-tune the LM on domain-specific data and the expected type of question instead of relying on the prompt, since prompts have to be processed by the LM every time? Another reason is…
Frank
5 votes
1 answer

Named entity disambiguation contests

I am interested in the field of named entity disambiguation and want to learn more about it. I have heard that various associations organise contests on these kinds of research topics. These contests are very helpful as they give a…
AvinashK
4 votes
1 answer

Stemmer or dictionary?

I have recently ported a stemmer from Java to Python for a highly inflectional language. The stemmer learns how to change suffixes from the dictionary of words and their inflected forms. It basically builds a stemming table with learned stemming…
dzieciou
4 votes
2 answers

Removing junk sentences

I have transcripts of phone calls between customers and agents. I'm trying to find promises that an agent made to a customer. I have already done punctuation restoration, but there are a lot of sentences that don't make any sense. I would like to…
illuminato
4 votes
1 answer

Good-Turing Smoothing Intuition

I'm working through the Coursera NLP course by Jurafsky & Manning, and the lecture on Good-Turing smoothing struck me as odd. The example given was: You are fishing (a scenario from Josh Goodman), and caught: 10 carp, 3 perch, 2 whitefish, 1 trout,…
Ghillie Dhu
4 votes
1 answer

Extract key phrases from a single document

I need to extract relevant key phrases from a single document. Since I don't have a lot of documents, TF-IDF doesn't really work. Currently I'm using TextRank. It produces okay-ish results: some really good phrases along with a lot of garbage. Is…
4 votes
2 answers

Will LLMs accumulate skills each time they are taught through in-context learning?

If the model's parameters aren't updated during in-context learning (ICL), are the skills it just learned during the current ICL kept or saved in the model in some way other than in the parameters? To put it another way, will LLMs…
Frank
3 votes
1 answer

NLP Text Summarization - which metrics to use in evaluation?

I'm trying to implement a text summarization task using different algorithms and libraries. To evaluate which one gives the best result, I need some metrics. I have read about the BLEU and ROUGE metrics, but as I understand it, both of them need the…
Jane Mänd
3 votes
2 answers

Can natural language generation algorithms generate valid words too?

"Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form." (Wikipedia) Is NLG about building meaningful…
sashank
3 votes
1 answer

High- / low-resource languages: what does it mean?

In NLP, languages are often referred to as low-resource or high-resource. What do these terms mean?
Astariul
3 votes
1 answer

Word taxonomies for Facebook likes categories

I query the Facebook Graph API to get some users' likes, which come with a "category" field whose value can be, for instance, "Italian Restaurant" or "Health & Wellness website". I need to build a profile of the user, so I was thinking of retrieving…
martina.physics