
What is the expected output of the Elasticsearch kuromoji plugin when the kuromoji_number and kuromoji_readingform token filters are chained? When I use both filters together, the result is not what I expect, but each filter works properly when used on its own.


    PUT test
    {
      "settings": {
        "index": {
          "analysis": {
            "filter": {
              "kuromoji_number": {
                "type": "kuromoji_number"
              },
              "kuromoji_readingform": {
                "type": "kuromoji_readingform"
              }
            },
            "tokenizer": {
              "kuromoji": {
                "type": "kuromoji_tokenizer"
              }
            }
          }
        }
      }
    }

    GET /test/_analyze
    {
      "text": "一〇〇〇",
      "tokenizer": "kuromoji",
      "filter": [
        "kuromoji_number",
        "kuromoji_readingform"
      ]
    }

Should the output look like this:

    {
      "tokens": [
        {
          "token": "一",
          "number": 1,
          "reading_form": "ichi"
        },
        {
          "token": "〇",
          "number": 0,
          "reading_form": "zero"
        },
        {
          "token": "〇",
          "number": 0,
          "reading_form": "zero"
        },
        {
          "token": "〇",
          "number": 0,
          "reading_form": "zero"
        }
      ]
    }

or like this?

    {
      "tokens" : [
        {
          "token" : "〇",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "word",
          "position" : 0
        }
      ]
    }

How should I understand the way the plugin works when two filters are chained?

I am getting the second output, but shouldn't the correct result be the first one?
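One possible way to inspect what each filter contributes (a sketch, not checked against a specific Elasticsearch or plugin version) is to add "explain": true to the same _analyze request. Elasticsearch then returns a detail section that lists the tokens emitted by the tokenizer and then by each token filter in turn, so you can see whether kuromoji_number has already turned 一〇〇〇 into a single token before kuromoji_readingform runs. Token filters are applied in the order they appear in the filter array, each one receiving the previous filter's output.

    GET /test/_analyze
    {
      "text": "一〇〇〇",
      "tokenizer": "kuromoji",
      "filter": [
        "kuromoji_number",
        "kuromoji_readingform"
      ],
      "explain": true
    }

Running the same request with the two filter names swapped and comparing the detail sections shows how the final token depends on the order of the chain.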

Aman