...

Code Block
language: bash
title: Add the analyzer to the index
echo '{
    "analysis": {
        "tokenizer": {
            "nori_user_dict": {
                "type": "nori_tokenizer",
                "decompound_mode": "mixed",
                "user_dictionary": "userdict_ko.txt"
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "nori_user_dict"
            },
            "korean": {
                "type": "nori",
                "stopwords": "_korean_"
            }
        }
    }
}' |  \
  http PUT http://localhost:9200/nori_test/_settings \
  Content-Type:application/json

...

Code Block
language: bash
title: Open the index
http POST http://localhost:9200/nori_test/_open \
  Content-Type:application/json
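
The settings update and the reopen step above can be wrapped into one helper. This is a minimal sketch, not the author's script: the function name `update_analysis_settings`, the `ES_URL` variable, and the settings file argument are hypothetical. It assumes HTTPie is installed, and it closes the index first because Elasticsearch only accepts changes to static analysis settings on a closed index (presumably the index was closed in a step not shown here).

```bash
# Hypothetical helper: apply analysis settings from a JSON file to an index.
# Assumes HTTPie (`http`) is installed and Elasticsearch listens on ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

update_analysis_settings() {
  local index="$1" settings_file="$2"
  # Static analysis settings can only be changed while the index is closed.
  http --check-status --ignore-stdin POST "$ES_URL/$index/_close" || return 1
  # Send the settings body from the file; remember the result so the index
  # is reopened even when the update fails.
  http --check-status PUT "$ES_URL/$index/_settings" \
    Content-Type:application/json < "$settings_file"
  local rc=$?
  http --check-status --ignore-stdin POST "$ES_URL/$index/_open" || return 1
  return "$rc"
}
```

With the JSON body from the settings step saved to a file (here assumed to be `analysis.json`), the call would be `update_analysis_settings nori_test analysis.json`.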


Running morphological analysis with the new analyzer shows that 지리산 and 남악제 are now each emitted as a single token.

Code Block
language: bash
title: Morphological analysis
echo '{
  "analyzer": "my_analyzer",
  "text" : "지리산남악제 및 군민의날."
}
' |  \
  http GET http://localhost:9200/nori_test/_analyze \
  Content-Type:application/json
Code Block
language: js
title: Morphological analysis result
{
    "tokens": [
        {
            "token": "지리산",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 0
        },
        {
            "token": "남악제",
            "start_offset": 3,
            "end_offset": 6,
            "type": "word",
            "position": 1
        },
        {
            "token": "및",
            "start_offset": 7,
            "end_offset": 8,
            "type": "word",
            "position": 2
        },
        {
            "token": "군민",
            "start_offset": 9,
            "end_offset": 11,
            "type": "word",
            "position": 3
        },
        {
            "token": "의",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 4
        },
        {
            "token": "날",
            "start_offset": 12,
            "end_offset": 13,
            "type": "word",
            "position": 5
        }
    ]
}
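
To compare tokenization results quickly, it can help to reduce the response to just the token strings. The following is a small sketch under stated assumptions: `extract_tokens` is a hypothetical helper, not part of Elasticsearch or HTTPie, and it relies only on `grep` and `sed` so that `jq` is not required. It expects an `_analyze` response on standard input.

```bash
# Hypothetical helper: print one token per line from an _analyze response
# read on stdin. It matches only the "token" fields; the top-level "tokens"
# key is skipped because the pattern requires a closing quote after "token".
extract_tokens() {
  grep -o '"token": *"[^"]*"' | sed 's/^"token": *"//; s/"$//'
}

# Usage (requires a running Elasticsearch, so it is left as a comment):
#   echo '{"analyzer": "my_analyzer", "text": "지리산남악제 및 군민의날."}' |
#     http GET http://localhost:9200/nori_test/_analyze \
#       Content-Type:application/json | extract_tokens
```

For the response shown above, this prints 지리산, 남악제, 및, 군민, 의, and 날 on separate lines.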



Ref

...