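Searching Nested Child Documents

This section presents techniques for searching deeply nested documents, showing how more complex queries can be built with some of Solr's query parsers and document transformers.

These features require _root_ and _nest_path_ to be declared in the schema. See Indexing Nested Documents for details about schema and index configuration.

This section does not demonstrate faceting on nested documents. For nested document faceting, see the Block Join Facet Counts section.

Query Examples

The examples that follow assume an index containing the same documents covered in Indexing Nested Documents: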

[{ "id": "P11!prod",
   "name_s": "Swingline Stapler",
   "description_t": "The Cadillac of office staplers ...",
   "skus": [ { "id": "P11!S21",
               "color_s": "RED",
               "price_i": 42,
               "manuals": [ { "id": "P11!D41",
                              "name_s": "Red Swingline Brochure",
                              "pages_i":1,
                              "content_t": "..."
                            } ]
             },
             { "id": "P11!S31",
               "color_s": "BLACK",
               "price_i": 3
             } ],
   "manuals": [ { "id": "P11!D51",
                  "name_s": "Quick Reference Guide",
                  "pages_i":1,
                  "content_t": "How to use your stapler ..."
                },
                { "id": "P11!D61",
                  "name_s": "Warranty Details",
                  "pages_i":42,
                  "content_t": "... lifetime guarantee ..."
                } ]
 },
 { "id": "P22!prod",
   "name_s": "Mont Blanc Fountain Pen",
   "description_t": "A Premium Writing Instrument ...",
   "skus": [ { "id": "P22!S22",
               "color_s": "RED",
               "price_i": 89,
               "manuals": [ { "id": "P22!D42",
                              "name_s": "Red Mont Blanc Brochure",
                              "pages_i":1,
                              "content_t": "..."
                            } ]
             },
             { "id": "P22!S32",
               "color_s": "BLACK",
               "price_i": 67
             } ],
   "manuals": [ { "id": "P22!D52",
                  "name_s": "How To Use A Pen",
                  "pages_i":42,
                  "content_t": "Start by removing the cap ..."
                } ]
 } ]

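Child Doc Transformer

By default, documents that match a query do not include any of their nested children in the response. The [child] doc transformer can be used to enrich query results with the documents' descendants.

For a detailed explanation of this transformer, and specifics on its syntax and limitations, see the section [child - ChildDocTransformerFactory].

A simple query matching all documents with a description that includes "staplers":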

$ curl 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=description_t:staplers'
{
  "response":{"numFound":1,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1672933224035123200}]
  }}

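The same query with the [child] transformer added is shown below. Note that numFound has not changed: we are still matching the same set of documents, but when those documents are returned, their nested children are also returned as pseudo-fields.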

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' -d 'q=description_t:staplers' -d 'fl=*,[child]'
{
  "response":{"numFound":1,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1672933224035123200,
        "skus":[
          {
            "id":"P11!S21",
            "color_s":"RED",
            "price_i":42,
            "_version_":1672933224035123200,
            "manuals":[
              {
                "id":"P11!D41",
                "name_s":"Red Swingline Brochure",
                "pages_i":1,
                "content_t":"...",
                "_version_":1672933224035123200}]},

          {
            "id":"P11!S31",
            "color_s":"BLACK",
            "price_i":3,
            "_version_":1672933224035123200}],
        "manuals":[
          {
            "id":"P11!D51",
            "name_s":"Quick Reference Guide",
            "pages_i":1,
            "content_t":"How to use your stapler ...",
            "_version_":1672933224035123200},

          {
            "id":"P11!D61",
            "name_s":"Warranty Details",
            "pages_i":42,
            "content_t":"... lifetime guarantee ...",
            "_version_":1672933224035123200}]}]
  }}
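
When reading a [child] response in client code, it can help to flatten the pseudo-fields into an outline. The following is a minimal Python sketch, not part of Solr: the DOC dictionary is a trimmed copy of the response above, and the outline function is a hypothetical helper that assumes every dictionary-valued field (alone or inside a list) is a nested child document.

```python
from typing import Any, Dict, List

# Trimmed copy of the document returned above; only a few fields are kept.
DOC: Dict[str, Any] = {
    "id": "P11!prod",
    "name_s": "Swingline Stapler",
    "skus": [
        {"id": "P11!S21", "color_s": "RED",
         "manuals": [{"id": "P11!D41", "name_s": "Red Swingline Brochure"}]},
        {"id": "P11!S31", "color_s": "BLACK"},
    ],
    "manuals": [
        {"id": "P11!D51", "name_s": "Quick Reference Guide"},
        {"id": "P11!D61", "name_s": "Warranty Details"},
    ],
}


def outline(doc: Dict[str, Any], label: str = "", depth: int = 0) -> List[str]:
    """Return one indented line per document: doc first, then its descendants."""
    lines = ["  " * depth + label + doc["id"]]
    for key, value in doc.items():
        # Assumption: a dict, or a list containing dicts, is a nested child field.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                lines.extend(outline(child, f"{key}: ", depth + 1))
    return lines


print("\n".join(outline(DOC)))
```

Each line names the pseudo-field under which a child was returned, which makes the depth of every descendant visible at a glance.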

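Child Query Parser

The {!child} query parser can be used to search for the descendant documents of parent documents matching a wrapped query. For a detailed explanation of this parser, see the section Block Join Children Query Parser.

Consider again the description_t:staplers query used above. If we wrap that query in a {!child} query parser, then instead of matching and returning the product-level documents, we match all of the descendant child documents of the documents matched by the original query: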

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' -d 'q={!child of="*:* -_nest_path_:*"}description_t:staplers'
{
  "response":{"numFound":5,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1672933224035123200},
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1672933224035123200},
      {
        "id":"P11!S31",
        "color_s":"BLACK",
        "price_i":3,
        "_version_":1672933224035123200},
      {
        "id":"P11!D51",
        "name_s":"Quick Reference Guide",
        "pages_i":1,
        "content_t":"How to use your stapler ...",
        "_version_":1672933224035123200},
      {
        "id":"P11!D61",
        "name_s":"Warranty Details",
        "pages_i":42,
        "content_t":"... lifetime guarantee ...",
        "_version_":1672933224035123200}]
  }}

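In this example we used *:* -_nest_path_:* as our of parameter to indicate that we want to consider all documents that do not have a nest path (that is, all "root"-level documents) as the set of possible parents.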
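
To make the "no nest path" mask concrete, the sketch below walks a trimmed copy of the first example product and derives a simplified path for every document from the pseudo-field names above it. This is an illustration only: the PRODUCT dictionary and the nest_paths helper are assumptions, and the value Solr actually keeps in _nest_path_ may carry more detail than the plain /skus/manuals form used in the queries of this section.

```python
from typing import Any, Dict, Iterator, Tuple

# Trimmed copy of the first example product; only ids and nesting are kept.
PRODUCT: Dict[str, Any] = {
    "id": "P11!prod",
    "skus": [
        {"id": "P11!S21", "manuals": [{"id": "P11!D41"}]},
        {"id": "P11!S31"},
    ],
    "manuals": [{"id": "P11!D51"}, {"id": "P11!D61"}],
}


def nest_paths(doc: Dict[str, Any], path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (id, simplified nest path) pairs, children before their parent."""
    for key, value in doc.items():
        if isinstance(value, list):
            for child in value:
                if isinstance(child, dict):
                    yield from nest_paths(child, f"{path}/{key}")
    yield doc["id"], path  # the root document ends up with an empty path


paths = dict(nest_paths(PRODUCT))

# Counterpart of the mask *:* -_nest_path_:* : documents without a nest path.
roots = [doc_id for doc_id, nest_path in paths.items() if nest_path == ""]

for doc_id, nest_path in paths.items():
    print(doc_id, nest_path or "(no nest path)")
print("roots:", roots)
```

Only P11!prod has no path, so it is the only document in this block that the of mask accepts as a possible parent.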

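By changing the of parameter to match ancestors at specific _nest_path_ levels, we can narrow down the list of children returned. The query below searches for all descendants of skus whose price_i is 50 or less, using an of parameter that identifies all documents that do not have a _nest_path_ with the prefix /skus/: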

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!child of="*:* -_nest_path_:\\/skus\\/*"}(+price_i:[* TO 50] +_nest_path_:\/skus)'
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1675662666752851968}]
  }}
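
Double Escaping _nest_path_ Slashes in of

Note that in the above example, the / characters in the _nest_path_ are "double escaped" in the of parameter:

  • One level of \ escaping is necessary to prevent the / from being interpreted as a Regex Query.

  • An additional level of "escaping the escape character" is necessary because the of local parameter is a quoted string, so a second \ is needed to ensure the first \ is preserved and passed as-is to the query parser.

(Only a single level of \ escaping is needed in the body of the query string, to prevent the regex syntax, because the body is not a quoted-string local parameter.)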
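
The two escaping levels can be reproduced mechanically. The following Python sketch is only an illustration of the rule described above: escape_slashes and quote_local_param are hypothetical helper names, not Solr or client-library functions.

```python
def escape_slashes(path: str) -> str:
    """First level: escape each '/' so it does not start a regex query."""
    return path.replace("/", "\\/")


def quote_local_param(value: str) -> str:
    """Second level: quote a local-param value, escaping '\\' and '"' inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Body of the query: not a quoted string, so one level of escaping is enough.
body = f"(+price_i:[* TO 50] +_nest_path_:{escape_slashes('/skus')})"

# The mask goes into the quoted of="..." local param, so the backslashes
# added by escape_slashes are themselves escaped when the value is quoted.
mask = f"*:* -_nest_path_:{escape_slashes('/skus/')}*"
q = f"{{!child of={quote_local_param(mask)}}}{body}"

print(q)
```

The printed string is the q value used in the curl command above, with \\/ inside the quoted of parameter and \/ in the body.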

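You may find it more convenient to use parameter references in conjunction with other parsers that do not treat / as a special character, expressing the same query in a more verbose form: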

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!child of=$block_mask}(+price_i:[* TO 50] +{!field f="_nest_path_" v="/skus"})' --data-urlencode 'block_mask=(*:* -{!prefix f="_nest_path_" v="/skus/"})'
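
The same request can be assembled in client code. This is a minimal sketch using only the Python standard library; it builds the form body and does not send it, and the params name is just an illustrative choice. Note that no backslashes appear anywhere, because {!field} and {!prefix} take the path as a plain value.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "omitHeader": "true",
    # $block_mask is resolved by Solr from the request parameter of that name.
    "q": '{!child of=$block_mask}(+price_i:[* TO 50] +{!field f="_nest_path_" v="/skus"})',
    "block_mask": '(*:* -{!prefix f="_nest_path_" v="/skus/"})',
}

# urlencode percent-encodes every value, similar to curl's --data-urlencode.
body = urlencode(params)
print(body)

# Decoding the body gives back the original, unescaped parameter values.
decoded = parse_qs(body)
print(decoded["block_mask"][0])
```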

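Parent Query Parser

The inverse of the {!child} query parser is the {!parent} query parser, which lets you search for the ancestor documents of child documents matching a wrapped query. For a detailed explanation of this parser, see the section Block Join Parent Query Parser.

Let's first consider this example of searching for all "manual" type documents that have exactly 1 page: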

$ curl 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=pages_i:1'
{
  "response":{"numFound":3,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1676585794196733952},
      {
        "id":"P11!D51",
        "name_s":"Quick Reference Guide",
        "pages_i":1,
        "content_t":"How to use your stapler ...",
        "_version_":1676585794196733952},
      {
        "id":"P22!D42",
        "name_s":"Red Mont Blanc Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1676585794347728896}]
  }}

We can wrap that query in a {!parent} query to return the details of all products that are ancestors of these manuals:

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!parent which="*:* -_nest_path_:*"}(+_nest_path_:\/skus\/manuals +pages_i:1)'
{
  "response":{"numFound":2,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1676585794196733952},
      {
        "id":"P22!prod",
        "name_s":"Mont Blanc Fountain Pen",
        "description_t":"A Premium Writing Instrument ...",
        "_version_":1676585794347728896}]
  }}

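In this example we used *:* -_nest_path_:* as our which parameter to indicate that we want to consider all documents that do not have a nest path (that is, all "root"-level documents) as the set of possible parents.

By changing the which parameter to match ancestors at specific _nest_path_ levels, we can change the type of ancestors returned. The query below searches for skus (using a which parameter that identifies all documents that do not have a _nest_path_ with the prefix /skus/) that are the ancestors of manuals with exactly 1 page: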

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!parent which="*:* -_nest_path_:\\/skus\\/*"}(+_nest_path_:\/skus\/manuals +pages_i:1)'
{
  "response":{"numFound":2,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1676585794196733952},
      {
        "id":"P22!S22",
        "color_s":"RED",
        "price_i":89,
        "_version_":1676585794347728896}]
  }}

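Note that in the above example, the / characters in the _nest_path_ are "double escaped" in the which parameter, for the same reasons discussed above for the {!child} parser's of parameter.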
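
Why a different of or which mask changes the level of the result can be pictured with a small in-memory model. This is only a sketch of the general block-join idea (children are stored before their parent, and the parent of a child is the next document accepted by the mask); it is not Solr's implementation. The BLOCK list, its path key, and both function names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Pred = Callable[[Doc], bool]

# One product flattened in the order the {!child} output above lists it:
# every child comes before its parent, and the root document comes last.
BLOCK: List[Doc] = [
    {"id": "P11!D41", "path": "/skus/manuals", "pages_i": 1},
    {"id": "P11!S21", "path": "/skus", "price_i": 42},
    {"id": "P11!S31", "path": "/skus", "price_i": 3},
    {"id": "P11!D51", "path": "/manuals", "pages_i": 1},
    {"id": "P11!D61", "path": "/manuals", "pages_i": 42},
    {"id": "P11!prod", "path": ""},
]


def parents_of(block: List[Doc], mask: Pred, child_query: Pred) -> List[str]:
    """Model of {!parent}: map each matching child to the next doc in the mask."""
    result: List[str] = []
    for i, doc in enumerate(block):
        if mask(doc) or not child_query(doc):
            continue  # this model silently ignores mask docs in the child query
        # Assumes the mask accepts the root, so a later mask doc always exists.
        parent = next(d for d in block[i + 1:] if mask(d))
        if parent["id"] not in result:
            result.append(parent["id"])
    return result


def children_of(block: List[Doc], mask: Pred, parent_query: Pred) -> List[str]:
    """Model of {!child}: return the non-mask docs stored just before each match."""
    result: List[str] = []
    pending: List[str] = []
    for doc in block:
        if not mask(doc):
            pending.append(doc["id"])
            continue
        if parent_query(doc):
            result.extend(pending)
        pending = []
    return result


roots: Pred = lambda d: d["path"] == ""                      # *:* -_nest_path_:*
not_below_skus: Pred = lambda d: not d["path"].startswith("/skus/")
one_page_sku_manual: Pred = lambda d: d["path"] == "/skus/manuals" and d.get("pages_i") == 1
cheap_sku: Pred = lambda d: d["path"] == "/skus" and d["price_i"] <= 50

print(parents_of(BLOCK, roots, one_page_sku_manual))
print(parents_of(BLOCK, not_below_skus, one_page_sku_manual))
print(children_of(BLOCK, not_below_skus, cheap_sku))
```

With the root mask the one-page manual maps to the product, while with the narrower mask the scan stops at the sku, which mirrors the two {!parent} responses above for product P11.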

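Combining Block Join Query Parsers with Child Doc Transformer

The combination of these two parsers with the [child] transformer enables seamless creation of very powerful queries.

Here, for example, is a query where:

  • the (sku) documents returned must have a color of "RED"

  • the (sku) documents returned must be descendants of root-level (product) documents which have:

    • immediate child "manuals" documents which have:

      • "lifetime guarantee" in their content

  • each returned (sku) document also includes any descendant (manuals) documents it has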

$ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' -d 'fq=color_s:RED' --data-urlencode 'q={!child of="*:* -_nest_path_:*" filters=$parent_fq}' --data-urlencode 'parent_fq={!parent which="*:* -_nest_path_:*"}(+_nest_path_:"/manuals" +content_t:"lifetime guarantee")' -d 'fl=*,[child]'
{
  "response":{"numFound":1,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1676585794196733952,
        "manuals":[
          {
            "id":"P11!D41",
            "name_s":"Red Swingline Brochure",
            "pages_i":1,
            "content_t":"...",
            "_version_":1676585794196733952}]}]
  }}
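
The request above has several cooperating parameters, so it may be easier to read when composed in code with one comment per role. The following is a sketch under stated assumptions: block_join is a hypothetical helper, it only handles the root-level mask used in this section (which contains no quotes or backslashes, so no extra escaping is applied), and the request body is built but not sent.

```python
from urllib.parse import urlencode

ROOTS = "*:* -_nest_path_:*"  # mask for root-level documents used in this section


def block_join(parser: str, mask_param: str, body: str = "", **extra: str) -> str:
    """Build '{!parser mask_param="ROOTS" key=value ...}body'."""
    parts = [f'{mask_param}="{ROOTS}"'] + [f"{k}={v}" for k, v in extra.items()]
    return "{!" + parser + " " + " ".join(parts) + "}" + body


params = {
    "omitHeader": "true",
    # Only RED documents may be returned.
    "fq": "color_s:RED",
    # Return descendants of the root documents selected by $parent_fq.
    "q": block_join("child", "of", filters="$parent_fq"),
    # Root documents with a direct "manuals" child mentioning the guarantee.
    "parent_fq": block_join(
        "parent", "which",
        body='(+_nest_path_:"/manuals" +content_t:"lifetime guarantee")',
    ),
    # Attach each returned document's own descendants.
    "fl": "*,[child]",
}

print(urlencode(params))
```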