Elasticsearch: The Date Type Is a Little Odd
Source: cnblogs  Author: 无风听海  Date: 2023/2/15 9:21:15

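1. Introduction to the Date Type

Elasticsearch carries data as JSON, and JSON has no date type. Elasticsearch can nevertheless handle dates that arrive in JSON in any of the following three forms:

  • a string containing a formatted date;
  • a long number representing milliseconds-since-the-epoch;
  • a long number representing seconds-since-the-epoch (this requires epoch_second to be listed in the field's format).

At index time, Elasticsearch converts the incoming value to UTC and stores it as a long number representing milliseconds-since-the-epoch. At query time, it converts queries on date fields into range queries over that long representation.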
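
To make the internal representation concrete, here is a minimal sketch that converts each of the three input forms to milliseconds-since-the-epoch in UTC. It uses plain java.time rather than Elasticsearch's own parser, and the class and method names (EpochMillisSketch, fromDate, fromInstant, fromEpochSeconds) are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

final class EpochMillisSketch {
    // Date-only string: taken as midnight UTC of that day.
    static long fromDate(String isoDate) {
        return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Full ISO-8601 instant such as 2015-01-01T12:10:30Z.
    static long fromInstant(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    // Seconds since the epoch, scaled to milliseconds.
    static long fromEpochSeconds(long seconds) {
        return Math.multiplyExact(seconds, 1000L);
    }

    public static void main(String[] args) {
        System.out.println(fromDate("2015-01-01"));              // 1420070400000
        System.out.println(fromInstant("2015-01-01T12:10:30Z")); // 1420114230000
        System.out.println(fromEpochSeconds(1420070400L));       // 1420070400000
    }
}
```

All three forms end up as the same kind of value, a single long, which is why every date query can be expressed as a numeric range.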

2. Preparing the Test Data

Create the mapping, setting the type of create_date to date:

    PUT my_date_index
    {
      "mappings": {
        "_doc": {
          "properties": {
            "create_date": {
              "type": "date"
            }
          }
        }
      }
    }

Index the following three documents:

    PUT my_date_index/_doc/1
    { "create_date": "2015-01-01" }

    PUT my_date_index/_doc/2
    { "create_date": "2015-01-01T12:10:30Z" }

    PUT my_date_index/_doc/3
    { "create_date": 1420070400001 }

3. The Odd Behavior of Date Queries

Suppose we want to hit the record dated 2015-01-01 with the following query:

    POST my_date_index/_search
    {
      "query": {
        "term": {
          "create_date": "2015-01-01"
        }
      }
    }

The result shows that all three documents were hit:

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "create_date" : "2015-01-01T12:10:30Z"
            }
          },
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "create_date" : "2015-01-01"
            }
          },
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "create_date" : 1420070400001
            }
          }
        ]
      }
    }

Profiling the query shows that Elasticsearch does rewrite it internally into a range query, create_date:[1420070400000 TO 1420156799999]:

    POST my_date_index/_search
    {
      "profile": "true",
      "query": {
        "term": {
          "create_date": "2015-01-01"
        }
      }
    }

Profile output for one shard (excerpt):

    {
      "id" : "[eD2KQtMGSla7jzJQBQVAfQ][my_date_index][0]",
      "searches" : [
        {
          "query" : [
            {
              "type" : "IndexOrDocValuesQuery",
              "description" : "create_date:[1420070400000 TO 1420156799999]",
              "time_in_nanos" : 2101,
              "breakdown" : {
                "score" : 0,
                "build_scorer_count" : 0,
                "match_count" : 0,
                "create_weight" : 2100,
                "next_doc" : 0,
                "match" : 0,
                "create_weight_count" : 1,
                "next_doc_count" : 0,
                "score_count" : 0,
                "build_scorer" : 0,
                "advance" : 0,
                "advance_count" : 0
              }
            }
          ],
          "rewrite_time" : 2200,
          "collector" : [
            {
              "name" : "CancellableCollector",
              "reason" : "search_cancelled",
              "time_in_nanos" : 700,
              "children" : [
                {
                  "name" : "SimpleTopScoreDocCollector",
                  "reason" : "search_top_hits",
                  "time_in_nanos" : 200
                }
              ]
            }
          ]
        }
      ],
      "aggregations" : [ ]
    }
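
The bounds in that description can be checked by hand. The sketch below, written with java.time and hypothetical names (DayRangeSketch, dayRange, matches), computes the inclusive millisecond range of a UTC day and tests the three indexed values against it. It only illustrates the arithmetic and is not how Elasticsearch evaluates the query.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class DayRangeSketch {
    // Inclusive [start, end] of a UTC calendar day in epoch milliseconds.
    static long[] dayRange(String isoDate) {
        LocalDate day = LocalDate.parse(isoDate);
        long start = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli() - 1;
        return new long[] {start, end};
    }

    // True when value lies inside the inclusive range.
    static boolean matches(long[] range, long value) {
        return range[0] <= value && value <= range[1];
    }

    public static void main(String[] args) {
        long[] range = dayRange("2015-01-01");
        System.out.println(range[0] + " TO " + range[1]);
        // Stored values of documents 1, 2 and 3 from the test data.
        long[] docs = {1420070400000L, 1420114230000L, 1420070400001L};
        for (long doc : docs) {
            System.out.println(doc + " -> " + matches(range, doc));
        }
    }
}
```

All three stored values fall inside the range, which is consistent with the three hits returned by the term query.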

Next, let's look at how a term query on the Date type is handled.

termQuery calls rangeQuery directly, passing the supplied date as both the lower and the upper bound of the range:

    // DateFieldType
    @Override
    public Query termQuery(Object value, @Nullable QueryShardContext context) {
        // A term query is a range query with the same value as both inclusive bounds.
        Query query = rangeQuery(value, value, true, true, ShapeRelation.INTERSECTS, null, null, context);
        if (boost() != 1f) {
            query = new BoostQuery(query, boost());
        }
        return query;
    }

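rangeQuery calls parseToMilliseconds to compute the two bounds. The round-up flag it passes is !includeLower for the lower bound and includeUpper for the upper bound, so with both bounds inclusive the lower bound is rounded down and the upper bound is rounded up: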

    // DateFieldType
    @Override
    public Query rangeQuery(Object lowerTerm, Object upperTerm, boolean includeLower, boolean includeUpper, ShapeRelation relation,
                            @Nullable DateTimeZone timeZone, @Nullable DateMathParser forcedDateParser, QueryShardContext context) {
        failIfNotIndexed();
        if (relation == ShapeRelation.DISJOINT) {
            throw new IllegalArgumentException("Field [" + name() + "] of type [" + typeName() +
                "] does not support DISJOINT ranges");
        }
        DateMathParser parser = forcedDateParser == null
            ? dateMathParser
            : forcedDateParser;
        long l, u;
        if (lowerTerm == null) {
            l = Long.MIN_VALUE;
        } else {
            // Inclusive lower bound: round-up flag is false, so missing fields round down.
            l = parseToMilliseconds(lowerTerm, !includeLower, timeZone, parser, context);
            if (includeLower == false) {
                ++l;
            }
        }
        if (upperTerm == null) {
            u = Long.MAX_VALUE;
        } else {
            // Inclusive upper bound: round-up flag is true, so missing fields round up.
            u = parseToMilliseconds(upperTerm, includeUpper, timeZone, parser, context);
            if (includeUpper == false) {
                --u;
            }
        }
        Query query = LongPoint.newRangeQuery(name(), l, u);
        if (hasDocValues()) {
            Query dvQuery = SortedNumericDocValuesField.newSlowRangeQuery(name(), l, u);
            query = new IndexOrDocValuesQuery(query, dvQuery);
        }
        return query;
    }

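The code below shows that the fields parsed from the lower bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC), while the fields parsed from the upper bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC). The value 2015-01-01 carries no time fields, so the bounds become 00:00:00.000 and 23:59:59.999 of that day, and the query matches every record within the day.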

    // JodaDateMathParser
    private long parseDateTime(String value, DateTimeZone timeZone, boolean roundUpIfNoTime) {
        DateTimeFormatter parser = dateTimeFormatter.parser;
        if (timeZone != null) {
            parser = parser.withZone(timeZone);
        }
        try {
            MutableDateTime date;
            // We use 01/01/1970 as a base date so that things keep working with date
            // fields that are filled with times without dates
            if (roundUpIfNoTime) {
                date = new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC);
            } else {
                date = new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC);
            }
            // parseInto changes only the fields present in value; the rest keep the base values.
            final int end = parser.parseInto(date, value, 0);
            if (end < 0) {
                int position = ~end;
                throw new IllegalArgumentException("Parse failure at index [" + position + "] of [" + value + "]");
            } else if (end != value.length()) {
                throw new IllegalArgumentException("Unrecognized chars at the end of [" + value + "]: [" + value.substring(end) + "]");
            }
            return date.getMillis();
        } catch (IllegalArgumentException e) {
            throw new ElasticsearchParseException("failed to parse date field [{}] with format [{}]", e, value,
                dateTimeFormatter.pattern());
        }
    }
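
The base-date trick can be imitated without Joda-Time. The following is only a sketch under stated assumptions: it uses java.time's parseUnresolved to learn which fields were actually present in the input, accepts only the shape uuuu-MM-dd with an optional 'T'HH:mm:ss and an optional literal Z (treated as UTC), and uses the hypothetical names RoundUpSketch and parse. It is not the Elasticsearch implementation.

```java
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

final class RoundUpSketch {
    // Date with an optional time part; a trailing literal Z is accepted and treated as UTC.
    private static final DateTimeFormatter PARSER =
        DateTimeFormatter.ofPattern("uuuu-MM-dd['T'HH:mm:ss['Z']]");

    // Fields copied from the input onto the base date, from coarsest to finest.
    private static final ChronoField[] FIELDS = {
        ChronoField.YEAR, ChronoField.MONTH_OF_YEAR, ChronoField.DAY_OF_MONTH,
        ChronoField.HOUR_OF_DAY, ChronoField.MINUTE_OF_HOUR, ChronoField.SECOND_OF_MINUTE
    };

    static long parse(String value, boolean roundUpIfNoTime) {
        ParsePosition pos = new ParsePosition(0);
        // parseUnresolved reports only the fields that appeared in the text.
        TemporalAccessor parsed = PARSER.parseUnresolved(value, pos);
        if (parsed == null || pos.getErrorIndex() >= 0 || pos.getIndex() != value.length()) {
            throw new IllegalArgumentException("failed to parse [" + value + "]");
        }
        // Base date: end of 1970-01-01 when rounding up, start of it otherwise.
        LocalDateTime result = roundUpIfNoTime
            ? LocalDateTime.of(1970, 1, 1, 23, 59, 59, 999_000_000)
            : LocalDateTime.of(1970, 1, 1, 0, 0, 0, 0);
        // Overwrite only the fields the input supplied; the others keep the base values.
        for (ChronoField field : FIELDS) {
            if (parsed.isSupported(field)) {
                result = result.with(field, parsed.getLong(field));
            }
        }
        return result.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(parse("2015-01-01", false)); // lower bound of the day
        System.out.println(parse("2015-01-01", true));  // upper bound of the day
    }
}
```

In this sketch a value given to the second, such as 2015-01-01T12:10:30Z, still leaves the milliseconds at the base value, so the two bounds span that whole second (…000 to …999).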

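The dates we work with are usually precise to the second, so giving the query value to the second is generally enough to hit the intended record. If several records still match, store the data with millisecond precision and include the milliseconds in the query value as well.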

    POST my_date_index/_search
    {
      "query": {
        "term": {
          "create_date": "2015-01-01T12:10:30Z"
        }
      }
    }

Response:

    {
      "took" : 10,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "create_date" : "2015-01-01T12:10:30Z"
            }
          }
        ]
      }
    }

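4. Customizing the Format Used to Parse Date Strings

By default, a date field in Elasticsearch accepts either a long number representing epoch_millis or a string that conforms to strict_date_optional_time: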

    public static final DateFormatter DEFAULT_DATE_TIME_FORMATTER = DateFormatter.forPattern("strict_date_optional_time||epoch_millis");

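The name strict_date_optional_time has two parts:

  • strict requires the year, month and day to be written with exactly 4, 2 and 2 digits, zero-padded on the left where necessary, for example 2023-01-23 rather than 2023-1-23;
  • date_optional_time means the time part may be omitted, but the date part is required.

The complete grammar accepted by strict_date_optional_time is as follows: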

    date-opt-time     = date-element ['T' [time-element] [offset]]
    date-element      = std-date-element | ord-date-element | week-date-element
    std-date-element  = yyyy ['-' MM ['-' dd]]
    ord-date-element  = yyyy ['-' DDD]
    week-date-element = xxxx '-W' ww ['-' e]
    time-element      = HH [minute-element] | [fraction]
    minute-element    = ':' mm [second-element] | [fraction]
    second-element    = ':' ss [fraction]
    fraction          = ('.' | ',') digit+
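
As a rough illustration of what strict means for the std-date-element branch of this grammar, the sketch below checks only the shape of a string with a regular expression. It is a simplification: it does not validate month or day ranges, it ignores the ordinal, week and time branches, and the names StrictDateShapeSketch and looksLikeStrictStdDate are invented for this example.

```java
import java.util.regex.Pattern;

final class StrictDateShapeSketch {
    // yyyy ['-' MM ['-' dd]] with exactly 4, 2 and 2 digits.
    private static final Pattern STD_DATE = Pattern.compile("\\d{4}(-\\d{2}(-\\d{2})?)?");

    // True when the whole string has the strict std-date-element shape.
    static boolean looksLikeStrictStdDate(String value) {
        return STD_DATE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"2015-01-01", "2015-01", "2015", "2015-1-1", "2015/01/01", "20230123"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeStrictStdDate(sample));
        }
    }
}
```

The first three samples have the strict shape; the last three do not, because of missing zero padding, the wrong separator, or missing separators.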

If we search with 2015/01/01, Elasticsearch cannot parse the value and returns an error:

    POST my_date_index/_search
    {
      "profile": "true",
      "query": {
        "term": {
          "create_date": "2015/01/01"
        }
      }
    }

Response:

    {
      "error": {
        "root_cause": [
          {
            "type": "parse_exception",
            "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]"
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "my_date_index",
            "node": "eD2KQtMGSla7jzJQBQVAfQ",
            "reason": {
              "type": "query_shard_exception",
              "reason": "failed to create query: {\n \"term\" : {\n \"create_date\" : {\n \"value\" : \"2015/01/01\",\n \"boost\" : 1.0\n }\n }\n}",
              "index_uuid": "9MTRkZcMTnK8GgK9vKwUuA",
              "index": "my_date_index",
              "caused_by": {
                "type": "parse_exception",
                "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
                "caused_by": {
                  "type": "illegal_argument_exception",
                  "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
                }
              }
            }
          }
        ],
        "caused_by": {
          "type": "parse_exception",
          "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
          }
        }
      },
      "status": 400
    }

We can specify the format either in the mapping or at search time. The example below sets format on a range query:

    POST my_date_index/_search
    {
      "query": {
        "range" : {
          "create_date" : {
            "gte": "2015/01/01",
            "lte": "2015/01/01",
            "format": "yyyy/MM/dd"
          }
        }
      }
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "create_date" : "2015-01-01T12:10:30Z"
            }
          },
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "create_date" : "2015-01-01"
            }
          },
          {
            "_index" : "my_date_index",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "create_date" : 1420070400001
            }
          }
        ]
      }
    }
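
The effect of supplying a format can be mimicked with java.time: the same string that fails under one pattern parses under another, and with gte rounded down and lte rounded up the bounds again cover the whole day. This is a sketch with hypothetical names (CustomFormatSketch, parses, dayRange), and java.time pattern letters are assumed here to behave like the yyyy/MM/dd format used in the query; it does not call Elasticsearch.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class CustomFormatSketch {
    // True when value can be parsed as a date with the given pattern.
    static boolean parses(String value, String pattern) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Inclusive [gte, lte] bounds in epoch milliseconds for a date-only value.
    static long[] dayRange(String value, String pattern) {
        LocalDate day = LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
        long gte = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long lte = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli() - 1;
        return new long[] {gte, lte};
    }

    public static void main(String[] args) {
        System.out.println(parses("2015/01/01", "yyyy-MM-dd")); // wrong separator
        System.out.println(parses("2015/01/01", "yyyy/MM/dd")); // matching pattern
        long[] range = dayRange("2015/01/01", "yyyy/MM/dd");
        System.out.println(range[0] + " TO " + range[1]);
    }
}
```

The computed bounds equal those of the earlier term query on 2015-01-01, which is consistent with the range query returning the same three documents.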

Original article: https://www.cnblogs.com/wufengtinghai/p/17121480.html
