Diagnosing test failures

Copilot Chat can analyze test failures and help identify their likely causes.

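Example scenario 1: Tests pass locally but fail in CI

Consider a scenario where your tests pass on your local machine but sometimes fail in CI. Copilot Chat can help you identify the reason for the failure.

In this example, the code being tested defines a simple order service (order.py), and a corresponding test (test_order_service.py) checks whether an order was created today.

Example prompt

The following prompt gives Copilot the relevant code and test files (using #file:) and includes a pasted excerpt of the CI failure.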

Copilot prompt

Please take a look at this CI failure message. The test passes locally, but intermittently fails in CI. Can you help me figure out if this looks like a code bug, environment issue, or a flaky test?

Failure:

___ TestOrderService.test_order_created_today ___
>       assert order["created_date"] == date.today()
E       AssertionError: assert datetime.date(2024, 1, 15) == datetime.date(2024, 1, 16)

test_order_service.py:45: AssertionError

#file:order.py
#file:test_order_service.py

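Example response

Note

The following response is an example. Copilot Chat responses are non-deterministic, so you may get a different response from the one shown here.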

Copilot Chat notices that the two dates are exactly one day apart and identifies this as a likely timezone or time-boundary issue.

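The local machine and the CI runner may use different timezone settings, or derive today from different clocks (UTC versus local time). When the test runs near midnight, date.today() can therefore return a different date in each environment.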
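The boundary effect can be illustrated with a short sketch. The specific instant and the UTC-5 offset below are assumptions chosen for illustration (they are not taken from the CI logs); the point is only that one instant can fall on two different calendar dates.

```python
from datetime import datetime, timedelta, timezone

# A single instant shortly after midnight UTC (illustrative value).
instant = datetime(2024, 1, 16, 0, 30, tzinfo=timezone.utc)

# Hypothetical offset for a machine running five hours behind UTC.
behind_utc = timezone(timedelta(hours=-5))

# The same instant, read as a calendar date in each zone.
utc_date = instant.date()
offset_date = instant.astimezone(behind_utc).date()

print(utc_date.isoformat())     # 2024-01-16
print(offset_date.isoformat())  # 2024-01-15
```

If the code under test and the test itself read the date from clocks that disagree in this way, the assertion compares January 15 with January 16, which matches the shape of the failure in the prompt.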

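Copilot Chat suggests treating the failure as test flakiness caused by environment and time assumptions (rather than a logic bug), and fixing it by standardizing how today is computed across environments.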
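One way to standardize the computation might look like the sketch below. The contents of order.py are not shown in this article, so `utc_today`, `create_order`, and the `today` parameter are hypothetical names used only for illustration: the service takes its date from a single UTC-based function, and the test injects a fixed date instead of calling date.today() itself.

```python
from datetime import date, datetime, timezone


def utc_today():
    """Return the current calendar date in UTC, independent of the host timezone."""
    return datetime.now(timezone.utc).date()


def create_order(item, today=utc_today):
    """Hypothetical stand-in for the order service; `today` is an injectable clock."""
    return {"item": item, "created_date": today()}


def test_order_created_today():
    # The test pins the date, so it no longer depends on when or where it runs.
    fixed = date(2024, 1, 15)
    order = create_order("widget", today=lambda: fixed)
    assert order["created_date"] == fixed
```

Injecting the clock keeps production behavior on UTC while making the test deterministic. Freezing time with a dedicated library would be another option, but that is outside what this sketch assumes.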

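Example scenario 2: Intermittent test failures

Consider a scenario where a test sometimes passes and sometimes fails on the same machine. Copilot Chat can compare logs from passing and failing runs to help identify the cause.

In this example, the code under test uses a background job in order_service.py to update an order's status asynchronously, and a test in test_order_service.py asserts that the final status is "processed".

Example prompt

The following prompt gives Copilot the failure message, log excerpts from both a passing and a failing run, and the relevant code files (using #file:).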

Copilot prompt

This test passes sometimes and fails sometimes. Can you compare the logs and help me figure out why?

Failure message:

>       assert order.status == "processed"
E       AssertionError: assert "pending" == "processed"

test_order_service.py:62: AssertionError

Logs from a passing run:
[DEBUG] Created order #1234
[DEBUG] Background job started for order #1234
[DEBUG] Background job completed (52ms)
[DEBUG] Checking order status
[DEBUG] Order #1234 status: processed

Logs from the failing run:
[DEBUG] Created order #1234
[DEBUG] Background job started for order #1234
[DEBUG] Checking order status
[DEBUG] Order #1234 status: pending

#file:order_service.py
#file:test_order_service.py

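Example response

Note

The following response is an example. Copilot Chat responses are non-deterministic, so you may get a different response from the one shown here.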

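Copilot Chat compares the two logs and notices that in the passing run the background job completed before the status check, whereas in the failing run the status was checked while the job was still running. Copilot Chat notes that this is a race condition, because the test doesn't wait for the background job to complete.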

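Copilot Chat suggests adding a mechanism that ensures the job has completed before the assertion runs, such as running the job synchronously, waiting for completion (for example, via a callback), or polling.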
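Two of these options, waiting for a completion signal and polling, can be sketched as follows. The contents of order_service.py are not shown in this article, so `Order`, `process_order`, `start_background_job`, and `wait_until` are hypothetical stand-ins, and a thread with a short sleep simulates the background job.

```python
import threading
import time


class Order:
    """Hypothetical order with a completion signal the test can wait on."""

    def __init__(self):
        self.status = "pending"
        self.done = threading.Event()


def process_order(order):
    time.sleep(0.05)          # simulate the work done by the background job
    order.status = "processed"
    order.done.set()          # signal completion, playing the role of a callback


def start_background_job(order):
    worker = threading.Thread(target=process_order, args=(order,))
    worker.start()
    return worker


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


def test_order_is_processed():
    order = Order()
    start_background_job(order)
    # Block until the job signals completion instead of asserting immediately.
    assert order.done.wait(timeout=2.0), "background job did not finish in time"
    assert order.status == "processed"
```

Waiting on the event is usually the tighter fix because the test resumes as soon as the job finishes. Polling with `wait_until` is a fallback when the code under test exposes no completion signal. In both cases, a timeout keeps a stuck job from hanging the test run.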
