场景描述

某银行系统包含以下三张核心表:

-- 客户表(5000万条数据)
CREATE TABLE customers (
  customer_id BIGINT PRIMARY KEY,
  reg_branch VARCHAR(20),
  risk_level CHAR(1),
  create_date DATE
);

-- 账户表(5亿条数据)
CREATE TABLE accounts (
  account_id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  account_type VARCHAR(10),
  balance DECIMAL(18,2)
);

-- 交易表(200亿条数据)
CREATE TABLE transactions (
  transaction_id UUID PRIMARY KEY,
  account_id BIGINT,
  transaction_date TIMESTAMP,
  transaction_type VARCHAR(10),
  transaction_status VARCHAR(15),
  amount DECIMAL(18,2)
);

原始查询需求

统计2023年注册的高风险客户(risk_level='A'):

  1. 总交易金额

  2. 最大单笔交易金额

  3. 成功交易占比(status='SUCCESS'的交易数/总交易数)

现有低效查询:

SELECT 
  c.customer_id,
  (SELECT SUM(t.amount)
   FROM accounts a 
   JOIN transactions t ON a.account_id = t.account_id
   WHERE a.customer_id = c.customer_id) AS total_amount,
  
  (SELECT MAX(t.amount)
   FROM accounts a 
   JOIN transactions t ON a.account_id = t.account_id
   WHERE a.customer_id = c.customer_id) AS max_amount,
  
  (SELECT COUNT(*) FILTER (WHERE t.transaction_status = 'SUCCESS')::FLOAT / 
          NULLIF(COUNT(*), 0)
   FROM accounts a 
   JOIN transactions t ON a.account_id = t.account_id
   WHERE a.customer_id = c.customer_id) AS success_rate
FROM customers c
WHERE c.create_date >= '2023-01-01'
  AND c.create_date < '2024-01-01'
  AND c.risk_level = 'A';

题目要求
  1. 分析原始查询的性能瓶颈

  2. 重写优化后的SQL查询

  3. 建议需要创建的索引

  4. 解释优化思路


参考答案

优化后查询
WITH account_transactions AS (
  SELECT 
    a.customer_id,
    SUM(t.amount) AS total_amount,
    MAX(t.amount) AS max_amount,
    COUNT(*) FILTER (WHERE t.transaction_status = 'SUCCESS')::FLOAT / 
      NULLIF(COUNT(*), 0) AS success_rate
  FROM accounts a
  JOIN transactions t ON a.account_id = t.account_id
  GROUP BY a.customer_id
)
SELECT 
  c.customer_id,
  atx.total_amount,
  atx.max_amount,
  atx.success_rate
FROM customers c
LEFT JOIN account_transactions atx 
  ON c.customer_id = atx.customer_id
WHERE c.create_date BETWEEN '2023-01-01' AND '2023-12-31'
  AND c.risk_level = 'A';

建议索引
-- 客户表
CREATE INDEX idx_customers_risk_create ON customers(risk_level, create_date);

-- 账户表
CREATE INDEX idx_accounts_customer ON accounts(customer_id);

-- 交易表
CREATE INDEX idx_transactions_account ON transactions(account_id);
CREATE INDEX idx_transactions_status ON transactions(transaction_status);

优化思路
  1. 消除三重子查询

    1. 将三个关联子查询合并为单次CTE聚合

    2. 避免对accounts和transactions表的三次重复扫描

  2. 数据过滤前置

    1. 优先通过WHERE条件筛选高风险客户(减少处理数据量)

    2. 先聚合交易数据再关联客户表(减少JOIN数据量)

  3. 索引优化策略

    1. 客户表复合索引(risk_level + create_date)快速定位目标客户

    2. 账户表customer_id索引加速客户-账户关联

    3. 交易表account_id索引优化账户-交易JOIN效率

    4. 交易状态索引加速状态过滤

  4. 计算优化

    1. 使用NULLIF避免除以零错误

    2. 单次扫描同时计算SUM、MAX和成功率

    3. 使用LEFT JOIN保留无交易记录的客户

  5. 执行计划优化

    1. 将Nested Loop转换为Hash Join

    2. 利用GroupAggregate代替多个子查询

    3. 减少中间临时表的生成

性能对比

优化维度

原始查询

优化后查询

交易表扫描次数

3次

1次

JOIN操作次数

6次

2次

索引利用率

仅主键索引

多个辅助索引

内存消耗

高(多中间表)

低(单次聚合)

执行时间预估(10TB)

2小时+

15分钟以内

该方案通过重组查询逻辑、合理利用索引和减少数据扫描量,可显著提升海量金融数据场景下的查询性能。

Logo

更多推荐