Question



I'll use a concrete, but hypothetical, example.


Each order normally has only one line item:


Orders:


OrderGUID   OrderNumber
=========   ============
{FFB2...}   STL-7442-1      
{3EC6...}   MPT-9931-8A


LineItems:


LineItemGUID   Order ID Quantity   Description
============   ======== ========   =================================
{098FBE3...}   1        7          prefabulated amulite
{1609B09...}   2        32         spurving bearing


But occasionally there will be an order with two line items:


LineItemID   Order ID    Quantity   Description
==========   ========    ========   =================================
{A58A1...}   6,784,329   5          panametric fan
{0E9BC...}   6,784,329   5          differential girdlespring 


Normally, when showing the orders to the user:


SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID


I want to show a single item on each order. But when an occasional order contains two (or more) items, that order appears to be duplicated:


OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          panametric fan
KSG-0619-81   5          differential girdlespring


What I really want is for SQL Server to just pick one, since that will be good enough:


OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          panametric fan


If I get adventurous, I might show the user an ellipsis to indicate that there is more than one item:


OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          panametric fan, ...


So the question is how to either



  • eliminate the "duplicate" rows, or

  • join to only one of the rows, to avoid the duplication.



First attempt



My first naive attempt was to join to only the TOP 1 line item:


SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN (
       SELECT TOP 1 LineItems.Quantity, LineItems.Description
       FROM LineItems
       WHERE LineItems.OrderID = Orders.OrderID) LineItems2
    ON 1=1


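But that gives the error:



  The column or prefix 'Orders' does not
  match with a table name or alias name
  used in the query.



Presumably this is because the inner select doesn't see the outer table.

Best answer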


SELECT   Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM     Orders
JOIN     LineItems
ON       LineItems.LineItemGUID =
         (
         SELECT  TOP 1 LineItemGUID 
         FROM    LineItems
         WHERE   OrderID = Orders.OrderID
         )


In SQL Server 2005 and above, you can replace the INNER JOIN with CROSS APPLY:


SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
CROSS APPLY
        (
        SELECT  TOP 1 LineItems.Quantity, LineItems.Description
        FROM    LineItems
        WHERE   LineItems.OrderID = Orders.OrderID
        ) LineItems2
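
To see the join-to-one-row idea run end to end, here is a minimal sketch using Python's sqlite3 module rather than SQL Server. It assumes integer keys instead of GUIDs, uses LIMIT 1 where T-SQL uses TOP 1 (SQLite has no CROSS APPLY), and adds an ORDER BY so that the chosen line item is deterministic; without one, which row TOP 1 returns is not guaranteed.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables.
# Integer keys (instead of GUIDs) are an assumption made for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7,  'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 3, 5,  'panametric fan'),
        (4, 3, 5,  'differential girdlespring');
""")

# Join each order to exactly one line item, chosen by a correlated subquery.
rows = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    JOIN LineItems AS li
      ON li.LineItemID = (
          SELECT li2.LineItemID
          FROM LineItems AS li2
          WHERE li2.OrderID = o.OrderID
          ORDER BY li2.LineItemID  -- explicit order makes the pick deterministic
          LIMIT 1                  -- SQLite counterpart of TOP 1
      )
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, the order that has two line items contributes a single row, the one with the lowest LineItemID.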

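Other answer 1


I know this question was answered a while ago, but nested queries can be costly when dealing with large data sets. Here is a different solution in which the nested query runs only once, instead of once for each row returned.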


SELECT 
  Orders.OrderNumber,
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders
  INNER JOIN (
    SELECT
      Orders.OrderNumber,
      Max(LineItems.LineItemID) AS LineItemID
    FROM
      Orders INNER JOIN LineItems
      ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
  ) AS Items ON Orders.OrderNumber = Items.OrderNumber
  INNER JOIN LineItems 
  ON Items.LineItemID = LineItems.LineItemID
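
The same idea can be tried out with a small sketch in Python's sqlite3 module (an assumption; the answer targets SQL Server). It uses integer keys, and it simplifies the derived table to group LineItems by OrderID directly, without joining Orders inside it; the alias `picked` is made up for this illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables (integer keys assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7,  'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 3, 5,  'panametric fan'),
        (4, 3, 5,  'differential girdlespring');
""")

# The derived table is evaluated once and yields one LineItemID per order;
# joining back to LineItems fetches the remaining columns of that row.
rows = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    JOIN (
        SELECT OrderID, MAX(LineItemID) AS LineItemID
        FROM LineItems
        GROUP BY OrderID
    ) AS picked ON picked.OrderID = o.OrderID
    JOIN LineItems AS li ON li.LineItemID = picked.LineItemID
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```

Because MAX is used, the order with two line items shows the one with the highest LineItemID; swapping in MIN would pick the lowest instead.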

Other answer 2


You could do this:


SELECT 
  Orders.OrderNumber, 
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders INNER JOIN LineItems 
  ON Orders.OrderID = LineItems.OrderID
WHERE
  LineItems.LineItemID = (
    SELECT MIN(LineItemID) 
    FROM   LineItems
    WHERE  OrderID = Orders.OrderID
  )


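This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID, or it will be slow.

Other answer 3


@Quassnoi's answer is good. In some cases (especially if the outer table is big), a query that uses a window function may be more efficient, like this: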


SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
LEFT JOIN 
        (
        SELECT  LineItems.Quantity, LineItems.Description, OrderId, ROW_NUMBER()
                OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
        FROM    LineItems

        ) LineItems2 ON LineItems2.OrderId = Orders.OrderID And RowNum = 1


Sometimes you just need to test which query gives better performance.
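
A minimal sketch of the ROW_NUMBER approach, run through Python's sqlite3 module rather than SQL Server. It assumes an SQLite build with window-function support (3.25 or later) and integer keys, orders each partition by LineItemID instead of (SELECT NULL) so the result is repeatable, and adds a hypothetical order 'NOL-0000-0' with no line items to show what the LEFT JOIN does with it.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables (integer keys assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81'),
        (4, 'NOL-0000-0');  -- hypothetical order without any line items
    INSERT INTO LineItems VALUES
        (1, 1, 7,  'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 3, 5,  'panametric fan'),
        (4, 3, 5,  'differential girdlespring');
""")

# Number the line items within each order, then keep only row number 1.
rows = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    LEFT JOIN (
        SELECT OrderID, Quantity, Description,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderID ORDER BY LineItemID
               ) AS RowNum
        FROM LineItems
    ) AS li ON li.OrderID = o.OrderID AND li.RowNum = 1
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```

Note that the LEFT JOIN keeps the order without line items, with NULL (None in Python) for Quantity and Description, whereas the INNER JOIN and CROSS APPLY variants would drop it.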

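Other answer 4


A correlated subquery is a subquery that depends on the outer query. It works like a for loop in SQL: the subquery runs once for each row in the outer query: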


select * from users join widgets on widgets.id = (
    select id from widgets
    where widgets.user_id = users.id
    order by created_at desc
    limit 1
)

Other answer 5


EDIT: never mind, Quassnoi has a better answer.


For SQL Server 2000 (SQL2K), something like this:


SELECT 
  Orders.OrderNumber
, LineItems.Quantity
, LineItems.Description
FROM (  
  SELECT 
    Orders.OrderID
  , Orders.OrderNumber
  , FirstLineItemID = (
      SELECT TOP 1 LineItemID
      FROM LineItems
      WHERE LineItems.OrderID = Orders.OrderID
      ORDER BY LineItemID -- or whatever else
      )
  FROM Orders
  ) Orders
JOIN LineItems 
  ON LineItems.OrderID = Orders.OrderID 
 AND LineItems.LineItemID = Orders.FirstLineItemID

Other answer 6


Another approach, using a common table expression:


with firstOnly as (
    select Orders.OrderNumber, LineItems.Quantity, LineItems.Description, ROW_NUMBER() over (partition by Orders.OrderID order by Orders.OrderID) lp
    FROM Orders
        join LineItems on Orders.OrderID = LineItems.OrderID
) select *
  from firstOnly
  where lp = 1


Or, in the end, maybe you would like to show all of the joined rows?


Comma-separated version:


  select *
  from Orders o
    cross apply (
        select CAST((select l.Description + ','
        from LineItems l
        where l.OrderID = o.OrderID
        for xml path('')) as nvarchar(max)) l
    ) lines
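
For comparison, here is a sketch of the comma-separated idea in Python's sqlite3 module. This swaps in SQLite's GROUP_CONCAT aggregate for the FOR XML PATH trick above, since FOR XML PATH is specific to SQL Server; integer keys are again an assumption. SQLite does not promise the order of the concatenated values, so the sketch does not rely on it.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables (integer keys assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7,  'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 3, 5,  'panametric fan'),
        (4, 3, 5,  'differential girdlespring');
""")

# One row per order, with all of its descriptions joined by ", ".
rows = conn.execute("""
    SELECT o.OrderNumber, GROUP_CONCAT(li.Description, ', ') AS Descriptions
    FROM Orders AS o
    LEFT JOIN LineItems AS li ON li.OrderID = o.OrderID
    GROUP BY o.OrderID, o.OrderNumber
    ORDER BY o.OrderID
""").fetchall()

# Split the lists again so the check does not depend on concatenation order.
descriptions_by_order = {
    number: set(joined.split(", ")) for number, joined in rows
}

for row in rows:
    print(row)
```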

Other answer 7


I solved a similar problem by using LEFT JOIN and GROUP BY Orders.OrderNumber. Is there a reason not to do it this way?


SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    LEFT JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID
GROUP BY Orders.OrderNumber





I'll answer the question in your answer with the example from your own question:


Orders             LineItems
+-------------+    +---------+----------+---------------+
| OrderNumber |    | OrderID | Quantity | Description   |
+-------------+    +---------+----------+---------------+
| 22586       |    | 22586   | 17       | Trunion       |
+-------------+    | 22586   | 3        | Girdle Spring |
                   +---------+----------+---------------+


Joining the two together on OrderNumber gives:


OrderNumber  Quantity  Description
-----------  --------  -------------
22586        17        Trunion
22586        3         Girdle Spring

2 row(s) affected


whereas we want it to return only one row:


OrderNumber  Quantity  Description
-----------  --------  -------------
22586        17        Trunion

1 row(s) affected





That is why I use GROUP BY Orders.OrderNumber, which returns only one row per OrderNumber.

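Other answer 8


I tried the CROSS APPLY; it works nicely but takes slightly longer. I adjusted the line item columns to use MAX and added a GROUP BY, which kept the speed and dropped the extra record.


Here is the adjusted query: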


SELECT Orders.OrderNumber, max(LineItems.Quantity), max(LineItems.Description)
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID
Group by Orders.OrderNumber
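
One thing worth checking with this approach is that each MAX is computed independently, so the values in a result row may come from different line items. The sketch below illustrates that with Python's sqlite3 module rather than SQL Server; the integer keys and the quantities 5 and 9 are made up for the illustration.

```python
import sqlite3

# In-memory SQLite stand-in; keys and quantities are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (3, 3, 5, 'panametric fan'),
        (4, 3, 9, 'differential girdlespring');
""")

# MAX is applied to each column separately within the group.
row = conn.execute("""
    SELECT o.OrderNumber, MAX(li.Quantity), MAX(li.Description)
    FROM Orders AS o
    JOIN LineItems AS li ON li.OrderID = o.OrderID
    GROUP BY o.OrderNumber
""").fetchone()

# The actual (Quantity, Description) pairs stored for the order.
items = conn.execute(
    "SELECT Quantity, Description FROM LineItems WHERE OrderID = 3"
).fetchall()

print(row)    # quantity of one line item, description of the other
print(items)
```

Here the result pairs the quantity 9 with 'panametric fan', a combination that does not exist in LineItems. That may be acceptable when any value is good enough, but not when the row has to describe a real line item.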

Other answer 9


My favorite way to run this query is with a NOT EXISTS clause. I believe this is the most efficient way to run this sort of query:


select o.OrderNumber,
       li.Quantity,
       li.Description
from Orders as o
inner join LineItems as li
on li.OrderID = o.OrderID
where not exists (
    select 1
    from LineItems as li_later
    where li_later.OrderID = o.OrderID
    and li_later.LineItemGUID > li.LineItemGUID
    )


But I have not tested this method against the other methods suggested here.
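
A small sketch of the NOT EXISTS pattern, using Python's sqlite3 module instead of SQL Server and an integer LineItemID in place of LineItemGUID (both assumptions). It only demonstrates which row is kept and makes no claim about performance.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables (integer keys assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID  INTEGER PRIMARY KEY,
        OrderID     INTEGER,
        Quantity    INTEGER,
        Description TEXT
    );
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7,  'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 3, 5,  'panametric fan'),
        (4, 3, 5,  'differential girdlespring');
""")

# Keep a line item only if no other item of the same order has a higher ID.
rows = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    JOIN LineItems AS li ON li.OrderID = o.OrderID
    WHERE NOT EXISTS (
        SELECT 1
        FROM LineItems AS li_later
        WHERE li_later.OrderID = o.OrderID
          AND li_later.LineItemID > li.LineItemID
    )
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```

The comparison direction decides which item survives: `>` keeps the line item with the highest ID in each order, and `<` would keep the lowest.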