Swift Object Storage in Depth: Overview and Concepts (Part 1)

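Last edited by pig2 on 2014-10-7 00:37
Guiding questions

1. What is the role of the Proxy Server?
2. What is the role of the Object Server?
3. What is the primary job of the Container Server?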

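1. Swift Architecture Overview

1.1 Proxy Server

The Proxy Server ties together the rest of the Swift architecture. For each client request, it looks up the location of the account, container, or object in the ring and routes the request accordingly. The public API is also exposed through the Proxy Server.

The Proxy Server also handles a large number of failures. For example, if a storage node is unavailable for an object PUT, it asks the ring for a handoff server and routes the request there instead.

Objects are streamed to or from an object server directly through the Proxy Server to or from the user; the Proxy Server does not spool them.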

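1.2 The Ring

A ring represents a mapping between the names of entities stored on disk and their physical location. There are separate rings for accounts, containers, and objects. When other components of Swift (replication, for example) need to perform any operation on an account, container, or object, they query the appropriate ring to determine its location in the cluster.

The ring maintains this mapping using zones, devices, partitions, and replicas. Each partition in the ring is replicated, by default, 3 times across the cluster, and the locations for a partition are stored in the mapping maintained by the ring. The ring is also responsible for determining which devices are used for handoff in failure scenarios.

Data is isolated using the concept of zones in the ring. Each replica of a partition is guaranteed to reside in a different zone. A zone could represent a drive, a server, a cabinet, a switch, or even a datacenter.

The partitions of the ring are divided evenly among all the devices in the Swift installation. When partitions need to be moved around (for example, when a device is added to the cluster), the ring ensures that a minimum number of partitions are moved at a time, and only one replica of a partition is moved at a time.

Weights can be used to balance the distribution of partitions on drives across the cluster. This is useful, for example, when drives of different sizes are used in a cluster.

The ring is used by the Proxy Server and several background processes (such as replication).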

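1.3 Object Server

The Object Server is a very simple blob storage server that can store, retrieve, and delete objects on local devices. Objects are stored as binary files on the filesystem, with metadata stored in the file's extended attributes (xattrs). This requires that the underlying filesystem chosen for object servers supports xattrs on files. Some filesystems, such as ext3, have xattrs turned off by default.

Each object is stored using a path derived from the hash of the object name and the timestamp of the operation. The last write always wins, which ensures that the latest version of the object is the one served. A deletion is also treated as a version of the file (a 0-byte file ending in ".ts", which stands for tombstone). This ensures that deleted files are replicated correctly and that older versions do not reappear after a failure scenario.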
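
The "last write wins" rule, including tombstones, can be sketched as follows. This is a minimal illustration, not Swift's actual code: the function name newest_version and the "<timestamp>.data" naming for live versions are assumptions made for the example; only the ".ts" tombstone suffix comes from the text above.

```python
import os


def newest_version(filenames):
    """Pick the winning version among the files stored for one object.

    Assumes (for illustration) names of the form "<timestamp>.data" for a
    live version and "<timestamp>.ts" for a tombstone. Returns a tuple
    ("data" | "deleted", timestamp), or None if no version exists.
    """
    candidates = []
    for name in filenames:
        stamp, ext = os.path.splitext(name)
        if ext in (".data", ".ts"):
            candidates.append((float(stamp), ext))
    if not candidates:
        return None
    # The highest timestamp wins, whether it is a write or a delete.
    stamp, ext = max(candidates)
    return ("deleted" if ext == ".ts" else "data", stamp)
```

Because the tombstone takes part in the same comparison as ordinary writes, an older ".data" file arriving late from another replica cannot bring a deleted object back.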

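1.4 Container Server

The Container Server's primary job is to handle listings of objects. It does not know where those objects are, only which objects are in a specific container. The listings are stored as SQLite database files and replicated across the cluster in the same way objects are. The Container Server also tracks statistics, such as the total number of objects and the total storage usage for that container.

1.5 Account Server

The Account Server is very similar to the Container Server, except that it is responsible for listings of containers rather than objects.

1.6 Replication

Replication is designed to keep the system in a consistent state in the face of temporary error conditions such as network outages or drive failures.

The replication processes compare local data with each remote copy to ensure they all contain the latest version. Object replication uses a hash list to quickly compare subsections of each partition, while container and account replication use a combination of hashes and shared high water marks.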
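
The hash-list comparison for object replication can be sketched like this. It assumes, purely for illustration, that each partition is summarized as a dictionary mapping a subsection name to a hash of its contents; the function name and the sample keys are hypothetical.

```python
def subsections_to_sync(local_hashes, remote_hashes):
    """Return the subsections whose hash is missing or different on the remote.

    Both arguments map a subsection name to a hash of that subsection's
    contents. Only the returned subsections need to be pushed to the peer.
    """
    return sorted(
        name
        for name, digest in local_hashes.items()
        if remote_hashes.get(name) != digest
    )
```

Comparing a handful of hashes per partition avoids walking every file on both sides when nothing has changed.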

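Replication updates are push based. For object replication, updating is just a matter of rsyncing files to the peer. Account and container replication push missing records over HTTP, or rsync whole database files.

The replicator also ensures that data is removed from the system. When an item (object, container, or account) is deleted, a tombstone is set as the latest version of the item. The replicator sees the tombstone and ensures that the item is removed from the entire system.

1.7 Updaters

There are times when container or account data cannot be updated immediately. This usually occurs during failure scenarios or periods of high load. If an update fails, it is queued locally on the filesystem, and the updater processes the failed updates later. This is where an eventual consistency window is most likely to come into play. For example, suppose a container server is under load and a new object is put into the system. The object is immediately available for reads as soon as the Proxy Server responds to the client with success. However, the container server did not update the object listing, so the update is queued for later. Container listings, therefore, may not immediately contain the object.

In practice, the consistency window is only as large as the interval at which the updater runs, and it may not even be noticed, because the Proxy Server routes listing requests to the first container server that responds. The server under load may not be the one that serves subsequent listing requests; one of the other two replicas may handle the listing.

1.8 Auditors

Auditors crawl the local server repeatedly, checking the integrity of objects, containers, and accounts. If corruption is found (in the case of bit rot, for example), the file is quarantined, and replication replaces the bad file from another replica. Other errors are logged (for example, an object's listing cannot be found on any container server where it should be).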
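
A single audit step might look like the sketch below. It assumes the expected MD5 checksum of the file is already known (for example, from stored metadata); the function name audit_file and the quarantine directory layout are hypothetical and are not taken from Swift's implementation.

```python
import hashlib
import os
import shutil


def audit_file(path, expected_md5, quarantine_dir):
    """Return True if the file is intact; otherwise quarantine it and return False."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large objects are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() == expected_md5:
        return True
    # Move the corrupt file aside so replication can restore a good copy.
    os.makedirs(quarantine_dir, exist_ok=True)
    shutil.move(path, os.path.join(quarantine_dir, os.path.basename(path)))
    return False
```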

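2. The Rings

The rings determine where data resides in the cluster. There is a separate ring for account databases, container databases, and individual objects, but each ring works in the same way. The rings are externally managed: the server processes themselves do not modify the rings; they are instead given new rings modified by other tools.

The ring uses a configurable number of bits from a path's MD5 hash as a partition index that designates a device. The number of bits kept from the hash is known as the partition power, and 2 to the partition power gives the partition count. Partitioning the full MD5 hash ring allows other parts of the cluster to work in batches of items at once, which ends up either more efficient or at least less complex than working with each item separately or with the entire cluster all at once.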
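
The relationship between partition power, partition count, and partition index can be worked through in a few lines. This sketch assumes the kept bits are the most significant bits of the 128-bit MD5 value; the function names are illustrative only.

```python
from hashlib import md5


def partition_count(part_power):
    """Number of partitions for a given partition power."""
    return 2 ** part_power


def partition_index(path, part_power):
    """Keep the top part_power bits of the 128-bit MD5 of path."""
    value = int.from_bytes(md5(path.encode("utf-8")).digest(), "big")
    return value >> (128 - part_power)
```

With a partition power of 20 there are 1,048,576 partitions. Raising the power by one splits every partition in two, so an item's old index is its new index shifted right by one bit.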

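Another configurable value is the replica count, which indicates how many of the partition->device assignments comprise a single ring. For a given partition number, each replica's device will not be in the same zone as any other replica's device. Zones can be used to group devices based on physical location, power separation, network separation, or any other attribute that lessens the chance of multiple replicas being unavailable at the same time.

2.1 Ring Builder

The rings are built and managed manually with a utility called ring-builder. The ring-builder assigns partitions to devices and writes an optimized Python structure to disk as a gzipped, pickled file for shipping out to the servers. The server processes just check the modification time of the file occasionally and reload their in-memory copies of the ring structure as needed. Because of how the ring-builder manages changes to the ring, using a slightly older ring usually just means that one of the three replicas for a small subset of the partitions is incorrect, which can be easily worked around.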
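
The "gzipped, pickled file plus modification-time check" pattern can be sketched as below. The names save_ring and RingCache are hypothetical, and the structure being saved is an arbitrary Python object here, not the real ring format.

```python
import gzip
import os
import pickle


def save_ring(path, ring_data):
    """Write ring_data to path as a gzipped pickle."""
    with gzip.open(path, "wb") as f:
        pickle.dump(ring_data, f)


class RingCache:
    """Keeps an in-memory copy of the ring and reloads it when the file changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.data = None
        self.reload_if_changed()

    def reload_if_changed(self):
        """Reload the ring if the file's mtime differs; return True if reloaded."""
        mtime = os.path.getmtime(self.path)
        if mtime == self._mtime:
            return False
        with gzip.open(self.path, "rb") as f:
            self.data = pickle.load(f)
        self._mtime = mtime
        return True
```

A server process would call reload_if_changed() occasionally, so a new ring file copied into place is picked up without a restart. Pickle files should only ever be loaded from a trusted source, which fits the externally managed ring described above.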

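The ring-builder also keeps its own builder file with the ring information and the additional data required to build future rings. It is very important to keep multiple backup copies of these builder files. One option is to copy the builder files out to every server while copying the ring files themselves. Another is to upload the builder files into the cluster itself. Complete loss of a builder file means creating a new ring from scratch: nearly all partitions would end up assigned to different devices, and therefore nearly all of the stored data would have to be replicated to new locations. So recovery from a builder file loss is possible, but data will be unreachable for an extended time.

2.2 Ring Data Structure

The ring data structure consists of three top-level fields: a list of devices in the cluster; a list of lists of device ids indicating partition-to-device assignments; and an integer indicating the number of bits to shift an MD5 hash to calculate the partition for the hash.

2.2.1 List of Devices

The list of devices is known internally to the Ring class as devs. Each item in the list of devices is a dictionary with the following keys: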

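id	integer	The index into the list of devices.
zone	integer	The zone in which the device resides.
weight	float	The relative weight of the device in comparison to other devices. This usually corresponds directly to the amount of disk space the device has compared to other devices. For instance, a device with 1 terabyte of space might have a weight of 100.0 and another device with 2 terabytes of space might have a weight of 200.0. This weight can also be used to bring back into balance a disk that has ended up with more or less data than desired over time. A good average weight of 100.0 allows flexibility in lowering the weight later if necessary.
ip	string	The IP address of the server containing the device.
port	integer	The TCP port on which the server process listens to serve requests for the device.
device	string	The on-disk name of the device on the server. For example: sdb1
meta	string	A general-use field for storing additional information for the device. This information is not used directly by the server processes, but can be useful in debugging. For example, the date and time of installation and the hardware manufacturer could be stored here.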

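Note: The list of devices may contain holes, or indexes set to None, for devices that have been removed from the cluster. Generally, device ids are not reused. Some devices may also be temporarily disabled by setting their weight to 0.0. To obtain a list of active devices (for uptime polling, for example), the Python code would look like:

devices = [device for device in self.devs if device and device['weight']]

2.2.2 Partition Assignment List

This is a list of array('I') of device ids. The outermost list contains one array('I') for each replica. Each array('I') has a length equal to the partition count for the ring. Each integer in the array('I') is an index into the above list of devices. The partition list is known internally to the Ring class as _replica2part2dev_id.

So, to create a list of device dictionaries assigned to a partition, the Python code would look like:

devices = [self.devs[part2dev_id[partition]] for part2dev_id in self._replica2part2dev_id]

array('I') is used for memory conservation, as there may be millions of partitions.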
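
To make the two structures concrete, here is a toy ring with four partitions and three replicas. All values (IP addresses, ports, zones, weights) are invented for illustration, and the module-level names stand in for the Ring attributes devs and _replica2part2dev_id.

```python
from array import array


def make_dev(dev_id, zone, weight):
    """Build a device dictionary with the keys described above (sample values)."""
    return {"id": dev_id, "zone": zone, "weight": weight,
            "ip": "10.0.0.%d" % (dev_id + 1), "port": 6000,
            "device": "sdb1", "meta": ""}


devs = [
    make_dev(0, 1, 100.0),
    None,                    # hole left by a removed device
    make_dev(2, 2, 100.0),
    make_dev(3, 3, 100.0),
    make_dev(4, 3, 0.0),     # temporarily disabled via weight 0.0
]

# One array per replica; index = partition, value = index into devs.
replica2part2dev_id = [
    array("I", [0, 2, 3, 0]),
    array("I", [2, 3, 0, 2]),
    array("I", [3, 0, 2, 3]),
]


def devices_for(partition):
    """Device dictionaries holding each replica of the given partition."""
    return [devs[part2dev_id[partition]] for part2dev_id in replica2part2dev_id]


active_devices = [dev for dev in devs if dev and dev["weight"]]
```

Each column of the three arrays names three devices in three different zones, which is the zone constraint described in section 2.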

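2.2.3 Partition Shift Value

The partition shift value is known internally to the Ring class as _part_shift. This value is used to shift an MD5 hash to calculate the partition on which the data for that hash should reside. Only the top four bytes of the hash are used in this process. For example, to compute the partition for the path /account/container/object, the Python code might look like: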

# Requires: from hashlib import md5; from struct import unpack_from
partition = unpack_from('>I', md5(b'/account/container/object').digest())[0] >> self._part_shift
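
The same calculation as a self-contained function is sketched below. It assumes a partition power of 20 and derives the shift as 32 minus the partition power, since only the top four bytes (32 bits) of the hash are used; the names PART_POWER, PART_SHIFT, and get_partition are chosen for this example.

```python
from hashlib import md5
from struct import unpack_from

PART_POWER = 20                # assumed partition power for this example
PART_SHIFT = 32 - PART_POWER   # bits to discard from the top 32 bits of the hash


def get_partition(path, part_shift=PART_SHIFT):
    """Map a path such as /account/container/object to a partition number."""
    digest = md5(path.encode("utf-8")).digest()
    # '>I' reads the first four bytes as a big-endian unsigned integer.
    return unpack_from(">I", digest)[0] >> part_shift
```

The result is always in the range 0 to 2**PART_POWER - 1, so it can be used directly as an index into each replica's array of device ids.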

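2.3 Building the Ring

The initial building of the ring first calculates the number of partitions that should ideally be assigned to each device based on the device's weight. For example, if the partition power is 20, the ring has 1,048,576 partitions. If there are 1,000 devices of equal weight, each desires 1,048.576 partitions. The devices are then sorted by the number of partitions they desire and kept in order throughout the initialization process.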
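
The weight-based calculation can be reproduced with a short function. This is a sketch of the arithmetic only; the function name and the dictionary-based input are assumptions for the example.

```python
def desired_partitions(weights, part_power):
    """Ideal (fractional) partition count per device, proportional to weight.

    weights maps a device id to its weight.
    """
    parts = 2 ** part_power
    total_weight = sum(weights.values())
    return {dev: parts * w / total_weight for dev, w in weights.items()}
```

With 1,000 devices of weight 100.0 and a partition power of 20, each device desires about 1,048.576 partitions. A device with twice the weight of another desires twice as many.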

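Then the ring builder assigns each replica of each partition to the device that desires the most partitions at that point, with the restriction that the device is not in the same zone as any other replica of that partition. Once assigned, the device's desired partition count is decremented, the device is moved to its new sorted location in the list of devices, and the process continues.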
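
A simplified version of this initial assignment is sketched below. It is not the ring-builder's code: instead of maintaining a sorted device list it rescans for the hungriest eligible device with max(), and it assumes there are at least as many zones as replicas. The function name and the list-of-lists return value are choices made for this example.

```python
def assign_partitions(devs, part_count, replicas):
    """Greedy initial assignment that respects the one-replica-per-zone rule.

    devs is a list of dicts with "id", "zone", and "weight" keys.
    Returns one list per replica, mapping partition -> device id.
    """
    total_weight = sum(d["weight"] for d in devs)
    wanted = {d["id"]: part_count * replicas * d["weight"] / total_weight
              for d in devs}
    zone_of = {d["id"]: d["zone"] for d in devs}
    replica2part2dev = [[None] * part_count for _ in range(replicas)]
    for part in range(part_count):
        used_zones = set()
        for replica in range(replicas):
            # Hungriest device whose zone does not hold this partition yet.
            candidates = [i for i in wanted if zone_of[i] not in used_zones]
            dev_id = max(candidates, key=lambda i: wanted[i])
            replica2part2dev[replica][part] = dev_id
            wanted[dev_id] -= 1
            used_zones.add(zone_of[dev_id])
    return replica2part2dev
```

Rescanning with max() is slower than keeping the devices sorted, but it keeps the sketch short and yields the same kind of result: devices fill up in proportion to their weight while no two replicas of a partition share a zone.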

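When building a new ring based on an old ring, the desired number of partitions each device wants is recalculated. Next, the partitions to be reassigned are gathered up. Any removed devices have all their assigned partitions unassigned and added to the gathered list. Any devices that have more partitions than they now desire have random partitions unassigned from them and added to the gathered list. Lastly, the gathered partitions are reassigned to devices using a method similar to the initial assignment described above.

Whenever a partition has a replica reassigned, the time of the reassignment is recorded. This is taken into account when gathering partitions to reassign, so that no partition is moved twice in a configurable amount of time. This configurable amount of time is known internally to the RingBuilder class as min_part_hours. The restriction is ignored for replicas of partitions on devices that have been removed, as removing a device only happens on device failure and there is no choice but to make a reassignment.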
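
The min_part_hours rule can be expressed as a small filter. In this sketch, times are plain hour counts and the function and parameter names (other than min_part_hours) are hypothetical.

```python
def gather_movable(last_moved, now, min_part_hours, removed_parts=()):
    """Return the partitions that may be reassigned right now.

    last_moved maps a partition to the hour at which one of its replicas was
    last reassigned. Partitions on removed devices bypass the time limit.
    """
    removed = set(removed_parts)
    return sorted(
        part
        for part, moved_at in last_moved.items()
        if part in removed or now - moved_at >= min_part_hours
    )
```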

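The above process does not always perfectly rebalance a ring, due to the random nature of gathering partitions for reassignment. To help reach a more balanced ring, the rebalance process is repeated until it is near perfect (less than 1% off) or until the balance does not improve by at least 1% (indicating that perfect balance probably cannot be reached, due to wildly imbalanced zones or too many partitions recently moved).

2.4 History

The ring code went through many iterations before arriving at what it is now, and while it has been stable for a while, the algorithm may be tweaked or perhaps even fundamentally changed if new ideas emerge. This section describes the ideas previously attempted and explains why they were discarded.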

A “live ring” option was considered where each server could maintain its own copy of the ring and the servers would use a gossip protocol to communicate the changes they made. This was discarded as too complex and error prone to code correctly in the project time span available. One bug could easily gossip bad data out to the entire cluster and be difficult to recover from. Having an externally managed ring simplifies the process, allows full validation of data before it’s shipped out to the servers, and guarantees each server is using a ring from the same timeline. It also means that the servers themselves aren’t spending a lot of resources maintaining rings.

A couple of “ring server” options were considered. One was where all ring lookups would be done by calling a service on a separate server or set of servers, but this was discarded due to the latency involved. Another was much like the current process but where servers could submit change requests to the ring server to have a new ring built and shipped back out to the servers. This was discarded due to project time constraints and because ring changes are currently infrequent enough that manual control was sufficient. However, lack of quick automatic ring changes did mean that other parts of the system had to be coded to handle devices being unavailable for a period of hours until someone could manually update the ring.

The current ring process has each replica of a partition independently assigned to a device. A version of the ring that used a third of the memory was tried, where the first replica of a partition was directly assigned and the other two were determined by “walking” the ring until finding additional devices in other zones. This was discarded as control was lost as to how many replicas for a given partition moved at once. Keeping each replica independent allows for moving only one partition replica within a given time window (except due to device failures). Using the additional memory was deemed a good tradeoff for moving data around the cluster much less often.

Another ring design was tried where the partition to device assignments weren’t stored in a big list in memory but instead each device was assigned a set of hashes, or anchors. The partition would be determined from the data item’s hash and the nearest device anchors would determine where the replicas should be stored. However, to get reasonable distribution of data each device had to have a lot of anchors and walking through those anchors to find replicas started to add up. In the end, the memory savings weren’t that great and more processing power was used, so the idea was discarded.

A completely non-partitioned ring was also tried but discarded as the partitioning helps many other parts of the system, especially replication. Replication can be attempted and retried in a partition batch with the other replicas rather than each data item independently attempted and retried. Hashes of directory structures can be calculated and compared with other replicas to reduce directory walking and network traffic.

Partitioning and independently assigning partition replicas also allowed for the best balanced cluster. The best of the other strategies tended to give ±10% variance on device balance with devices of equal weight and ±15% with devices of varying weights. The current strategy allows us to get ±3% and ±8% respectively.

Various hashing algorithms were tried. SHA offers better security, but the ring doesn’t need to be cryptographically secure and SHA is slower. Murmur was much faster, but MD5 was built-in and hash computation is a small percentage of the overall request handling time. In all, once it was decided the servers wouldn’t be maintaining the rings themselves anyway and only doing hash lookups, MD5 was chosen for its general availability, good distribution, and adequate speed.

3. The Account Reaper

The Account Reaper removes data from deleted accounts in the background.

An account is marked for deletion by a reseller through the services server’s remove_storage_account XMLRPC call. This simply puts the value DELETED into the status column of the account_stat table in the account database (and replicas), indicating the data for the account should be deleted later. There is no set retention time and no undelete; it is assumed the reseller will implement such features and only call remove_storage_account once it is truly desired the account’s data be removed.

The account reaper runs on each account server and scans the server occasionally for account databases marked for deletion. It will only trigger on accounts that server is the primary node for, so that multiple account servers aren’t all trying to do the same work at the same time. Using multiple servers to delete one account might improve deletion speed, but requires coordination so they aren’t duplicating effort. Speed really isn’t as much of a concern with data deletion and large accounts aren’t deleted that often.

The deletion process for an account itself is pretty straightforward. For each container in the account, each object is deleted and then the container is deleted. Any deletion requests that fail won’t stop the overall process, but will cause the overall process to fail eventually (for example, if an object delete times out, the container won’t be able to be deleted later and therefore the account won’t be deleted either). The overall process continues even on a failure so that it doesn’t get hung up reclaiming cluster space because of one troublesome spot. The account reaper will keep trying to delete an account until it eventually becomes empty, at which point the database reclaim process within the db_replicator will eventually remove the database files.
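
One reaping pass, as described above, can be sketched as follows. The four callables passed in are hypothetical stand-ins for whatever listing and deletion operations the account server has available; this is an illustration of the control flow, not the reaper's actual code.

```python
def reap_account(account, list_containers, list_objects,
                 delete_object, delete_container):
    """Run one pass over a deleted account; return True only if it is now empty.

    Failures do not stop the pass, but a container with a failed object delete
    is left in place so that a later pass can retry it.
    """
    account_clean = True
    for container in list_containers(account):
        container_clean = True
        for obj in list_objects(account, container):
            try:
                delete_object(account, container, obj)
            except Exception:
                container_clean = False  # keep going with the other objects
        if container_clean:
            try:
                delete_container(account, container)
            except Exception:
                container_clean = False
        account_clean = account_clean and container_clean
    return account_clean
```

Calling this repeatedly until it returns True mirrors the behaviour in the text: one troublesome object delays the final cleanup but does not block space from being reclaimed elsewhere.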

3.1 History

At first, a simple approach of deleting an account through completely external calls was considered as it required no changes to the system. All data would simply be deleted in the same way the actual user would, through the public ReST API. However, the downside was that it would use proxy resources and log everything when it didn’t really need to. Also, it would likely need a dedicated server or two, just for issuing the delete requests.

A completely bottom-up approach was also considered, where the object and container servers would occasionally scan the data they held and check if the account was deleted, removing the data if so. The upside was the speed of reclamation with no impact on the proxies or logging, but the downside was that nearly 100% of the scanning would result in no action creating a lot of I/O load for no reason.

A more container server centric approach was also considered, where the account server would mark all the containers for deletion and the container servers would delete the objects in each container and then themselves. This has the benefit of still speedy reclamation for accounts with a lot of containers, but has the downside of a pretty big load spike. The process could be slowed down to alleviate the load spike possibility, but then the benefit of speedy reclamation is lost and what’s left is just a more complex process. Also, scanning all the containers for those marked for deletion when the majority wouldn’t be seemed wasteful. The db_replicator could do this work while performing its replication scan, but it would have to spawn and track deletion processes which seemed needlessly complex.

In the end, an account server centric approach seemed best, as described above.

4. The Auth System

4.1 TempAuth

The auth system for Swift is loosely based on the auth system from the existing Rackspace architecture – actually from a few existing auth systems – and is therefore a bit disjointed. The distilled points about it are:

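1. The authentication/authorization part can be an external system or a subsystem run within Swift as WSGI middleware.

2. The user of Swift passes in an auth token with each request.

3. Swift validates each token with the external auth system or auth subsystem and caches the result.

4. The token does not change from request to request, but does expire.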

The token can be passed into Swift using the X-Auth-Token or the X-Storage-Token header. Both have the same format: just a simple string representing the token. Some auth systems use UUID tokens, some an MD5 hash of something unique, some use “something else” but the salient point is that the token is a string which can be sent as-is back to the auth system for validation.

Swift will make calls to the auth system, giving the auth token to be validated. For a valid token, the auth system responds with an overall expiration in seconds from now. Swift will cache the token up to the expiration time.
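
The validate-then-cache behaviour can be sketched with a small class. The validate callable is a hypothetical stand-in for the call to the auth system: it is assumed to return the number of seconds the token remains valid, or None for an invalid token. The class name and the injectable clock are choices made for this example.

```python
import time


class TokenCache:
    """Caches successful token validations until they expire."""

    def __init__(self, validate, clock=time.time):
        self._validate = validate  # hypothetical: token -> seconds valid, or None
        self._clock = clock
        self._expires = {}         # token -> absolute expiry time

    def is_valid(self, token):
        now = self._clock()
        if self._expires.get(token, 0) > now:
            return True            # served from cache, no auth call needed
        ttl = self._validate(token)
        if ttl is None:
            return False
        self._expires[token] = now + ttl
        return True
```

Only the first request with a given token, and the first one after its expiration, reaches the auth system; every request in between is answered from the cache.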

The included TempAuth also has the concept of admin and non-admin users within an account. Admin users can do anything within the account. Non-admin users can only perform operations per container based on the container’s X-Container-Read and X-Container-Write ACLs. For more information on ACLs, see swift.common.middleware.acl.

Additionally, if the auth system sets the request environ’s swift_owner key to True, the proxy will return additional header information in some requests, such as the X-Container-Sync-Key for a container GET or HEAD.

The user starts a session by sending a ReST request to the auth system to receive the auth token and a URL to the Swift system.

4.2 Extending Auth

TempAuth is written as wsgi middleware, so implementing your own auth is as easy as writing new wsgi middleware, and plugging it in to the proxy server. The KeyStone project and the Swauth project are examples of additional auth services.

Also, see Auth Server and Middleware.
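
To show how little is needed for a custom auth layer, here is a minimal WSGI middleware sketch. It is not TempAuth: the class name and the fixed set of valid tokens are assumptions for the example, and a real implementation would validate tokens against an auth system rather than an in-memory set. The two header names come from the text above; WSGI exposes them in environ as HTTP_X_AUTH_TOKEN and HTTP_X_STORAGE_TOKEN.

```python
class SimpleTokenAuth:
    """WSGI middleware that rejects requests without a known token."""

    def __init__(self, app, valid_tokens):
        self.app = app
        self.valid_tokens = set(valid_tokens)

    def __call__(self, environ, start_response):
        token = (environ.get("HTTP_X_AUTH_TOKEN")
                 or environ.get("HTTP_X_STORAGE_TOKEN"))
        if token in self.valid_tokens:
            # Token accepted: hand the request to the wrapped application.
            return self.app(environ, start_response)
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Unauthorized\n"]
```

Wrapping an application is a single call, for example SimpleTokenAuth(app, ["AUTH_tk123"]), where the token value is made up for illustration.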

Next: Swift Object Storage in Depth: Overview and Concepts (Part 2)

Source: http://www.cnblogs.com/yuxc/archive/2011/12/06/2278303.html