Internet Engineering Task Force M. Sustrik, Ed.
Internet-Draft GoPivotal Inc.
Intended status: Informational August 2013
Expires: February 02, 2014
TCP Mapping for Scalability Protocols
sp-tcp-mapping-01
Abstract
This document defines the TCP mapping for scalability protocols. The
main purpose of the mapping is to turn the stream of bytes into a
stream of messages. Additionally, the mapping performs some
checks during the connection establishment phase.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 02, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Sustrik Expires February 02, 2014 [Page 1]
Internet-Draft TCP mapping for SPs August 2013
1. Underlying protocol
This mapping should be layered directly on top of TCP or,
alternatively, on top of ETSN (which itself is a thin layer on
top of TCP).
In the former case there's no fixed TCP port to use for the
communication. Instead, port numbers are assigned to individual
services by the user. In the latter case the communication happens
on the TCP port assigned to ETSN by IANA. The user identifies
individual services using ETSN service names.
2. Connection initiation
As soon as the underlying connection, whether TCP or ETSN, is
established, both parties MUST send the protocol header (described in
detail below) immediately. Both endpoints MUST then wait for the
protocol header from the peer before proceeding.
The goal of this design is to keep connection establishment as fast
as possible by avoiding any additional protocol handshakes, i.e.
network round-trips. Specifically, the protocol headers can be
bundled directly with the last packets of the TCP handshake and thus
have virtually zero performance impact.
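The exchange described above can be sketched as follows. This is only an illustration, not a reference implementation: the helper names recv_exact and exchange_headers are hypothetical, and the header is treated as an opaque 8-byte string (its length is stated in this section).

```python
import socket

HEADER_SIZE = 8  # the protocol header is 8 bytes long


def recv_exact(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending a full header")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def exchange_headers(sock, local_header):
    """Send our header right away, then block until the peer's arrives."""
    if len(local_header) != HEADER_SIZE:
        raise ValueError("protocol header must be 8 bytes")
    sock.sendall(local_header)            # sent without waiting for the peer
    return recv_exact(sock, HEADER_SIZE)  # validation is left to the caller
```

Because each side sends before it reads, neither side waits on the other's header before transmitting its own, so no extra round-trip is introduced.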
The protocol header is 8 bytes long and looks like this:
+------+------+------+--------------+------------+----------------+
| 0x00 | 0x53 | 0x50 | version (8b) | type (16b) | reserved (16b) |
+------+------+------+--------------+------------+----------------+
The first four bytes of the protocol header are used to make sure
that the peer's protocol is compatible with the protocol used by the
local endpoint. Keep in mind that this protocol is designed to run on
an arbitrary TCP port, thus the standard compatibility check -- if it
runs on port X and protocol Y is assigned to X by IANA, it speaks
protocol Y -- does not apply. We have to use an alternative
mechanism.
The first four bytes of the protocol header MUST be set to 0x00,
0x53, 0x50 and 0x01 respectively. If these bytes in the protocol
header received from the peer differ, the TCP connection MUST be
closed immediately.
The fact that the first byte of the protocol header is binary zero
filters out any text-based protocols that were accidentally connected
to the endpoint. The subsequent two bytes make the check even more
rigorous. At the same time they can be used as a debugging hint to
indicate that the connection is supposed to use one of the
scalability protocols -- the ASCII representation of these bytes is
'SP', which can be easily spotted when capturing the network traffic.
Finally, the fourth byte rules out any incompatible versions of this
protocol.
The fifth and sixth bytes of the header form a 16-bit unsigned
integer in network byte order representing the type of SP endpoint on
the layer above. The value SHOULD NOT be interpreted by the mapping;
rather, the interpretation should be delegated to the scalability
protocol above the mapping. For informational purposes, it should be
noted that the field encodes information such as SP protocol ID,
protocol version and the role of the endpoint within the protocol.
Individual values are assigned by IANA.
Finally, the last two bytes of the protocol header are reserved for
future use and must be set to binary zeroes. If the protocol header
from the peer contains anything other than zeroes in this field, the
implementation MUST close the underlying TCP connection.
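As an illustration of the header rules above, the following sketch builds and validates the 8-byte header. The names build_header, parse_header and HeaderError are hypothetical and not part of this specification. The type field is passed through uninterpreted, as the text requires.

```python
import struct

# Layout from the diagram: 0x00, 'S', 'P', version, type (16b),
# reserved (16b); multi-byte fields are in network byte order.
HEADER_FORMAT = "!4sHH"
HEADER_PREFIX = b"\x00SP\x01"


class HeaderError(Exception):
    """The peer's header requires the connection to be closed."""


def build_header(sp_type):
    """Return the 8-byte header for the given 16-bit endpoint type."""
    return struct.pack(HEADER_FORMAT, HEADER_PREFIX, sp_type, 0)


def parse_header(data):
    """Validate a peer header and return its type field, uninterpreted."""
    if len(data) != struct.calcsize(HEADER_FORMAT):
        raise HeaderError("header must be exactly 8 bytes")
    prefix, sp_type, reserved = struct.unpack(HEADER_FORMAT, data)
    if prefix != HEADER_PREFIX:
        raise HeaderError("first four bytes do not match 0x00 'S' 'P' 0x01")
    if reserved != 0:
        raise HeaderError("reserved field is not zero")
    return sp_type
```

A caller would close the underlying TCP connection whenever parse_header raises, and hand the returned type value to the scalability protocol above the mapping.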
3. Message delimitation
Once the protocol header is accepted, the endpoint can send and
receive messages. A message is an arbitrarily large chunk of binary
data. Every message starts with a 64-bit unsigned integer in network
byte order representing the size, in bytes, of the remaining part of
the message. Thus, the message payload can be from 0 to 2^64-1 bytes
long. The payload of the specified size follows directly after the
size field:
+------------+-----------------+
| size (64b) | payload |
+------------+-----------------+
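The framing shown in the diagram can be sketched as below. The function names are hypothetical, and the stream argument is assumed to be any binary file-like object with a read method (for example an in-memory buffer, used here in place of a TCP connection).

```python
import io
import struct

SIZE_FORMAT = "!Q"  # 64-bit unsigned integer, network byte order
SIZE_LENGTH = struct.calcsize(SIZE_FORMAT)  # 8 bytes


def encode_message(payload):
    """Prefix the payload with its size in bytes."""
    return struct.pack(SIZE_FORMAT, len(payload)) + payload


def read_exact(stream, count):
    """Read exactly `count` bytes from a binary stream."""
    data = bytearray()
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        data.extend(chunk)
    return bytes(data)


def read_message(stream):
    """Read the size field, then the payload that directly follows it."""
    (size,) = struct.unpack(SIZE_FORMAT, read_exact(stream, SIZE_LENGTH))
    return read_exact(stream, size)


# Round trip through an in-memory stream standing in for the connection.
stream = io.BytesIO(encode_message(b"hello") + encode_message(b""))
assert read_message(stream) == b"hello"
assert read_message(stream) == b""  # a zero-length payload is valid
```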
It may seem that a 64-bit message size is excessive and consumes too
much valuable bandwidth, especially given that most scenarios call
for relatively small messages, on the order of bytes or kilobytes.
A variable-length field may seem like a better solution. However,
our experience is that a variable-length size field doesn't provide
any performance benefit in the real world.
For large messages, 64 bits used by the field form a negligible
portion of the message and the performance impact is not even
measurable.
For small messages, the overall throughput is heavily CPU-bound,
never I/O-bound. In other words, CPU processing associated with each
individual message limits the message rate in such a way that the
network bandwidth limit is never reached. In the future we expect it
to be even more so: network bandwidth is going to grow faster than
CPU speed. All in all, some performance improvement could be achieved
using a variable-length size field with huge streams of very small
messages on very slow networks. We consider that scenario to be a
corner case that's almost never seen in the real world.
On the other hand, it may be argued that limiting the messages to
2^64-1 bytes can prove insufficient in the future. However,
extrapolating the message size growth seen in the past indicates
that a 64-bit size should be sufficient for the expected lifetime of
the protocol (30-50 years).
Finally, it may be argued that chaining an arbitrary number of
smaller data chunks can yield unlimited message size. The downside of
this approach is that the message payload cannot be contiguous on the
wire; it has to be interleaved with chunk headers. That typically
requires one more copy of the data in the receiving part of the stack
which may be a problem for very large messages.
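To illustrate the point about copies, here is a sketch of a receiver that uses the single size field to allocate the final buffer once and read the payload straight into it. The function name is hypothetical, and the stream is assumed to be a buffered binary stream offering read and readinto.

```python
import io
import struct


def read_into_buffer(stream):
    """Receive one size-prefixed message directly into a single buffer."""
    prefix = stream.read(8)
    if len(prefix) != 8:
        raise EOFError("stream ended inside the size field")
    (size,) = struct.unpack("!Q", prefix)
    # The final size is known up front, so the buffer is allocated once.
    # A real implementation would probably bound `size` before allocating.
    buffer = bytearray(size)
    view = memoryview(buffer)
    filled = 0
    while filled < size:
        count = stream.readinto(view[filled:])  # no intermediate copy
        if not count:
            raise EOFError("stream ended inside the payload")
        filled += count
    return buffer


# In-memory stream standing in for the TCP connection.
payload = b"x" * 1000
wire = io.BytesIO(struct.pack("!Q", len(payload)) + payload)
assert bytes(read_into_buffer(wire)) == payload
```

With a chunked format the receiver would instead have to strip a header before each chunk, which is where the additional copy mentioned above tends to come from.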
4. Note on multiplexing
Several modern general-purpose protocols built on top of TCP provide
multiplexing capability, i.e. a way to transfer multiple independent
message streams over a single TCP connection. This mapping
deliberately opts to provide no such functionality. Instead,
independent message streams should be implemented as different TCP
connections. This section provides the rationale for the design
decision.
First of all, multiplexing is typically added to protocols to avoid
the overhead of establishing additional TCP connections. This need
arises in environments where the TCP connections are extremely short-
lived, often used only for a single handshake between the peers.
Scalability protocols, on the other hand, require long-lived
connections, which makes the feature unnecessary.
At the same time, multiplexing on top of TCP, while doable, is
inferior to the real multiplexing done using multiple TCP
connections. Specifically, TCP's head-of-line blocking means that a
single lost TCP packet will hinder delivery for all the streams on
top of the connection, not just the one the missing packet belonged
to.
Moreover, implementing multiplexing is a non-trivial matter and
results in increased development cost, more bugs and a larger attack
surface.
Finally, for multiplexing to work properly, large messages have to be
split into smaller data chunks interleaved with chunk headers, which
makes the receiving stack less efficient, as already discussed above.
5. IANA Considerations
This memo includes no request to IANA.
6. Security Considerations
The mapping isn't intended to provide any security beyond what TCP
provides. DoS concerns are addressed within the specification.
7. References
Author's Address
Martin Sustrik (editor)
GoPivotal Inc.
Email: msustrik@gopivotal.com