rfc/sp-tcp-mapping-01.xml - third_party/nanomsg - Git at Google

 <?xml version="1.0" encoding="US-ASCII"?>
 <!DOCTYPE rfc SYSTEM "rfc2629.dtd">

 <rfc category="info" docName="sp-tcp-mapping-01">

   <front>

     <title abbrev="TCP mapping for SPs">
     TCP Mapping for Scalability Protocols
     </title>

     <author fullname="Martin Sustrik" initials="M." role="editor"
             surname="Sustrik">
       <address>
         <email>sustrik@250bpm.com</email>
       </address>
     </author>

     <date month="March" year="2014" />

     <area>Applications</area>
     <workgroup>Internet Engineering Task Force</workgroup>

     <keyword>TCP</keyword>
     <keyword>SP</keyword>

     <abstract>
       <t>This document defines the TCP mapping for scalability protocols.
          The main purpose of the mapping is to turn the stream of bytes
          into stream of messages. Additionally, the mapping provides some
          additional checks during the connection establishment phase.</t>
     </abstract>

   </front>

   <middle>

     <section title = "Underlying protocol">

       <t>This mapping should be layered directly on the top of TCP.</t>

       <t>There's no fixed TCP port to use for the communication. Instead, port
          numbers are assigned to individual services by the user.</t>

     </section>

     <section title = "Connection initiation">

       <t>As soon as the underlying TCP connection is established, both parties
          MUST send the protocol header (described in detail below) immediately.
          Both endpoints MUST then wait for the protocol header from the peer
          before proceeding on.</t>

       <t>The goal of this design is to keep connection establishment as
          fast as possible by avoiding any additional protocol handshakes,
          i.e. network round-trips. Specifically, the protocol headers
          can be bundled directly with to the last packets of TCP handshake
          and thus have virtually zero performance impact.</t>

       <t>The protocol header is 8 bytes long and looks like this:</t>

       <figure>
         <artwork>
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      0x00     |      0x53     |      0x50     |    version    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |             type              |           reserved            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         </artwork>
       </figure>

       <t>First four bytes of the protocol header are used to make sure that
          the peer's protocol is compatible with the protocol used by the local
          endpoint. Keep in mind that this protocol is designed to run on an
          arbitrary TCP port, thus the standard compatibility check -- if it runs
          on port X and protocol Y is assigned to X by IANA, it speaks protocol Y
          -- does not apply. We have to use an alternative mechanism.</t>

       <t>First four bytes of the protocol header MUST be set to 0x00, 0x53, 0x50
          and 0x00 respectively. If the protocol header received from the peer
          differs, the TCP connection MUST be closed immediately.</t>

       <t>The fact that the first byte of the protocol header is binary zero
          eliminates any text-based protocols that were accidentally connected
          to the endpoint. Subsequent two bytes make the check even more
          rigorous. At the same time they can be used as a debugging hint to
          indicate that the connection is supposed to use one of the scalability
          protocols -- ASCII representation of these bytes is 'SP' that can
          be easily spotted in when capturing the network traffic. Finally,
          the fourth byte rules out any incompatible versions of this
          protocol.</t>

       <t>Fifth and sixth bytes of the header form a 16-bit unsigned integer in
          network byte order representing the type of SP endpoint on the layer
          above. The value SHOULD NOT be interpreted by the mapping, rather
          the interpretation should be delegated to the scalability protocol
          above the mapping. For informational purposes, it should be noted that
          the field encodes information such as SP protocol ID, protocol version
          and the role of endpoint within the protocol. Individual values are
          assigned by IANA.</t>

       <t>Finally, the last two bytes of the protocol header are reserved for
          future use and must be set to binary zeroes. If the protocol header
          from the peer contains anything else than zeroes in this field, the
          implementation MUST close the underlying TCP connection.</t>

     </section>

     <section title = "Message delimitation">

       <t>Once the protocol header is accepted, endpoint can send and receive
          messages. Message is an arbitrarily large chunk of binary data. Every
          message starts with 64-bit unsigned integer in network byte order
          representing the size, in bytes,  of the remaining part of the message.
          Thus, the message payload can be from 0 to 2^64-1 bytes long.
          The payload of the specified size follows directly after the size
          field:</t>

       <figure>
         <artwork>
 +------------+-----------------+
 | size (64b) |     payload     |
 +------------+-----------------+
         </artwork>
       </figure>

       <t>It may seem that 64 bit message size is excessive and consumes too much
          of valuable bandwidth, especially given that most scenarios call for
          relatively small messages, in order of bytes or kilobytes.</t>

       <t>Variable length field may seem like a better solution, however, our
          experience is that variable length size field doesn't  provide any
          performance benefit in the real world.</t>

       <t>For large messages, 64 bits used by the field form a negligible portion
          of the message and the performance impact is not even measurable.</t>

       <t>For small messages, the overall throughput is heavily CPU-bound, never
          I/O-bound. In other words, CPU processing associated with each
          individual message limits the message rate in such a way that network
          bandwidth limit is never reached. In the future we expect it to be
          even more so: network bandwidth is going to grow faster than CPU speed.
          All in all, some performance improvement could be achieved using
          variable length size field with huge streams of very small messages
          on very slow networks. We consider that scenario to be a corner case
          that's almost never seen in a real world.</t>

       <t>On the other hand, it may be argued that limiting the messages to
          2^64-1 bytes can prove insufficient in the future. However,
          extrapolating the message size growth size seen in the past indicates
          that 64 bit size should be sufficient for the expected lifetime of
          the protocol (30-50 years).</t>

       <t>Finally, it may be argued that chaining arbitrary number of smaller
          data chunks can yield unlimited message size. The downside of this
          approach is that the message payload cannot be continuous on the wire,
          it has to be interleaved with chunk headers. That typically requires
          one more copy of the data in the receiving part of the stack which
          may be a problem for very large messages.</t>

     </section>

     <section title = "Note on multiplexing">

       <t>Several modern general-purpose protocols built on top of TCP provide
          multiplexing capability, i.e. a way to transfer multiple independent
          message streams over a single TCP connection. This mapping deliberately
          opts to provide no such functionality. Instead, independent message
          streams should be implemented as different TCP connections. This
          section provides the rationale for the design decision.</t>

       <t>First of all, multiplexing is typically added to protocols to avoid
          the overhead of establishing additional TCP connections. This need
          arises in environments where the TCP connections are extremely
          short-lived, often used only for a single handshake between the peers.
          Scalability protocols, on the other hand, require long-lived
          connections which doesn't make the feature necessary.</t>

       <t>At the same time, multiplexing on top of TCP, while doable, is inferior
          to the real multiplexing done using multiple TCP connections.
          Specifically, TCP's head-of-line blocking feature means that a single
          lost TCP packet will hinder delivery for all the streams on the top of
          the connection, not just the one the missing packets belonged to.</t>

       <t>At the same time, implementing multiplexing is a non-trivial matter
          and results in increased development cost, more bugs and larger
          attack surface.</t>

       <t>Finally, for multiplexing to work properly, large messages have to be
          split into smaller data chunks interleaved by chunk headers, which
          makes receiving stack less efficient, as already discussed above.</t>

     </section>

     <section anchor="IANA" title="IANA Considerations">
       <t>This memo includes no request to IANA.</t>
     </section>

     <section anchor="Security" title="Security Considerations">
       <t>The mapping isn't intended to provide any additional security in
          addition to what TCP does. DoS concerns are addressed within
          the specification.</t>
     </section>

   </middle>

 </rfc>
	<?xml version="1.0" encoding="US-ASCII"?>
	<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

	<rfc category="info" docName="sp-tcp-mapping-01">

	<front>

	<title abbrev="TCP mapping for SPs">
	TCP Mapping for Scalability Protocols
	</title>

	<author fullname="Martin Sustrik" initials="M." role="editor"
	surname="Sustrik">
	<address>
	<email>sustrik@250bpm.com</email>
	</address>
	</author>

	<date month="March" year="2014" />

	<area>Applications</area>
	<workgroup>Internet Engineering Task Force</workgroup>

	<keyword>TCP</keyword>
	<keyword>SP</keyword>

	<abstract>
	<t>This document defines the TCP mapping for scalability protocols.
	The main purpose of the mapping is to turn the stream of bytes
	into stream of messages. Additionally, the mapping provides some
	additional checks during the connection establishment phase.</t>
	</abstract>

	</front>

	<middle>

	<section title = "Underlying protocol">

	<t>This mapping should be layered directly on the top of TCP.</t>

	<t>There's no fixed TCP port to use for the communication. Instead, port
	numbers are assigned to individual services by the user.</t>

	</section>

	<section title = "Connection initiation">

	<t>As soon as the underlying TCP connection is established, both parties
	MUST send the protocol header (described in detail below) immediately.
	Both endpoints MUST then wait for the protocol header from the peer
	before proceeding on.</t>

	<t>The goal of this design is to keep connection establishment as
	fast as possible by avoiding any additional protocol handshakes,
	i.e. network round-trips. Specifically, the protocol headers
	can be bundled directly with to the last packets of TCP handshake
	and thus have virtually zero performance impact.</t>

	<t>The protocol header is 8 bytes long and looks like this:</t>

	<figure>
	<artwork>
	0 1 2 3
	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	\| 0x00 \| 0x53 \| 0x50 \| version \|
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	\| type \| reserved \|
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
	</figure>

	<t>First four bytes of the protocol header are used to make sure that
	the peer's protocol is compatible with the protocol used by the local
	endpoint. Keep in mind that this protocol is designed to run on an
	arbitrary TCP port, thus the standard compatibility check -- if it runs
	on port X and protocol Y is assigned to X by IANA, it speaks protocol Y
	-- does not apply. We have to use an alternative mechanism.</t>

	<t>First four bytes of the protocol header MUST be set to 0x00, 0x53, 0x50
	and 0x00 respectively. If the protocol header received from the peer
	differs, the TCP connection MUST be closed immediately.</t>

	<t>The fact that the first byte of the protocol header is binary zero
	eliminates any text-based protocols that were accidentally connected
	to the endpoint. Subsequent two bytes make the check even more
	rigorous. At the same time they can be used as a debugging hint to
	indicate that the connection is supposed to use one of the scalability
	protocols -- ASCII representation of these bytes is 'SP' that can
	be easily spotted in when capturing the network traffic. Finally,
	the fourth byte rules out any incompatible versions of this
	protocol.</t>

	<t>Fifth and sixth bytes of the header form a 16-bit unsigned integer in
	network byte order representing the type of SP endpoint on the layer
	above. The value SHOULD NOT be interpreted by the mapping, rather
	the interpretation should be delegated to the scalability protocol
	above the mapping. For informational purposes, it should be noted that
	the field encodes information such as SP protocol ID, protocol version
	and the role of endpoint within the protocol. Individual values are
	assigned by IANA.</t>

	<t>Finally, the last two bytes of the protocol header are reserved for
	future use and must be set to binary zeroes. If the protocol header
	from the peer contains anything else than zeroes in this field, the
	implementation MUST close the underlying TCP connection.</t>

	</section>

	<section title = "Message delimitation">

	<t>Once the protocol header is accepted, endpoint can send and receive
	messages. Message is an arbitrarily large chunk of binary data. Every
	message starts with 64-bit unsigned integer in network byte order
	representing the size, in bytes, of the remaining part of the message.
	Thus, the message payload can be from 0 to 2^64-1 bytes long.
	The payload of the specified size follows directly after the size
	field:</t>

	<figure>
	<artwork>
	+------------+-----------------+
	\| size (64b) \| payload \|
	+------------+-----------------+
	</artwork>
	</figure>

	<t>It may seem that 64 bit message size is excessive and consumes too much
	of valuable bandwidth, especially given that most scenarios call for
	relatively small messages, in order of bytes or kilobytes.</t>

	<t>Variable length field may seem like a better solution, however, our
	experience is that variable length size field doesn't provide any
	performance benefit in the real world.</t>

	<t>For large messages, 64 bits used by the field form a negligible portion
	of the message and the performance impact is not even measurable.</t>

	<t>For small messages, the overall throughput is heavily CPU-bound, never
	I/O-bound. In other words, CPU processing associated with each
	individual message limits the message rate in such a way that network
	bandwidth limit is never reached. In the future we expect it to be
	even more so: network bandwidth is going to grow faster than CPU speed.
	All in all, some performance improvement could be achieved using
	variable length size field with huge streams of very small messages
	on very slow networks. We consider that scenario to be a corner case
	that's almost never seen in a real world.</t>

	<t>On the other hand, it may be argued that limiting the messages to
	2^64-1 bytes can prove insufficient in the future. However,
	extrapolating the message size growth size seen in the past indicates
	that 64 bit size should be sufficient for the expected lifetime of
	the protocol (30-50 years).</t>

	<t>Finally, it may be argued that chaining arbitrary number of smaller
	data chunks can yield unlimited message size. The downside of this
	approach is that the message payload cannot be continuous on the wire,
	it has to be interleaved with chunk headers. That typically requires
	one more copy of the data in the receiving part of the stack which
	may be a problem for very large messages.</t>

	</section>

	<section title = "Note on multiplexing">

	<t>Several modern general-purpose protocols built on top of TCP provide
	multiplexing capability, i.e. a way to transfer multiple independent
	message streams over a single TCP connection. This mapping deliberately
	opts to provide no such functionality. Instead, independent message
	streams should be implemented as different TCP connections. This
	section provides the rationale for the design decision.</t>

	<t>First of all, multiplexing is typically added to protocols to avoid
	the overhead of establishing additional TCP connections. This need
	arises in environments where the TCP connections are extremely
	short-lived, often used only for a single handshake between the peers.
	Scalability protocols, on the other hand, require long-lived
	connections which doesn't make the feature necessary.</t>

	<t>At the same time, multiplexing on top of TCP, while doable, is inferior
	to the real multiplexing done using multiple TCP connections.
	Specifically, TCP's head-of-line blocking feature means that a single
	lost TCP packet will hinder delivery for all the streams on the top of
	the connection, not just the one the missing packets belonged to.</t>

	<t>At the same time, implementing multiplexing is a non-trivial matter
	and results in increased development cost, more bugs and larger
	attack surface.</t>

	<t>Finally, for multiplexing to work properly, large messages have to be
	split into smaller data chunks interleaved by chunk headers, which
	makes receiving stack less efficient, as already discussed above.</t>

	</section>

	<section anchor="IANA" title="IANA Considerations">
	<t>This memo includes no request to IANA.</t>
	</section>

	<section anchor="Security" title="Security Considerations">
	<t>The mapping isn't intended to provide any additional security in
	addition to what TCP does. DoS concerns are addressed within
	the specification.</t>
	</section>

	</middle>

	</rfc>