| <?xml version="1.0" encoding="US-ASCII"?> |
| <!DOCTYPE rfc SYSTEM "rfc2629.dtd"> |
| |
| <rfc category="info" docName="sp-zerotier-mapping-01"> |
| |
| <front> |
| |
| <title abbrev="ZeroTier mapping for SPs"> |
| ZeroTier Mapping for Scalability Protocols |
| </title> |
| |
| <author fullname="Garrett D'Amore" initials="G." role="editor" |
| surname="D'Amore"> |
| <address> |
| <email>garrett@damore.org</email> |
| </address> |
| </author> |
| |
| <date month="October" year="2016" /> |
| |
| <area>Applications</area> |
| <workgroup>Internet Engineering Task Force</workgroup> |
| |
| <keyword>ZeroTier</keyword> |
| <keyword>SP</keyword> |
| |
| <abstract> |
| <t>This document defines the ZeroTier mapping for scalability protocols.</t> |
| </abstract> |
| |
| </front> |
| |
| <middle> |
| |
| <section title="Underlying protocol"> |
| |
| <t>ZeroTier expresses an 802.3 style layer 2, where frames |
| maybe exchanged as if they were Ethernet frames. Virtual broadcast |
| domains are created within a numbered "network", and frames may then |
| be exchanged with any peers on that network.</t> |
| |
| <t>Frames may arrive in any order, or be lost, just a with Ethernet |
| (best effort delivery), but they are strongly protected by a |
| cryptographic checksum, so frames that do arrive will be uncorrupted. |
| Furthermore, ZeroTier guarantees that a given frame will |
| be received at most once.</t> |
| |
| <t>Each application on a ZeroTier network has its own address, called |
| a ZeroTier ID (ZTID), which is globally unique -- this is generated from |
| a hash of the public key associated with the application.</t> |
| |
| <t>A given application may participate in multiple ZeroTier networks.</t> |
| |
| <t>We assume each nanomsg application will have it's own ZeroTier ID, |
| and will not use more than one. Management of these IDs, as well as |
| the underlying key pairs, is out of scope of this document.</t> |
| |
| <t>ZeroTier networks have a maximum payload MTU of 2800 bytes. |
| However they also have an "optimum" MTU, based upon the underlying |
| networks (typically UDP) and overheads that are used to exchange |
| such packets. For our purposes we will assume this to be approximately |
| 1400 bytes. These values can change on different networks.</t> |
| |
| </section> |
| |
| <section title="Packet layout"> |
| |
| <t>Each nanomsg message sent over ZeroTier will be comprised of one |
| or more fragments, where each fragment is mapped to a single |
| underlying ZeroTier L2 frame. We use the EtherType field |
| of 0901 to indicate nanomsg over ZeroTier protocol (number to |
| registered with IEEE).</t> |
| |
| <t>Each frame shall be prepended with the following header:</t> |
| |
| <figure> |
| <artwork> |
| 0 1 2 3 |
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | destination mac address (bytes 0-3) | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | destination mac (bytes 4-5) | source mac (bytes 0-1) | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | source mac address (bytes 2-5) | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | 0x90 | 0x01 | op | flags | version | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | fragment offset | fragment length | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | destination port | source port | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | type | message ID | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | payload... |
| +-+-+-+-+-+-+-+- |
| |
| </artwork> |
| </figure> |
| |
| <t>All numeric fields are in big-endian byte order.</t> |
| |
| <t>As above, the start of each frame is just as a normal Ethernet |
| frame, with destination, source, and type of 0x901.</t> |
| |
| <t>The op is a field that indicates the type of message |
| being sent. The following values are defined: DATA (0), |
| CONN (1), DISC (2), PING (3), and ERR (4). These are discussed |
| further below. Implementations MUST discard messages where the |
| op is not one of these.</t> |
| |
| <t>There are two flags defined. The first is MF (1), which |
| indicates that a message is fragmented, and more fragments follow. |
| This flag may only be set on DATA messages. The last fragment |
| of a message will not have this flag set.</t> |
| |
| <t>The second flag is AK (2), which indicates that a |
| given message is a reply to an earlier message. This is |
| only valid for the CONN, DISC, and PING message types.</t> |
| |
| <t>Note that the MF and the AK flag bits are mutually exclusive.</t> |
| |
| <t>The version byte MUST be set to 0x1. Implementations MUST |
| discard any messages received for any other version.</t> |
| |
| <t>The fragment length and offset are given in terms of octets, |
| and only include the payload. For example, the first fragment |
| of a message bearing a 2000 byte payload, itself only carrying |
| 1400 bytes of that payload would have the MF bit set in flags, |
| offset 0, and length 1400. The second fragment would have the MF |
| bit clear, length 600, and offset 1400.</t> |
| |
| <t>As a single fragment cannot exceed the size of a ZeroTier frame, |
| the high order six bits of the fragment length MUST be zero, and |
| the value encoded MUST be less than 2800.</t> |
| |
| <t>Each fragment for a given message must carry the same |
| message ID. Implementations MUST initialize this to a random |
| value, and MUST increment this each time a new message is sent.</t> |
| |
| <t>The port fields are used to discriminate different uses, allowing |
| one application to have multiple connections or sockets open. |
| The purpose is analogous to TCP port numbers, except that instead |
| of the operating system performing the discrimination the application |
| or library code must do so.</t> |
| |
| <t>The type field is the numeric SP protocol ID, in big-endian |
| form. When receiving a message for a port, if the SP protocol ID |
| does not match the SP protocol expected on that port, the |
| implementation MUST discard this message.</t> |
| |
| <t>The maximum payload size, and hence the maximum SP message that |
| may be transmitted using this transport, is 65535 octets. |
| Implementations are encouraged to restrict this further.</t> |
| |
| <t>Note that it is not by accident that the payload is 32-bit |
| aligned in this message format.</t> |
| |
| <t>Source and destination MAC addresses shall be constructed |
| algorithmically from the relevant ZeroTier IDs.</t> |
| |
| <t>Note that at this time, broadcast and multicast is not supported |
| by this mapping. (A future update may resolve this.)</t> |
| |
| </section> |
| |
| <section title = "DATA messages"> |
| |
| <t>DATA messages carry SP protocol payload data. They can only |
| be sent on an established session (see CONN messages below), |
| and are never acknowledged.</t> |
| </section> |
| |
| <section title = "CONN messages"> |
| |
| <t>CONN frames represent a session establishment. They allow |
| a peer to advertise its port number to a remote peer, and to verify |
| that a peer is responsive. The payload for the CONN frame is a 4 |
| byte (big-endian) value, consisting of the SP protocol ID of the |
| sender.</t> |
| |
| <t>The connection is initiated by the initiator sending this |
| message, with its own SP protocol ID. The AK flag will in this |
| case be clear.</t> |
| |
| <t>The responder will acknowledge this by replying with its |
| SP protocol ID in the 4-byte payload, with the AK flag set.</t> |
| |
| <t>Alternatively, a responder may reject the connection attempt by |
| sending a suitably formed ERR message (see below).</t> |
| |
| <t>If a sender does not receive a reply, it SHOULD retry this |
| message before giving up and reporting an error to the user.</t> |
| |
| <t>If a CONN frame is received for a session that already |
| exists, the receiver MUST reply. The CONN request is idempotent.</t> |
| </section> |
| |
| |
| <section title = "DISC messages"> |
| |
| <t>DISC messages are used to request a session be terminated. |
| This notifies the remote sender that no more data will be |
| sent or accepted, and the session resources may be released. |
| There is no payload. The party closing the session sends this |
| with the AK flag clear. There is no acknowledgement.</t> |
| |
| </section> |
| |
| <section title = "PING messages"> |
| |
| <t>In order to keep session state, implementations will generally |
| store data for each session. In order to prevent a stale session |
| from consuming these resources forever, and in order to keep |
| underlying ZeroTier sessions alive, a PING message may be sent. |
| This message has no payload.</t> |
| |
| <t>The sender MUST leave the AK bit clear. If the PING is |
| is successful, then the responder MUST reply with a PING message |
| with the AK bit set.</t> |
| |
| <t>In the event of an error, an implemenation MAY reply with an |
| ERR message.</t> |
| |
| <t>Implementations MUST not initiate PING messages if they have |
| either received or sent other session messages recently.</t> |
| |
| <t>Implemenations shall use a timeout T1 seconds of be used |
| before initiating a message the first time, and that in the absence |
| of a reply, up to N further attempts be made, separated by T2 seconds. |
| If no reply to the Nth attempt is received after T2 seconds have |
| passed, then the remote peer should be assumed offline or dead, and |
| the session closed.</t> |
| |
| <t>It is recommended that T1, T2, and N all be configurable, with |
| recommended default values of 60, 10, and 5. With these values, |
| sessions that appear dead after 2 minutes will be closed, and their |
| resources reclaimed.</t> |
| |
| </section> |
| |
| <section title = "ERR messages"> |
| |
| <t>ERR messages indicate a failure in the session, and abruptly |
| terminates the session. The payload for these messages consists |
| of a single byte error code, followed by an ASCII message describing |
| the error (not terminated by zero). This message shall not be |
| more than 128 bytes in length.</t> |
| |
| <t>The following error codes are defined: |
| |
| <list> |
| <t>0x01 No party listening at that address or port.</t> |
| <t>0x02 No such session found.</t> |
| <t>0x03 SP protocol ID invalid.</t> |
| <t>0x04 Generic protocol error.</t> |
| <t>0x05 Message size too big.</t> |
| <t>0xff Other uncategorized error.</t> |
| </list> |
| </t> |
| |
| <t>Implemenations MUST discard any session state upon receiving an |
| ERR message. These messages are not acknowledged.</t> |
| |
| </section> |
| |
| <section title = "Reassembly Guidelines"> |
| |
| <t>Implementations MUST accept and reassemble fragmented DATA messages. |
| Implementations MUST discard fragmented messages of other types.</t> |
| |
| <t>Messages larger than the ZeroTier MTU (2800) MUST be fragmented.</t> |
| |
| <t>Implementations SHOULD limit the number of unassembled messages |
| retained for reassembly, to minimize the likelihood of intentional |
| abuse. It is suggested that at most 2 unassembled messages be |
| retained. It is further suggested that if 2 or more unfragmented |
| messages arrive before a message is reassembled, or more than 5 |
| seconds pass before the reassembly is complete, that the unassembled |
| fragments be discarded.</t> |
| </section> |
| |
| <section title="Ports"> |
| <t>The port numbers are 16-bit fields, allowing a single ZT ID to |
| service multiple application layer protocols, which could be |
| treated as seperate end points, or as separate sockets in the |
| application. The implementation is responsible for discriminating |
| on these and delivering to the appropriate consumer.</t> |
| |
| <t>As with UDP or TCP, it is intended that each party have its own |
| port number, and that a pair of ports (combined with ZeroTier IDs) |
| be used to identify a single conversation.</t> |
| |
| <t>An SP server should allocate a port for number advertisement. |
| It is expected cliets will generate ephemeral port numbers.</t> |
| |
| <t>Implementations are free to choose how to allocate port numbers, |
| but it is recommended manually configured port numbers are small, |
| with the high order bit clear, and that numbers > 32768 (high order |
| bit set) be used for ephemeral allocations.</t> |
| |
| <t>It is recommended that separate short queues (perhaps just one |
| or two messages long) be kept per local port in implementations, to |
| prevent head-of-line blocking issues where backpressure on one |
| consumer (perhaps just a single thread or socket) blocks others.</t> |
| |
| </section> |
| |
| <section anchor="URI" title="URI Format"> |
| <t>The URI scheme used to represent ZeroTier addresses makes |
| use of ZeroTier IDs, ZeroTier network IDs, and our own 16-bit |
| ports.</t> |
| |
| <t>The format shall be nnzt://<nwid>/<ztid>:<port>, |
| where the <nwid> |
| component represents the 16-digit hexadecimal ZeroTier network ID, the |
| <ztid> represents the 10-digit hexadecimal ZeroTier Device ID, and the |
| <port> is the 16-bit port number previously described.</t> |
| |
| </section> |
| |
| <section anchor="IANA" title="IANA Considerations"> |
| <t>This memo includes no request to IANA.</t> |
| </section> |
| |
| <section anchor="Security" title="Security Considerations"> |
| <t>The mapping isn't intended to provide any additional security in |
| addition to what ZeroTier does.</t> |
| </section> |
| |
| </middle> |
| |
| </rfc> |
| |