| <?xml version="1.0" encoding="US-ASCII"?> |
| <!DOCTYPE rfc SYSTEM "rfc2629.dtd"> |
| |
| <rfc category="info" docName="sp-request-reply-01"> |
| |
| <front> |
| |
| <title abbrev="Request/Reply SP"> |
| Request/Reply Scalability Protocol |
| </title> |
| |
| <author fullname="Martin Sustrik" initials="M." role="editor" |
| surname="Sustrik"> |
| <organization>GoPivotal Inc.</organization> |
| <address> |
| <email>msustrik@gopivotal.com</email> |
| </address> |
| </author> |
| |
| <date month="August" year="2013" /> |
| |
| <area>Applications</area> |
| <workgroup>Internet Engineering Task Force</workgroup> |
| |
| <keyword>Request</keyword> |
| <keyword>Reply</keyword> |
| <keyword>REQ</keyword> |
| <keyword>REP</keyword> |
| <keyword>stateless</keyword> |
| <keyword>service</keyword> |
| <keyword>SP</keyword> |
| |
| <abstract> |
| <t>This document defines a scalability protocol used for distributing |
processing tasks among an arbitrary number of stateless processing nodes
| and returning the results of the processing.</t> |
| </abstract> |
| |
| </front> |
| |
| <middle> |
| |
| <section title = "Introduction"> |
| |
| <t>One of the most common problems in distributed applications is how to |
delegate work to another processing node and get the result back to
| the original node. In other words, the goal is to utilise the CPU |
| power of a remote node.</t> |
| |
<t>There's a wide range of RPC systems addressing this problem. However,
instead of relying on a simple RPC algorithm, we will aim at solving a
more general version of the problem. First, we want to issue processing
requests from multiple clients, not just a single one. Second, we want
to distribute the tasks to any number of processing nodes instead of a
single one, so that the processing can be scaled up by adding new
processing nodes as necessary.</t>
| |
| <t>Solving the generalised problem requires that the algorithm |
executing the task in question -- also known as a "service" -- is
| stateless.</t> |
| |
| <t>To put it simply, the service is called "stateless" when there's no |
| way for the user to distinguish whether a request was processed by |
| one instance of the service or another one.</t> |
| |
<t>So, for example, a service which accepts two integers and multiplies
them is stateless. A request for "2x2" is always going to produce "4",
no matter which instance of the service has computed it.</t>
| |
<t>A service that accepts empty requests and produces the number
of requests processed so far (1, 2, 3 etc.), on the other hand, is
not stateless. To prove that, you can run two instances of the service.
The first reply, no matter which instance produces it, is going to be 1.
The second reply though is going to be either 2 (if processed by the
same instance as the first one) or 1 (if processed by the other
instance). You can thus distinguish which instance produced the result
and, according to the definition, the service is not stateless.</t>
| |
| <t>Despite the name, being "stateless" doesn't mean that the service has |
| no state at all. Rather it means that the service doesn't retain any |
| business-logic-related state in-between processing two subsequent |
| requests. The service is, of course, allowed to have state while |
| processing a single request. It can also have state that is unrelated |
| to its business logic, say statistics about the processing that are |
| used for administrative purposes and never returned to the clients.</t> |
| |
| <t>Also note that "stateless" doesn't necessarily mean "fully |
| deterministic". For example, a service that generates random numbers is |
| non-deterministic. However, the client, after receiving a new random |
number, cannot tell which instance has produced it; thus, the service
| can be considered stateless.</t> |
| |
| <t>While stateless services are often implemented by passing the entire |
| state inside the request, they are not required to do so. Especially |
| when the state is large, passing it around in each request may be |
| impractical. In such cases, it's typically just a reference to the |
state that's passed in the request, such as an ID or a path. The state
| itself can then be retrieved by the service from a shared database, |
| a network file system or similar storage mechanism.</t> |
| |
| <t>Requiring services to be stateless serves a specific purpose. |
| It allows for using any number of service instances to handle |
| the processing load. After all, the client won't be able to tell the |
| difference between replies from instance A and replies from instance B. |
| You can even start new instances on the fly and get away with it. |
| The client still won't be able to tell the difference. In other |
| words, statelessness is a prerequisite to make your service cluster |
| fully scalable.</t> |
| |
<t>Once it is ensured that the service is stateless, there are several
topologies for a request/reply system to form. What follows are
the most common:
| |
| <list style = "numbers"> |
| <t>One client sends a request to one server and gets a reply. |
| The common RPC scenario.</t> |
| <t>Many clients send requests to one server and get replies. The |
| classic client/server model. Think of a database server and |
| database clients. Alternatively think of a messaging broker and |
| messaging clients.</t> |
<t>One client sends requests to many servers and gets replies.
| The load-balancer model. Think of HTTP load balancers.</t> |
| <t>Many clients send requests to be processed by many servers. |
| The "enterprise service bus" model. In the simplest case the bus |
| can be implemented as a simple hub-and-spokes topology. In complex |
| cases the bus can span multiple physical locations or multiple |
organisations with intermediate nodes at the boundaries connecting
| different parts of the topology.</t> |
| </list> |
| |
| </t> |
| |
<t>In addition to distributing tasks to processing nodes, the request/reply
| model comes with full end-to-end reliability. The reliability guarantee |
| can be defined as follows: As long as the client is alive and there's |
| at least one server accessible from the client, the task will |
| eventually get processed and the result will be delivered back to |
| the client.</t> |
| |
| <t>End-to-end reliability is achieved, similar to TCP, by re-sending the |
| request if the client believes the original instance of the request |
has failed. Typically, a request is believed to have failed when there's
| no reply received within a specified time.</t> |
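
<t>As a non-normative illustration, here's what the mechanism looks
like from the client's point of view, using the API of the nanomsg
library (an implementation of the SP protocols). The address is
hypothetical and error handling is omitted for brevity:</t>

<figure>
<artwork><![CDATA[
#include <nanomsg/nn.h>
#include <nanomsg/reqrep.h>
#include <stdio.h>

int main (void)
{
    int s = nn_socket (AF_SP, NN_REQ);

    /* If no reply arrives within the interval below, the endpoint
       silently re-sends the request. 60000 ms is the default. */
    int resend_ivl = 60000;
    nn_setsockopt (s, NN_REQ, NN_REQ_RESEND_IVL,
        &resend_ivl, sizeof (resend_ivl));

    nn_connect (s, "tcp://192.0.2.1:5555");
    nn_send (s, "2x2", 3, 0);

    char *reply;
    int sz = nn_recv (s, &reply, NN_MSG, 0); /* blocks; re-sends
                                                as needed */
    printf ("%.*s\n", sz, reply);
    nn_freemsg (reply);
    nn_close (s);
    return 0;
}
]]></artwork>
</figure>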
| |
| <t>Note that, unlike with TCP, the reliability algorithm is resistant to |
a server failure. Even if a server fails while processing a request, the
| request will be re-sent and eventually processed by a different |
| instance of the server.</t> |
| |
<t>As can be seen from the above, one request may be processed multiple
times. For example, a reply may be lost on its way back to the client.
The client will assume that the request was not processed yet and will
resend it, thus causing duplicate execution of the task.</t>
| |
<t>Some applications may want to prevent duplicate execution of tasks. It
often turns out that making such applications idempotent is
relatively easy as they already possess the tools to do so. For
| example, a payment processing server already has access to a shared |
| database which it can use to verify that the payment with specified ID |
| was not yet processed.</t> |
| |
<t>On the other hand, many applications don't care about occasional
duplicate processing of tasks. Therefore, the request/reply protocol
does not require the service to be idempotent. Instead, the idempotency
issue is left to the user to decide on.</t>
| |
| <t>Finally, it should be noted that this specification discusses several |
| features that are of little use in simple topologies and are rather |
| aimed at large, geographically or organisationally distributed |
| topologies. Features like channel prioritisation and loop avoidance |
| fall into this category.</t> |
| |
| </section> |
| |
| <section title = "Underlying protocol"> |
| |
<t>The request/reply protocol can be run on top of any SP mapping,
such as, for example, the SP TCP mapping.</t>
| |
<t>Also, given that SP protocols describe the behaviour of an entire,
arbitrarily complex topology rather than of a single node-to-node
communication, several underlying protocols can be used in parallel.
For example, a client may send a request via WebSocket, then, on the
edge of the company network an intermediary node may retransmit it
using TCP etc.</t>
| |
| <figure> |
| <artwork> |
| +---+ WebSocket +---+ TCP +---+ |
| | |-------------| |-----------| | |
| +---+ +---+ +---+ |
| | | |
| +---+ IPC | | SCTP +---+ DCCP +---+ |
| | |---------+ +--------| |-----------| | |
| +---+ +---+ +---+ |
| </artwork> |
| </figure> |
| |
| </section> |
| |
| <section title = "Overview of the algorithm"> |
| |
<t>The request/reply protocol defines two different endpoint types:
the requester or REQ (the client) and the replier or REP (the
| service).</t> |
| |
<t>A REQ endpoint can be connected only to a REP endpoint. A REP endpoint
can be connected only to a REQ endpoint. If the underlying protocol
| indicates that there's an attempt to create a channel to an |
| incompatible endpoint, the channel MUST NOT be used. In the case of |
| TCP mapping, for example, the underlying TCP connection MUST |
| be closed.</t> |
| |
<t>When creating more complex topologies, REQ and REP endpoints are
paired in the intermediate nodes to form a forwarding component,
a so-called "device". A device receives requests from the REP endpoint
and forwards them to the REQ endpoint. At the same time it receives
replies from the REQ endpoint and forwards them to the REP
endpoint:</t>
| |
| <figure> |
| <artwork> |
| --- requests --> |
| |
| +-----+ +-----+-----+ +-----+-----+ +-----+ |
| | |-->| | |-->| | |-->| | |
| | REQ | | REP | REQ | | REP | REQ | | REP | |
| | |<--| | |<--| | |<--| | |
| +-----+ +-----+-----+ +-----+-----+ +-----+ |
| |
| <-- replies --- |
| </artwork> |
| </figure> |
| |
<t>Using devices, arbitrarily complex topologies can be built. The rest
of this section explains how requests are routed through a topology
towards processing nodes and how replies are routed back from
processing nodes to the original clients, as well as how
reliability is achieved.</t>
| |
| <t>The idea for routing requests is to implement a simple coarse-grained |
scheduling algorithm based on the pushback capabilities of the underlying
| transport.</t> |
| |
| <t>The algorithm works by interpreting pushback on a particular channel |
| as "the part of topology accessible through this channel is busy at |
| the moment and doesn't accept any more requests."</t> |
| |
<t>Thus, when a node is about to send a request, it can choose to send
it only to one of the channels that don't report pushback at the
moment. To implement approximately fair distribution of the workload,
the node chooses a channel from that pool using the round-robin
algorithm.</t>
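
<t>The following non-normative C sketch shows the channel selection
described above. The channel structure and its busy flag are
assumptions standing in for the pushback reporting of the real
underlying transport:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>

typedef struct { int id; int busy; } channel_t;

/* Returns the index of the channel to send the request to, or -1
   if all channels report pushback. 'last' remembers the previously
   used index so that selection is round-robin. */
int choose_channel (channel_t *chans, size_t nchans, size_t *last)
{
    for (size_t i = 1; i <= nchans; i++) {
        size_t idx = (*last + i) % nchans;
        if (!chans [idx].busy) {      /* no pushback reported */
            *last = idx;
            return (int) idx;
        }
    }
    return -1;   /* whole accessible topology is busy */
}
]]></artwork>
</figure>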
| |
<t>As for delivering replies back to the clients, it should be understood
that the client may not be directly accessible (say using TCP/IP) from
the processing node. It may be behind a firewall, have no static IP
address etc. Furthermore, the client and the processing node may not
even speak the same transport protocol -- imagine a client connecting
to the topology using WebSockets and a processing node via SCTP.</t>
| |
| <t>Given the above, it becomes obvious that the replies must be routed |
| back through the existing topology rather than directly. In fact, |
request/reply topology may be thought of as an overlay network on
top of the underlying transport mechanisms.</t>
| |
<t>As for routing replies within the request/reply topology, the protocol
is designed in such a way that each reply contains the whole routing
path, rather than just the address of the destination node, as is the
case with, for example, TCP/IP.</t>
| |
<t>The downside of the design is that replies are a little bit longer
and that if an intermediate node gets restarted, all the requests
that were routed through it will fail to complete and will have to be
resent by the request/reply end-to-end reliability mechanism.</t>
| |
<t>The upside, on the other hand, is that the nodes in the topology don't
have to maintain any routing tables besides the simple table of
adjacent channels along with their IDs. There's also no need for any
additional protocols for distributing routing information within
the topology.</t>
| |
<t>The most important reason for adopting the design though is that
there's no propagation delay and any node becomes accessible
immediately after it is started. Given that some nodes in the topology
may be extremely short-lived, this is a crucial requirement. Imagine
a database client that sends a query, reads the result and terminates.
It makes no sense to delay the whole process till the routing tables
are synchronised between the client and the server.</t>
| |
<t>The algorithm thus works as follows: When a request is routed from the
| client to the processing node, every REP endpoint determines which |
| channel it was received from and adds the ID of the channel to the |
| request. Thus, when the request arrives at the ultimate processing node |
| it already contains a full backtrace stack, which in turn contains |
| all the info needed to route a message back to the original client.</t> |
| |
| <t>After processing the request, the processing node attaches the |
| backtrace stack from the request to the reply and sends it back |
| to the topology. At that point every REP endpoint can check the |
| traceback and determine which channel it should send the reply to.</t> |
| |
<t>In addition to routing, the request/reply protocol takes care of
| reliability, i.e. ensures that every request will be eventually |
| processed and the reply will be delivered to the user, even when |
| facing failures of processing nodes, intermediate nodes and network |
| infrastructure.</t> |
| |
<t>Reliability is achieved by simply re-sending the request if the reply
is not received within a certain timeframe. To make that algorithm
work flawlessly, the client has to be able to filter out any stray
replies (delayed replies for requests that we've already received
a reply to).</t>
| |
<t>The client thus adds a unique request ID to the request. The ID gets
| copied from the request to the reply by the processing node. When the |
| reply gets back to the client, it can simply check whether the request |
| in question is still being processed and if not so, it can ignore |
| the reply.</t> |
| |
<t>To implement all the functionality described above, messages (both
requests and replies) have the following format:</t>
| |
| <figure> |
| <artwork> |
| +-+------------+-+------------+ +-+------------+-------------+ |
| |0| Channel ID |0| Channel ID |...|1| Request ID | payload | |
| +-+------------+-+------------+ +-+------------+ ------------+ |
| </artwork> |
| </figure> |
| |
<t>The payload of the message is preceded by a stack of 32-bit tags. The
most significant bit of each tag is set to 0, except for the very last
tag, where it is set to 1. That allows the algorithm to find out where
the tags end and where the message payload begins.</t>
| |
<t>As for the remaining 31 bits, they are either a request ID (in the last
tag) or a channel ID (in all the remaining tags). The first channel ID
| is added and processed by the REP endpoint closest to the processing |
| node. The last channel ID is added and processed by the REP endpoint |
| closest to the client.</t> |
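
<t>A non-normative C sketch of parsing the tag stack follows. It
assumes the tags are encoded in network byte order:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Scans the stack of 32-bit tags at the start of a message and
   returns the offset at which the payload begins, or -1 if the
   message ends before a tag with the MSB set is found. */
ptrdiff_t payload_offset (const uint8_t *msg, size_t len)
{
    size_t off = 0;
    while (off + 4 <= len) {
        uint32_t tag = ((uint32_t) msg [off] << 24) |
                       ((uint32_t) msg [off + 1] << 16) |
                       ((uint32_t) msg [off + 2] << 8) |
                        (uint32_t) msg [off + 3];
        off += 4;
        if (tag & 0x80000000u)   /* last tag: the request ID */
            return (ptrdiff_t) off;
    }
    return -1;   /* malformed message */
}
]]></artwork>
</figure>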
| |
<t>The following picture shows an example of a request saying "Hello" being
| routed from the client through two intermediate nodes to the |
| processing node and the reply "World" being routed back. It shows |
| what messages are passed over the network at each step of the |
| process:</t> |
| |
| <figure> |
| <artwork> |
| client |
| Hello | World |
| | +-----+ ^ |
| | | REQ | | |
| V +-----+ | |
| 1|823|Hello | 1|823|World |
| | +-----+ ^ |
| | | REP | | |
| | +-----+ | |
| | | REQ | | |
| V +-----+ | |
| 0|299|1|823|Hello | 0|299|1|823|World |
| | +-----+ ^ |
| | | REP | | |
| | +-----+ | |
| | | REQ | | |
| V +-----+ | |
| 0|446|0|299|1|823|Hello | 0|446|0|299|1|823|World |
| | +-----+ ^ |
| | | REP | | |
| V +-----+ | |
| Hello | World |
| service |
| </artwork> |
| </figure> |
| |
| </section> |
| |
| <section title = "Hop-by-hop vs. End-to-end"> |
| |
<t>All endpoints implement so-called "hop-by-hop" functionality. It's
| the functionality concerned with sending messages to the immediately |
| adjacent components and receiving messages from them.</t> |
| |
| <t>In addition to that, the endpoints on the edge of the topology |
implement so-called "end-to-end" functionality that is concerned
| with issues such as, for example, reliability.</t> |
| |
| <figure> |
| <artwork> |
| end to end |
| +-----------------------------------------+ |
| | | |
| +-----+ +-----+-----+ +-----+-----+ +-----+ |
| | |-->| | |-->| | |-->| | |
| | REQ | | REP | REQ | | REP | REQ | | REP | |
| | |<--| | |<--| | |<--| | |
| +-----+ +-----+-----+ +-----+-----+ +-----+ |
| | | | | | | |
| +---------+ +---------+ +---------+ |
| hop by hop hop by hop hop by hop |
| </artwork> |
| </figure> |
| |
| <t>To make an analogy with the TCP/IP stack, IP provides hop-by-hop |
| functionality, i.e. routing of the packets to the adjacent node, |
while TCP implements end-to-end functionality such as resending
lost packets.</t>
| |
| <t>As a rule of thumb, raw hop-by-hop endpoints are used to build |
| devices (intermediary nodes in the topology) while end-to-end |
| endpoints are used directly by the applications.</t> |
| |
| <t>To prevent confusion, the specification of the endpoint behaviour |
below will discuss hop-by-hop and end-to-end functionality in
| separate chapters.</t> |
| |
| </section> |
| |
| <section title = "Hop-by-hop functionality"> |
| |
| <section title = "REQ endpoint"> |
| |
| <t>The REQ endpoint is used by the user to send requests to the |
| processing nodes and receive the replies afterwards.</t> |
| |
<t>When the user asks the REQ endpoint to send a request, the endpoint
should send it to one of the associated outbound channels (TCP
connections or similar). The request sent is exactly the message
supplied by the user. The REQ socket MUST NOT modify an outgoing
request in any way.</t>
| |
| <t>If there's no channel to send the request to, the endpoint won't send |
| the request and MUST report the backpressure condition to the user. |
For example, with the BSD socket API, backpressure is reported as the
EAGAIN error.</t>
| |
| <t>If there are associated channels but none of them is available for |
| sending, i.e. all of them are already reporting backpressure, the |
| endpoint won't send the message and MUST report the backpressure |
| condition to the user.</t> |
| |
<t>Backpressure is used as a means to redirect the requests from the
congested parts of the topology to the parts that are still
responsive. It can be thought of as a crude scheduling algorithm.
However crude though, it's probably still the best you can get
without knowing estimates of execution time for individual tasks,
CPU capacity of individual processing nodes etc.</t>
| |
| <t>Alternatively, backpressure can be thought of as a congestion control |
| mechanism. When all available processing nodes are busy, it slows |
| down the client application, i.e. it prevents the user from sending |
| any more requests.</t> |
| |
<t>If the channel is not capable of reporting backpressure (e.g. DCCP),
the endpoint SHOULD consider it always available for sending a new
request. However, such channels should be used with care: when
congestion hits they may suck in a lot of requests just to discard
them silently and thus cause re-transmission storms later on. The
implementation of the REQ endpoint MAY choose to prohibit the use
of such channels altogether.</t>
| |
<t>When there are multiple channels available for sending the request,
the endpoint MAY use any prioritisation mechanism to decide which
channel to send the request to. For example, it may use classic
priorities attached to channels and send the message to the channel
with the highest priority. That allows for routing algorithms such
as: "Use local processing nodes if any are available. Send the
requests to remote nodes only if there are no local ones available."
Alternatively, the endpoint may implement weighted priorities ("send
20% of the requests to node A and 80% to node B"). The endpoint may
also not implement any prioritisation strategy and treat all channels
as equal.</t>
| |
| <t>Whatever the case, two rules must apply.</t> |
| |
| <t>First, by default the priority settings for all channels MUST be |
equal. Creating a channel with a different priority MUST be triggered
| by an explicit action by the user.</t> |
| |
<t>Second, if there are several channels with equal priority, the
endpoint MUST distribute the messages among them in a fair fashion
using the round-robin algorithm. The round-robin implementation MUST
also take care not to become unfair when new channels are added or
old ones are removed on the fly.</t>
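
<t>A non-normative sketch combining the optional priorities with the
two rules above (the channel structure is an assumption; a lower
number means a higher priority, all priorities being equal by
default):</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>

typedef struct { int prio; int busy; } chan_t;

/* Picks the highest-priority channel that doesn't report pushback;
   ties are broken in a round-robin manner, starting just after the
   previously used index. Returns -1 to signal backpressure. */
int select_channel (chan_t *c, size_t n, size_t *last)
{
    int best = -1;
    for (size_t i = 1; i <= n; i++) {
        size_t idx = (*last + i) % n;
        if (c [idx].busy)
            continue;
        if (best == -1 || c [idx].prio < c [best].prio)
            best = (int) idx;
    }
    if (best != -1)
        *last = (size_t) best;
    return best;
}
]]></artwork>
</figure>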
| |
<t>As for incoming messages, i.e. replies, the REQ endpoint MUST
fair-queue them. In other words, if there are replies available on
several channels, it MUST receive them in a round-robin fashion. It
must also take care not to compromise the fairness when new channels
are added or old ones removed.</t>
| |
| <t>In addition to providing basic fairness, the goal of fair-queueing is |
| to prevent DoS attacks where a huge stream of fake replies from one |
| channel would be able to block the real replies coming from different |
| channels. Fair queueing ensures that messages from every channel are |
received at approximately the same rate. That way, a DoS attack can
| slow down the system but it can't entirely block it.</t> |
| |
| <t>Incoming replies MUST be handed to the user exactly as they were |
received. The REQ endpoint MUST NOT modify the replies in any way.</t>
| |
| </section> |
| |
| <section title = "REP endpoint"> |
| |
<t>The REP endpoint is used to receive requests from the clients and send
| replies back to the clients.</t> |
| |
<t>First of all, the REP socket is responsible for assigning unique 31-bit
| channel IDs to the individual associated channels.</t> |
| |
<t>The first ID assigned MUST be random. Each subsequent ID is computed
by adding 1 to the previous one, with potential overflow to 0.</t>
| |
| <t>The implementation MUST ensure that the random number is different |
| each time the endpoint is re-started, the process that contains |
it is restarted or similar. So, for example, using a pseudo-random
generator with a constant seed won't do.</t>
| |
<t>The goal of the algorithm is to spread the possible channel ID
values as much as possible and thus minimise the chance that a reply
is routed to an unrelated channel, even in the face of intermediate
node failures.</t>
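
<t>A non-normative sketch of the ID assignment follows. Seeding the
generator from the current time is an assumption made for brevity;
a real implementation should use a stronger source of randomness:</t>

<figure>
<artwork><![CDATA[
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static uint32_t next_chan_id;

void init_chan_ids (void)
{
    srand ((unsigned) time (NULL));   /* differs across restarts */
    next_chan_id = (uint32_t) rand () & 0x7FFFFFFFu;
}

uint32_t assign_chan_id (void)
{
    uint32_t id = next_chan_id;
    /* Add 1, overflowing from the largest 31-bit value to 0. */
    next_chan_id = (next_chan_id + 1) & 0x7FFFFFFFu;
    return id;
}
]]></artwork>
</figure>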
| |
<t>When receiving a message, the REP endpoint MUST fair-queue among the
channels available for receiving. In other words, it should
round-robin among such channels and receive one request from
a channel at a time. It MUST also implement the round-robin
algorithm in such a way that adding or removing channels doesn't
break its fairness.</t>
| |
| <t>In addition to guaranteeing basic fairness in access to computing |
| resources the above algorithm makes it impossible for a malevolent |
| or misbehaving client to completely block the processing of requests |
from other clients by issuing a steady stream of requests.</t>
| |
<t>After getting hold of a request, the REP socket should prepend it
with a 32-bit value, consisting of 1 bit set to 0 followed by the
31-bit ID of the channel the request was received from. The extended
request will then be handed to the user.</t>
| |
<t>The goal of adding the channel ID to the request is to be able to
route the reply back to the original channel later on. Thus, when
the user sends a reply, the endpoint strips the first 32 bits off and
uses the value to determine where it is to be routed.</t>
| |
<t>If the reply is shorter than 32 bits, it is malformed and
the endpoint MUST ignore it. Also, if the most significant bit of the
32-bit value isn't set to 0, the reply is malformed and MUST
be ignored.</t>
| |
| <t>Otherwise, the endpoint checks whether its table of associated |
| channels contains the channel with a corresponding ID. If so, it |
| sends the reply (with first 32 bits stripped off) to that channel. |
| If the channel is not found, the reply MUST be dropped. If the |
| channel is not available for sending, i.e. it is applying |
| backpressure, the reply MUST be dropped.</t> |
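
<t>The following non-normative sketch shows the reply routing
decision, under the same network-byte-order assumption as the
earlier sketches:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t id; int busy; } rep_chan_t;

/* Returns the channel to forward the reply (minus its first
   32 bits) to, or NULL if the reply must be dropped. */
rep_chan_t *route_reply (const uint8_t *reply, size_t len,
    rep_chan_t *chans, size_t nchans)
{
    if (len < 4)
        return NULL;              /* malformed: too short */
    uint32_t tag = ((uint32_t) reply [0] << 24) |
                   ((uint32_t) reply [1] << 16) |
                   ((uint32_t) reply [2] << 8) |
                    (uint32_t) reply [3];
    if (tag & 0x80000000u)
        return NULL;              /* malformed: MSB not 0 */
    for (size_t i = 0; i != nchans; i++)
        if (chans [i].id == tag && !chans [i].busy)
            return &chans [i];
    return NULL;                  /* unknown or congested: drop */
}
]]></artwork>
</figure>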
| |
<t>Note that when the reply is unroutable, two things might have
happened. Either there was some kind of network disruption, in which
case the request will be re-sent later on, or the original client
has failed or been shut down. In the latter case the request won't be
resent; however, it doesn't really matter because there's no one to
deliver the reply to any more anyway.</t>
| |
<t>Also note that, unlike with the requests, there's no pushback applied
to the replies. The replies are simply dropped in the case of
pushback. The reason for this behaviour is that if the endpoint
blocked and waited for the channel to become available, all the
subsequent replies, possibly destined for different, unblocked
channels, would be blocked in the meantime anyway. That would allow
for a DoS attack by simply firing a lot of requests and not receiving
the replies.</t>
| |
| </section> |
| |
| </section> |
| |
| <section title = "End-to-end functionality"> |
| |
<t>End-to-end functionality is built on top of hop-by-hop functionality.
| Thus, an endpoint on the edge of a topology contains all the |
| hop-by-hop functionality, but also implements additional |
| functionality of its own. This end-to-end functionality acts |
| basically as a user of the underlying hop-by-hop functionality.</t> |
| |
| <section title = "REQ endpoint"> |
| |
| <t>End-to-end functionality for REQ sockets is concerned with re-sending |
| the requests in case of failure and with filtering out stray or |
| outdated replies.</t> |
| |
<t>To be able to do the latter, the endpoint must tag the requests with
unique 31-bit request IDs. The first request ID is picked at random. All
| subsequent request IDs are generated by adding 1 to the last request |
| ID and possibly overflowing to 0.</t> |
| |
| <t>To improve robustness of the system, the implementation MUST ensure |
| that the random number is different each time the endpoint, the |
process or the machine is restarted. A pseudo-random generator with
a fixed seed won't do.</t>
| |
<t>When the user asks the endpoint to send a message, the endpoint prepends
| a 32-bit value to the message, consisting of a single bit set to 1 |
| followed by a 31-bit request ID and passes it on in a standard |
| hop-by-hop way.</t> |
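
<t>A non-normative sketch of the tagging step, again assuming the
tag is encoded in network byte order:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated message consisting of the tag (single
   bit set to 1, then the 31-bit request ID) followed by the user's
   payload, or NULL on allocation failure. */
uint8_t *tag_request (const uint8_t *payload, size_t len,
    uint32_t reqid, size_t *outlen)
{
    uint8_t *msg = malloc (len + 4);
    if (!msg)
        return NULL;
    uint32_t tag = 0x80000000u | (reqid & 0x7FFFFFFFu);
    msg [0] = (uint8_t) (tag >> 24);
    msg [1] = (uint8_t) (tag >> 16);
    msg [2] = (uint8_t) (tag >> 8);
    msg [3] = (uint8_t) tag;
    memcpy (msg + 4, payload, len);
    *outlen = len + 4;
    return msg;
}
]]></artwork>
</figure>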
| |
| <t>If the hop-by-hop layer reports pushback condition, the end-to-end |
| layer considers the request unsent and MUST report pushback condition |
| to the user.</t> |
| |
<t>If the request is successfully sent, the endpoint stores the request,
including its request ID, so that it can be resent later on if
needed. At the same time it sets up a timer to trigger the
re-transmission in case the reply is not received within a specified
timeout. The user MUST be allowed to specify the timeout interval.
The default timeout interval MUST be 60 seconds.</t>
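
<t>The stored request and its timer could be modelled as in the
following non-normative sketch, where the send callback is an
assumption standing in for the hop-by-hop layer:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    uint32_t reqid;      /* 31-bit ID prepended to the request */
    uint8_t *msg;        /* stored copy, including the tag */
    size_t msglen;
    time_t resend_at;    /* deadline for re-transmission */
} pending_t;

/* Called periodically; re-sends any overdue request and re-arms
   its timer with the default 60 second interval. */
void check_timeouts (pending_t *reqs, size_t n,
    void (*send_fn) (const uint8_t *, size_t))
{
    time_t now = time (NULL);
    for (size_t i = 0; i != n; i++)
        if (reqs [i].msg && now >= reqs [i].resend_at) {
            send_fn (reqs [i].msg, reqs [i].msglen);
            reqs [i].resend_at = now + 60;
        }
}
]]></artwork>
</figure>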
| |
| <t>When a reply is received from the underlying hop-by-hop |
implementation, the endpoint should strip off the first 32 bits of
the reply to check whether it is a valid reply.</t>
| |
| <t>If the reply is shorter than 32 bits, it is malformed and the |
| endpoint MUST ignore it. If the most significant bit of the 32-bit |
| value is set to 0, the reply is malformed and MUST be ignored.</t> |
| |
| <t>Otherwise, the endpoint should check whether the request ID in |
| the reply matches any of the request IDs of the requests being |
processed at the moment. If not, the reply MUST be ignored.
| It is either a stray message or a duplicate reply.</t> |
| |
<t>Please note that the endpoint can support processing either a single
request at a time or multiple requests in parallel. Which one is the
case depends on the API exposed to the user and is not part of this
specification.</t>
| |
| <t>If the ID in the reply matches one of the requests in progress, the |
| reply MUST be passed to the user (with the 32-bit prefix stripped |
| off). At the same time the stored copy of the original request as |
well as the re-transmission timer must be deallocated.</t>
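
<t>A non-normative sketch of the reply handling described above,
using the same pending_t structure as the previous sketch:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct { uint32_t reqid; uint8_t *msg;
                 size_t msglen; time_t resend_at; } pending_t;

/* Validates an incoming reply and completes the matching request.
   Returns the offset of the payload (4) on success, or -1 if the
   reply must be ignored. */
int complete_request (const uint8_t *reply, size_t len,
    pending_t *reqs, size_t n)
{
    if (len < 4)
        return -1;               /* malformed: too short */
    if (!(reply [0] & 0x80))
        return -1;               /* malformed: MSB not set to 1 */
    uint32_t id = (((uint32_t) reply [0] << 24) |
                   ((uint32_t) reply [1] << 16) |
                   ((uint32_t) reply [2] << 8) |
                    (uint32_t) reply [3]) & 0x7FFFFFFFu;
    for (size_t i = 0; i != n; i++)
        if (reqs [i].msg && reqs [i].reqid == id) {
            free (reqs [i].msg); /* drop stored copy and timer */
            reqs [i].msg = NULL;
            return 4;            /* hand payload to the user */
        }
    return -1;                   /* stray or duplicate reply */
}
]]></artwork>
</figure>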
| |
| <t>Finally, REQ endpoint MUST make it possible for the user to cancel |
| a particular request in progress. What it means technically is |
| deleting the stored copy of the request and cancelling the associated |
| timer. Thus, once the reply arrives, it will be discarded by the |
| algorithm above.</t> |
| |
<t>The cancellation allows, for example, the user to time out a request.
They can simply post a request and if there's no answer within a
specific timeframe, they can cancel it.</t>
| |
| </section> |
| |
| <section title = "REP endpoint"> |
| |
| <t>End-to-end functionality for REP endpoints is concerned with turning |
| requests into corresponding replies.</t> |
| |
<t>When the user asks to receive a request, the endpoint gets the next
request from the hop-by-hop layer and splits it into the traceback
stack and the message payload itself. The traceback stack is stored
and the payload is returned to the user.</t>
| |
<t>The algorithm for splitting the request is as follows: Strip 32-bit
tags from the message one by one. Once the most significant
bit of a tag is set, we've reached the bottom of the traceback
stack and the splitting is done. If the end of the message is reached
without finding the bottom of the stack, the request is malformed and
MUST be ignored.</t>
| |
<t>Note that the payload produced by this procedure is the same
as the request payload sent by the original client.</t>
| |
| <t>Once the user processes the request and sends the reply, the endpoint |
| prepends the reply with the stored traceback stack and sends it on |
| using the hop-by-hop layer. At that point the stored traceback stack |
| MUST be deallocated.</t> |
| |
<t>Additionally, the REP endpoint MUST support cancelling any request
being processed at the moment. What it means, technically, is that
the state associated with the request, i.e. the traceback stack
stored by the endpoint, is deleted and the reply to that particular
request is never sent.</t>
| |
<t>The most important use of cancellation is allowing the service
instances to ignore malformed requests. If the application-level
part of the request doesn't conform to the application protocol,
the service can simply cancel the request. In such a case the reply
is never sent. Of course, if the application wants to send an
application-specific error message back to the client, it can do so
by not cancelling the request and sending a regular reply.</t>
| |
| </section> |
| |
| </section> |
| |
| <section title = "Loop avoidance"> |
| |
<t>It may happen that a request/reply topology contains a loop. This
becomes increasingly likely as the topology grows out of the scope of
a single organisation and there are multiple administrators involved
in maintaining it. An unfortunate interaction between two perfectly
legitimate setups can cause a loop to be created.</t>
| |
<t>With no additional guards against loops, it's likely that
requests will be caught inside a loop, rotating there forever,
each message gradually growing in size as new prefixes are added to it
by each REP endpoint on the way. Eventually, a loop can cause
congestion and bring the whole system to a halt.</t>
| |
<t>To deal with the problem, REQ endpoints MUST check the depth of the
traceback stack for every outgoing request and discard any request
where it exceeds a certain threshold. The threshold should be
configurable by the user. The suggested default value is 8.</t>
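
<t>A non-normative sketch of the check, under the same encoding
assumptions as the earlier sketches:</t>

<figure>
<artwork><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define MAX_HOPS 8   /* suggested default threshold */

/* Counts the channel IDs in an outgoing request's traceback stack.
   Returns non-zero if the request should be discarded. */
int exceeds_hop_limit (const uint8_t *msg, size_t len)
{
    size_t hops = 0, off = 0;
    while (off + 4 <= len) {
        if (msg [off] & 0x80)    /* MSB set: request ID reached */
            return hops > MAX_HOPS;
        hops++;
        off += 4;
    }
    return 1;   /* malformed: discard as well */
}
]]></artwork>
</figure>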
| |
| </section> |
| |
| <section anchor="IANA" title="IANA Considerations"> |
<t>New SP endpoint types REQ and REP should be registered by IANA. For
now, the value of 16 should be used for REQ endpoints and the value
of 17 for REP endpoints.</t>
| </section> |
| |
| <section anchor="Security" title="Security Considerations"> |
| <t>The mapping is not intended to provide any additional security to the |
| underlying protocol. DoS concerns are addressed within |
| the specification.</t> |
| </section> |
| |
| </middle> |
| |
| </rfc> |
| |