Network address translators (NATs) have proven to be a costly hurdle in the widespread deployment of voice over IP (VoIP). The cost can be measured in financial terms, as well as in missed opportunities for a wider deployment of the technology. For a long time, the Internet Engineering Task Force (IETF) ignored NATs, hoping that they would go away. That indecision enabled the marketplace to become both judge and jury on what a NAT is, and how it should work, all without explicit guidance from the IETF. The result is visible today most vividly in the VoIP area, where getting two VoIP peers to establish a voice session across organizational boundaries is extremely difficult. Recently, the IETF has paid much more attention to the NAT issue, which is good news indeed.
So what does this have to do with the paper? The paper argues that the session initiation protocol (SIP), the foremost signaling protocol for establishing, maintaining, and terminating multimedia sessions, should be used as the protocol to establish all kinds of peer-to-peer (P2P) communications. In other words, SIP should become the middleware that P2P hosts use to determine each other's reachability across NAT-infested networks. The SIP community has had considerable success in addressing the NAT traversal issue, both by defining extensions to SIP and, in some cases, by defining brand new protocols.
The paper goes on to outline the NUTSS architecture, which is still in nascent form, and to provide a transmission control protocol (TCP) solution using it (TCP has proven to be problematic when used with NATs, for various reasons outlined in the paper).
Using SIP as middleware for network discovery among P2P peers is an interesting idea. However, it would seem more appropriate if the P2P peers used the underlying ideas and protocols from SIP to discover peers, rather than SIP itself. SIP relies on secondary protocols, namely simple traversal of user datagram protocol (UDP) through NATs (STUN) and traversal using relay NAT (TURN), and on a unifying protocol called interactive connectivity establishment (ICE), to discover how peers can reach each other. STUN, TURN, and ICE are orthogonal to SIP. It would seem more appropriate for P2P hosts to use STUN, TURN, and ICE directly, instead of implementing the SIP transactional state machines and offer/answer model to discover peers, especially if they have no use for interactive multimedia sessions.
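To give a sense of how little machinery a host needs to use STUN on its own, the following is a minimal sketch in Python of the client side of a STUN binding exchange. It is not taken from the paper. It assumes the wire format of the later STUN specification (RFC 5389: a 20-byte header with a magic cookie, and the XOR-MAPPED-ADDRESS attribute), handles IPv4 only, and omits retransmission, authentication, and error responses. The function names are illustrative.

```python
import os
import socket
import struct

# Constants from RFC 5389 (assumed wire format for this sketch).
MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (message, transaction_id) for a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit transaction ID
    # Header: message type, message length (0, no attributes), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(data, transaction_id):
    """Return the (ip, port) the server saw, or None if the attribute is absent."""
    if len(data) < 20:
        raise ValueError("short STUN message")
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a Binding Success Response")
    if data[8:20] != transaction_id:
        raise ValueError("transaction ID mismatch")
    pos, end = 20, min(len(data), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        # Family 0x01 is IPv4; port and address are XORed with the magic cookie.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

In use, a host would send the request over a UDP socket to a STUN server of its choosing and pass the reply to parse_binding_response to learn the public address and port that its NAT assigned. A full ICE implementation involves considerably more (candidate gathering, connectivity checks, and TURN relays as a fallback), but none of it requires the SIP transaction layer.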