
\documentclass{bfh}

\usepackage[numbers]{natbib}

\title{Informatikseminar}
\subtitle{Bitmessage -- Communication Without Metadata}
\author{Christian Basler}
\tutor{Kai Brünnler}
\date{\today}

\newcommand{\msg}[1]{\textit{#1}}
\newcommand{\obj}[1]{\textbf{#1}}
\newcommand{\node}[1]{\textbf{#1}}
\newcommand{\key}[1]{\textbf{#1}}

\begin{document}
  \maketitle

  \tableofcontents

  \section{Synopsis}

  TODO


  % Section basics
  \input{basics}


  \section{Protocol}

  \subsection{Nomenclature}

  There are a few terms that are easily mixed up. Here's a list of the most confusing ones:

  \listinginfo{}{message}{is sent from one node to another, e.g. to announce new objects or to initialize the network connection.}{}
  \listinginfo{}{msg}{is the object payload containing the actual content written by a user. The term `message' is never used to describe information exchange between users in this document; `content' is mostly used instead.}{}
  \listinginfo{}{payload}{There are two kinds of payload: message payload, whose structure depends on the message type (e.g. the inventory vectors of an \msg{inv} message), and object payload, which is distributed throughout the network.}{}
  \listinginfo{}{object}{is a kind of message whose payload is distributed among all nodes. Sometimes just the payload is meant.}{}

  \subsection{Process Flow}

  The newly started node \node{A} connects to a random node \node{B} from its node registry and sends a \msg{version} message, announcing the latest supported protocol version. If \node{B} accepts the version\footnote{A version is accepted by default if it is higher than or equal to a node's latest supported version. Nodes supporting experimental protocol versions might accept older versions.}, it responds with a \msg{verack} message, followed by a \msg{version} message announcing its own latest supported protocol version. Node \node{A} then decides whether it supports \node{B}'s version and sends its \msg{verack} message.

  If both nodes accept the connection, they each send an \msg{addr} message containing up to 1000 of their known nodes, followed by one or more \msg{inv} messages announcing all valid objects they are aware of. They then send \msg{getdata} requests for all objects still missing from their inventory.

  Requests sent via \msg{getdata} are answered by \msg{object} messages containing the requested objects.

  If a user writes a new mail on node \node{A}, it is offered via \msg{inv} to up to eight connected nodes. They will get the object and distribute it to up to eight of their connections, and so on.
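  The relay step can be illustrated with a small Java sketch. How a node chooses the eight connections is not specified here, so picking them at random is an assumption of this sketch. The class and method names (\texttt{RelaySketch}, \texttt{pickRelayTargets}) are hypothetical, and plain strings stand in for real connection objects.

  \begin{verbatim}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RelaySketch {
    // From the text: a new object is offered to up to eight connections.
    static final int MAX_RELAY_TARGETS = 8;

    // Returns at most eight randomly chosen peers (random choice is an
    // assumption of this sketch) to which an inv for a new object is sent.
    static List<String> pickRelayTargets(List<String> connectedPeers) {
        List<String> shuffled = new ArrayList<>(connectedPeers);
        Collections.shuffle(shuffled);
        int count = Math.min(MAX_RELAY_TARGETS, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> peers = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            peers.add("peer" + i);
        }
        System.out.println(pickRelayTargets(peers).size()); // prints 8
    }
}
  \end{verbatim}

  Each receiving node would apply the same selection to its own connections, which is how the object eventually reaches the whole network.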

  \subsection{Messages}

  The messages, objects and binary format are described in detail in the Bitmessage wiki \cite{wiki:protocol}, so the descriptions here are limited to what each message does and when it is used.

  \subsubsection{version / verack}
  A \msg{version} message contains the latest protocol version supported by a node, as well as the streams it is interested in and the features it supports. If the other node accepts, it acknowledges with a \msg{verack} message. The connection is initialized once both nodes have sent a \msg{verack} message.
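  A minimal Java sketch of the handshake state from one node's point of view, assuming the default acceptance rule from the footnote in the process flow section. The class, its methods and the value of \texttt{OWN\_VERSION} are hypothetical. A real node would also look at streams and features and would close the connection when it rejects a version.

  \begin{verbatim}
public class HandshakeSketch {
    // Hypothetical latest protocol version supported by this node.
    static final int OWN_VERSION = 3;

    private boolean verackSent;
    private boolean verackReceived;

    // Default rule from the footnote: accept versions >= our own latest.
    static boolean accepts(int remoteVersion) {
        return remoteVersion >= OWN_VERSION;
    }

    // Handles the peer's version message. Returns true if we answer with
    // a verack; a real node would close the connection otherwise.
    boolean onVersion(int remoteVersion) {
        if (!accepts(remoteVersion)) {
            return false;
        }
        verackSent = true;
        return true;
    }

    // Handles the peer's verack, i.e. the peer accepted our version.
    void onVerack() {
        verackReceived = true;
    }

    // The connection is initialized once both nodes have sent a verack.
    boolean isEstablished() {
        return verackSent && verackReceived;
    }

    public static void main(String[] args) {
        HandshakeSketch nodeA = new HandshakeSketch();
        nodeA.onVerack();                          // B accepted A's version
        System.out.println(nodeA.isEstablished()); // prints false
        nodeA.onVersion(3);                        // A accepts B's version
        System.out.println(nodeA.isEstablished()); // prints true
    }
}
  \end{verbatim}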

  \subsubsection{addr}
  Contains up to 1000 known nodes with their IP addresses, ports, streams and supported features.

  \subsubsection{inv}
  One \msg{inv} message contains the hashes of up to 50000 valid objects. If your inventory is larger, several messages can be sent.

  \subsubsection{getdata}
  Can request up to 50000 objects by sending their hashes.
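  Both limits suggest a simple pattern: compare the announced hashes with the local inventory, then split the result into batches of at most 50000 hashes, one batch per \msg{getdata} (or \msg{inv}) message. The following Java sketch shows the idea. Names are hypothetical, plain strings stand in for the actual object hashes, and the binary message format is not implemented.

  \begin{verbatim}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InventorySketch {
    // From the text: one inv or getdata message carries at most 50000 hashes.
    static final int MAX_HASHES_PER_MESSAGE = 50_000;

    // Hashes announced by a peer that are not yet in the local inventory,
    // i.e. the objects to request via getdata.
    static <T> List<T> missing(List<T> announced, Set<T> inventory) {
        List<T> result = new ArrayList<>();
        for (T hash : announced) {
            if (!inventory.contains(hash)) {
                result.add(hash);
            }
        }
        return result;
    }

    // Splits hashes into batches of at most 50000, one batch per message.
    static <T> List<List<T>> batches(List<T> hashes) {
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < hashes.size(); from += MAX_HASHES_PER_MESSAGE) {
            int to = Math.min(from + MAX_HASHES_PER_MESSAGE, hashes.size());
            result.add(new ArrayList<>(hashes.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> announced = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            announced.add("hash" + i);
        }
        Set<String> inventory = new HashSet<>(announced.subList(0, 20_000));
        List<String> toRequest = missing(announced, inventory);
        System.out.println(toRequest.size());          // prints 100000
        System.out.println(batches(toRequest).size()); // prints 2
    }
}
  \end{verbatim}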

  \subsubsection{object}
  Contains one requested object, which might be one of:

  \listinginfo{}{getpubkey}{A request for a public key, which is needed to encrypt a message to a specific user.}{}
  \listinginfo{}{pubkey}{A public key. See \ref{subsec:addr} \nameref{subsec:addr}}{}
  \listinginfo{}{msg}{Content intended to be received by one user.}{}
  \listinginfo{}{broadcast}{Content encrypted in a way that the sending address's public key can be used to decrypt it, allowing any subscriber who knows the address to receive such a message.}{}

  \subsubsection{ping / pong / getbiginv}
  TODO: See https://github.com/Bitmessage/PyBitmessage/issues/112

  \subsection{Addresses}
  \label{subsec:addr}

  \textit{BM-2cXxfcSetKnbHJX2Y85rSkaVpsdNUZ5q9h}: Addresses start with ``BM-'' and are, like Bitcoin addresses, Base58 encoded\footnote{Base58 uses the characters 0--9, A--Z and a--z without the easily confused characters 0, O, I and l.}.

  \listinginfo{}{version}{Address version.}{}
  \listinginfo{}{stream}{Stream number.}{}
  \listinginfo{}{ripe}{Hash of the public signing and encryption keys. Please note that the keys are sent without the leading 0x04 in \obj{pubkey} objects, but it must be prepended for creating the ripe. This prefix, which marks an uncompressed point, is also needed for most other uses of the keys, so it's a good idea to add it by default.}{ripemd160(sha512(pubSigKey + pubEncKey))}
  \listinginfo{}{checksum}{First four bytes of a double SHA-512 hash of the above.}{sha512(sha512(version + stream + ripe))}
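  The checksum and encoding steps can be sketched in Java as follows. Computing the ripe requires RIPEMD-160, which the JDK does not provide (Bouncy Castle does), so the sketch takes the ripe as input. It follows the formulas above literally and assumes that version and stream are small enough to be encoded as single-byte var\_ints. Any further normalization that real clients may apply before encoding is not covered. All names are hypothetical.

  \begin{verbatim}
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class AddressSketch {
    static final String ALPHABET =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    static byte[] sha512(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-512").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship SHA-512, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Base58: read the bytes as one big-endian number and write it in
    // base 58; every leading zero byte becomes a leading '1'.
    static String base58(byte[] data) {
        StringBuilder sb = new StringBuilder();
        BigInteger n = new BigInteger(1, data);
        BigInteger base = BigInteger.valueOf(58);
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(base);
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        for (int i = 0; i < data.length && data[i] == 0; i++) {
            sb.append('1');
        }
        return sb.reverse().toString();
    }

    // Assumes version and stream are below 0xfd, so each var_int is one byte.
    static String encode(int version, int stream, byte[] ripe) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(version);
        out.write(stream);
        out.write(ripe, 0, ripe.length);
        byte[] data = out.toByteArray();
        // checksum: first four bytes of sha512(sha512(version + stream + ripe))
        byte[] checksum = Arrays.copyOf(sha512(sha512(data)), 4);
        byte[] full = Arrays.copyOf(data, data.length + checksum.length);
        System.arraycopy(checksum, 0, full, data.length, checksum.length);
        return "BM-" + base58(full);
    }

    public static void main(String[] args) {
        byte[] ripe = new byte[20];
        Arrays.fill(ripe, (byte) 0x42); // placeholder, not a real ripe
        System.out.println(encode(4, 1, ripe));
    }
}
  \end{verbatim}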

  \subsection{Encryption}

  Bitmessage uses Elliptic Curve Cryptography for both signing and encryption. While the mathematics behind elliptic curves is even harder to understand than the usual prime-and-modulo-until-your-brain-explodes approach, it's based on a similar principle: an operation that is easy to compute but practically impossible to reverse. Instead of multiplying two very large primes, whose product is very hard to factorize, we multiply a point on the elliptic curve by a very large number, which is very hard to recover from the resulting point\footnote{Please don't ask me how to do it. If you're crazy enough, start at \url{http://en.wikipedia.org/wiki/Elliptic_curve_cryptography}. If you're not that crazy, use a library like Bouncy Castle.}.

  The user, let's call her Alice, needs a key pair, consisting of a private key
$$k$$
which represents a huge random number, and a public key
$$K = G k$$
which represents a point on the agreed-upon curve\footnote{Bitmessage uses a curve called \textit{secp256k1}.}. Please note that this is not a simple multiplication, but the scalar multiplication of a point on an elliptic curve. $G$ is the generator point, the common starting point for all operations on a specific curve.

  Another user, Bob, knows Alice's public key. To encrypt a message, Bob creates a temporary key pair
$$r$$
and
$$R = G r$$
He then calculates
$$K r,$$
uses the resulting point to encrypt the message\footnote{A SHA-512 hash over the x-coordinate is used to derive the actual keys.} and sends $R$ along with the message.

  When Alice receives the message, she uses the fact that
$$K r = G k r = G r k = R k$$
so she just uses $R k$ to decrypt the message.
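  The identity $K r = R k$ can be checked numerically. The following Java sketch implements textbook affine point addition and double-and-add scalar multiplication on \textit{secp256k1} using only \texttt{BigInteger}. It is meant to illustrate the mathematics, not for real use: it is not constant-time, does not validate input points and stops at the shared point, leaving out the key derivation and symmetric encryption of ECIES. Class and method names are hypothetical.

  \begin{verbatim}
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;

public class EcdhSketch {
    // secp256k1: y^2 = x^3 + 7 over the prime field F_p
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(32))
            .subtract(BigInteger.valueOf(977));
    // order of the generator point G
    static final BigInteger N = hex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE"
            + "BAAEDCE6AF48A03BBFD25E8CD0364141");
    static final BigInteger[] G = {
            hex("79BE667EF9DCBBAC55A06295CE870B07"
                    + "029BFCDB2DCE28D959F2815B16F81798"),
            hex("483ADA7726A3C4655DA4FBFC0E1108A8"
                    + "FD17B448A68554199C47D08FFB10D4B8")};

    static BigInteger hex(String s) {
        return new BigInteger(s, 16);
    }

    // Affine point addition; null represents the point at infinity.
    static BigInteger[] add(BigInteger[] a, BigInteger[] b) {
        if (a == null) return b;
        if (b == null) return a;
        BigInteger lambda;
        if (a[0].equals(b[0])) {
            if (!a[1].equals(b[1]) || a[1].signum() == 0) {
                return null; // a + (-a) = infinity
            }
            // tangent slope for doubling: 3x^2 / 2y (the curve's a is 0)
            lambda = a[0].pow(2).multiply(BigInteger.valueOf(3))
                    .multiply(a[1].shiftLeft(1).modInverse(P)).mod(P);
        } else {
            // slope of the line through a and b
            lambda = b[1].subtract(a[1])
                    .multiply(b[0].subtract(a[0]).modInverse(P)).mod(P);
        }
        BigInteger x = lambda.pow(2).subtract(a[0]).subtract(b[0]).mod(P);
        BigInteger y = lambda.multiply(a[0].subtract(x)).subtract(a[1]).mod(P);
        return new BigInteger[] {x, y};
    }

    // Double-and-add scalar multiplication: computes point * k.
    static BigInteger[] multiply(BigInteger[] point, BigInteger k) {
        BigInteger[] result = null;
        BigInteger[] addend = point;
        for (int i = 0; i < k.bitLength(); i++) {
            if (k.testBit(i)) {
                result = add(result, addend);
            }
            addend = add(addend, addend);
        }
        return result;
    }

    // Random private key in [1, N - 1].
    static BigInteger randomScalar(SecureRandom rnd) {
        BigInteger k;
        do {
            k = new BigInteger(256, rnd);
        } while (k.signum() == 0 || k.compareTo(N) >= 0);
        return k;
    }

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        BigInteger k = randomScalar(rnd);                   // Alice's private key
        BigInteger[] alicePub = multiply(G, k);             // K = G k
        BigInteger r = randomScalar(rnd);                   // Bob's temporary key
        BigInteger[] bobTempPub = multiply(G, r);           // R = G r, sent along
        BigInteger[] bobShared = multiply(alicePub, r);     // K r
        BigInteger[] aliceShared = multiply(bobTempPub, k); // R k
        System.out.println(Arrays.equals(bobShared, aliceShared)); // prints true
    }
}
  \end{verbatim}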

  The exact method used in Bitmessage is called Elliptic Curve Integrated Encryption Scheme or ECIES\footnote{See \url{http://en.wikipedia.org/wiki/Integrated_Encryption_Scheme}}.

  \subsubsection{Signature}

  To sign objects, Bitmessage uses the Elliptic Curve Digital Signature Algorithm or ECDSA. This is slightly more complicated; if you want the details, Wikipedia is a fine starting point: \url{http://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm}.
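  For illustration, signing and verifying with the JDK's standard \texttt{Signature} API looks roughly like the Java sketch below. Note that it swaps in the curve \textit{secp256r1}, because recent JDK releases no longer support \textit{secp256k1}; with Bouncy Castle the latter can be used. The choice of SHA-256 as hash is likewise an assumption of this sketch, not a statement about Bitmessage's exact signature format. Class and method names are hypothetical.

  \begin{verbatim}
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;

public class SignatureSketch {
    // secp256r1 stands in for secp256k1, which recent JDKs no longer ship.
    static KeyPair generateKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(new ECGenParameterSpec("secp256r1"));
        return generator.generateKeyPair();
    }

    // Signs with the private key; SHA-256 is an assumption of this sketch.
    static byte[] sign(KeyPair keys, byte[] data) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keys.getPrivate());
        signer.update(data);
        return signer.sign();
    }

    // Anyone holding the public key can check the signature.
    static boolean verify(KeyPair keys, byte[] data, byte[] signature)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair keys = generateKeyPair();
        byte[] object = "object payload".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(keys, object);
        System.out.println(verify(keys, object, signature));   // prints true
        byte[] tampered = "object payl0ad".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(keys, tampered, signature)); // prints false
    }
}
  \end{verbatim}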

  \section{Issues}

  \subsection{Scalability}

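  Bitmessage doesn't really scale. If there are very few users, there is hardly any anonymity, as there are too few users to hide among. With many users, on the other hand, overall traffic and storage use grow quadratically, because every object is distributed to every node.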
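  A rough estimate shows where the quadratic growth comes from. Assume, for illustration only, one node per user, $n$ users, and that each user sends on average $m$ objects per day. Since every object reaches every node, each node has to receive and store
$$T_{\mathrm{node}} = n m$$
objects per day, and summed over the whole network
$$T_{\mathrm{total}} = n \cdot T_{\mathrm{node}} = n^2 m.$$
Doubling the number of users therefore doubles the load on each node and quadruples the total traffic and storage.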

  \subsubsection{Streams}
  The intended solution for this problem is splitting traffic -- addresses, more precisely -- into streams. A node listens only on the streams that concern its addresses. If it wants to send an object to another stream, it just connects to a node in this stream to send the object, then disconnects. When all active streams are full, a new one is created which should be used for new addresses.

  The unsolved problem is to determine when a stream is full. Another issue is that traffic on full streams keeps growing along with the overall network, as there are more and more users who might want to write to someone on a full stream.

  \subsubsection{Prefix Filtering}
  TODO

  \subsection{Forward Secrecy}

  Obviously it's trivial for an attacker to collect all (encrypted) objects distributed through the Bitmessage network\footnote{As long as disk space is not an issue.}. If this attacker can somehow get the private key of a user, they can decrypt all stored messages intended for that user, as well as impersonate said user\footnote{The latter might be more difficult if they got the key through a brute force attack.}.

  Plausible deniability can, in some scenarios, help against this. It is achieved by anonymously publishing the private keys somewhere publicly accessible, which is called ``nuking an address''\footnote{See \url{https://bitmessage.ch/nuked/} for an example.}. Once the keys are public, anyone could use them, which gives the original owner a plausible way to deny authorship.

  Perfect forward secrecy seems impractical to implement, as it requires exchanging messages before any content can be sent. These messages would in turn need proof of work to protect the network, resulting in twice the work for the sender and three times the delay, since the sender would have to request a temporary key and wait for the recipient's answer before sending the actual content. And that is only if both clients are online.

  \section{Discussion}

  TODO


  \bibliographystyle{plain}
  \bibliography{bibliography}

  \appendix
  \addcontentsline{toc}{section}{Appendix}
  \section*{Appendix}
  \renewcommand{\thesubsection}{\Alph{subsection}}

  \subsection{TODO}


\end{document}