FRESH

Hacker News

Home

Signing data structures the wrong way

119 points by malgorithms

by Retr0id

2 subcomments

Putting domain separators in the IDL is interesting but you can also avoid the problem by putting the domain separators in-band (e.g. in some kind of "type" field that is always present).
Tangentially, depending on what your input and data model look like, canonicalisation takes O(nlogn) time (i.e. the cost of sorting your fields).
Here I describe an alternative approach that produces deterministic hashes without a distinct canonicalization step, using multiset hashing: https://www.da.vidbuchanan.co.uk/blog/signing-json.html

by lukev

1 subcomments

So, isn't this a rather longwinded way to say that a signature only extends to the scope of the message it contains?
It doesn't matter if I sign the word "yes", if you don't know what question is being asked. The signature needs to included the necessary context for the signature to be meaningful.
Lots of ways of doing that, and you definitely need to be thoughtful about redundant data and storage overhead, but the concept isn't tricky.

by socketcluster

1 subcomments

The crypto dev community has a strange idea that working with binary is superior. For many algorithms, it's not. It just obfuscates what's happening and the performance advantage is negligible... Especially in the context of all the other logic in the system which uses far more resources.
I didn't know that Protobuf wasn't canonical but even without this knowledge, there are many other factors which make it an inferior format to JSON.
Also, on a related topic; it seems unwise that essentially all the cryptographic primitives that everyone is using are often distributed as compiled binaries. I cannot think of anything more antithetical to security than that.
I implemented my own stateful signature algorithm for my blockchain project from scratch using utf8 as the base format and HMAC-SHA256 for key derivation. It makes it so much easier to understand and implement correctly. It uses Lamport OTS with Merkel MSS. The whole thing including all dependencies is like 4000 lines of easy-to-read JavaScript code. About 300 lines of code for MSS and 300 lines for Lamport OTS... The rest are just generic utility functions. You don't need to trust anyone else to "do it right" when the logic is simple and you can read it and verify it yourself! Simplicity of implementation and verification of the code is a critical feature IMO.
If your perfect crypto library is so complex that only 10 people in the world can understand it, that's not very secure! There is massive centralization and supply chain risk. You're hoping that some of these 10 people will regularly review the code and dependencies... Will they? Can you even trust them?
Choosing to use a popular cryptographic library which distributes binaries is basically trading off the risk of implementation mistake for the risk of supply chain attack... Which seems like a greater risk.
Anyway it's kind of wild to now be reading this and seeing people finally coming round to this approach. I've been saying this for years. You can check out https://www.npmjs.com/package/lite-merkle feedback welcome.

by formerly_proven

4 subcomments

This article claims that these are somewhat open questions, but they're not and have not been for a long time.
#1 You sign a blob and you don't touch it before verifying the signature (aka "The Cryptographic Doom Principle") #2 Signatures are bound to a context which is _not_ transmitted but used for deriving the key or mixed into the MAC or what have you. This is called the Horton principle. It ensures that signer/verifier must cryptographically agree on which context the message is intended for. You essentially cannot implement this incorrectly because if you do, all signatures will fail to verify.
The article actually proposes to violate principle #2 (by embedding some magic numbers into the protocol headers and presuming that someone will check them), which is an incorrect design and will result in bad things if history is any indication.
Principles #1 and #2 are well-established cryptographic design principles for just a handful of decades each.

by tantalor

1 subcomments

Since the example was given in proto, I'll suggest a solution in proto: add a message option.

  extend google.protobuf.MessageOptions {
    optional uint64 domain_separator = 1234;
  }

  message TreeRoot {
    option (domain_separator) = 4567;
    ...
  }

by efitz

1 subcomments

When my data structures are messages to be sent over a network, I always start with msgId and msgLen, both fixed width fields.
This solves the message differentiation problem explicitly, makes security and memory management easier, and reduces routing to:
switch(msg.msgId): …

by cogman10

2 subcomments

Why not digest the type as part of the hash? This avoids the problem in the article and keeps the transmission size small.

by jeffrallen

0 subcomment

This is a nice explanation of an obvious idea. Both domain separation, and putting the domain signifier into the IDL are fine, but not novel.
Crypto is hard. Do it right. Get help from your tools. 'Nuff said.
Jeeze, I'm getting too old for this crap.

by Muromec

2 subcomments

So another lesson had been relearned from asn.1. I'm proud of working in this industry again! Next we will figure out to always put versions into the data too

by erpellan

0 subcomment

Am I missing something or would this be solved by adding a 1 byte `msg` field to the payload?

by Someone

0 subcomment

TLDR: the idea is
- to have a convention to, instead of signing “payloads, to always sign “type identifier + payload”, to prevent adversaries from reusing your signature to sign the same payload, interpreted as a different type.
- use 64-bit type identifiers
- put the identifiers in the IDL (may need augmenting IDL to allow that)
#1 makes sense to me; #3 also makes sense, as that’s the place where people will have to look to learn about your types.
#2, I think, is up for discussion. These could be longer, Java-like strings “com.example.Foo”, or whatever.
I think some people also may disagree with the argument that putting type identifiers inside the payload makes messages too large, but I don’t have enough experience on that to make a judgment.

by colek42

0 subcomment

DSSE is great for this, if you need more schema use in-toto

by logicallee

1 subcomments

along the same lines, did you know that you can get an authenticated email that the listed sender never sent to you? If the third party can get a server to send it to themselves (for example Google forms will send them an email with the contents that they want) they can then forward it to you while spoofing the from: field as Google.com in this example, and it will appear in your inbox from the "sender" (Google.com) and appear as fully authenticated - even though Google never actually sent you that.
This is another example where you would think that "who it's for" is something the sender would sign but nope!

1 subcomments

by volume_tech

0 subcomment

[flagged]

by bengt_trustpay

0 subcomment

[flagged]