[WIP/RFC] broaden the scope of hash
#37964
base: master
I think we should have a dedicated mixing function for this. Am I right that mixing hash values can typically be much faster than computing hash values? E.g. in our case the mixing function would not need to know about …
Yes, good catch. For example, for a random …

    julia> @btime hash($a, $(Ref(h))[])
      4.299 ns (0 allocations: 0 bytes)
    0x04b5a55ce4d88c58

    julia> @btime $a ⊻ $(Ref(h))[]
      1.749 ns (0 allocations: 0 bytes)
    0x5c3b8e8960ef988b

There is an example in the PR where I kept the two versions (the …
I would be happy to see this merged. My motivation: https://discourse.julialang.org/t/dictionary-with-custom-hash-function/49168
You generally want to do something asymmetrical in the arguments. Something like … The criteria for a constant integer multiplication being expressible as an …
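To illustrate the point about asymmetry: `a ⊻ h` is cheap but symmetric, so swapping the two arguments gives the same result. Below is a minimal sketch of an asymmetric mixer, assuming 64-bit hash values; `mix` is a hypothetical name, and the finalisation constants are borrowed from MurmurHash3's `fmix64`, not taken from this PR.

```julia
# Hypothetical mixing function: combine an already-computed hash value `a`
# into an accumulator `h`. The `3h` term makes it asymmetric, so in general
# mix(a, h) != mix(h, a), unlike `a ⊻ h`.
function mix(a::UInt64, h::UInt64)
    x = a + 3h                  # wrapping UInt64 arithmetic
    # finalisation steps from MurmurHash3's fmix64; each step is a bijection on UInt64
    x ⊻= x >>> 33
    x *= 0xff51afd7ed558ccd
    x ⊻= x >>> 33
    x *= 0xc4ceb9fe1a85ec53
    x ⊻= x >>> 33
    return x
end

x, y = UInt64(1), UInt64(2)
println(x ⊻ y == y ⊻ x)            # true: xor ignores argument order
println(mix(x, y) == mix(y, x))    # false: 7 and 5 go through a bijection
```

Whether such a mixer is actually cheaper than a full `hash` call would need benchmarking like the one above.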
Currently `hash(x, h::UInt)::UInt` is used for hashing values (`x` here) to be used as keys in a hash container. The `h::UInt` parameter is used to combine hashes of subobjects in order to compute the hash of a parent object.
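For reference, this is how `h` is threaded through today when hashing a composite value. A minimal sketch using only current `Base.hash`; the `Point` type is hypothetical.

```julia
# Hypothetical composite type: the running value h is passed through each
# field's hash, so every field contributes to the parent's hash.
struct Point
    x::Int
    y::Int
end
Base.:(==)(p::Point, q::Point) = p.x == q.x && p.y == q.y
Base.hash(p::Point, h::UInt) = hash(p.y, hash(p.x, hash(:Point, h)))

# One-argument hash(p) falls back to the two-argument method above,
# which is what Set and Dict rely on.
s = Set([Point(1, 2), Point(1, 2), Point(3, 4)])
println(length(s))   # prints "2"
```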
This PR suggests extending the scope of `h` by giving `hash` the generic signature `hash(x, h::H)::H`, where `H` is any type, with accompanying methods:

- `hashinit(::Type{H}) = H()`, specialized where needed (e.g. `hashinit(UInt) == UInt(0)`, since `UInt()` is not defined);
- `hashdigest(h::H)::Integer`, which gives back the result of the hashing process when it's finished (e.g. `hashdigest(h::UInt) = h`).

It doesn't make writing `hash` methods much more complicated: basically, just don't consider `h` to be an integer, i.e. you can't directly xor its value. This means replacing things like

    h = hash(x, xor(h, 0x1234567))

by

    ...; h = hash(x, hash(0x1234567, h)); ...
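To make the proposed protocol concrete, here is a minimal sketch of a non-`UInt` hash type that follows the `hashinit`/`hash`/`hashdigest` shape described above. It is not the PR's code: the 64-bit FNV-1a state type, the `feed` helper, the `Tagged` struct, and the spelling `xhash` (used instead of `hash` so the snippet runs on an unmodified Julia) are all hypothetical.

```julia
# Hypothetical hash "state" type: 64-bit FNV-1a, fed one byte at a time.
struct FNV1a
    state::UInt64
end

hashinit(::Type{FNV1a}) = FNV1a(0xcbf29ce484222325)   # FNV-1a 64-bit offset basis
hashdigest(h::FNV1a) = h.state

function feed(h::FNV1a, bytes)
    s = h.state
    for b in bytes
        s = (s ⊻ UInt64(b)) * 0x00000100000001b3      # FNV-1a 64-bit prime, wrapping multiply
    end
    return FNV1a(s)
end

# Methods of the proposed shape hash(x, h::H)::H, spelled xhash here.
xhash(x::UInt64, h::FNV1a) = feed(h, ((x >> (8i)) % UInt8 for i in 0:7))
xhash(s::AbstractString, h::FNV1a) = feed(h, codeunits(s))

struct Tagged
    x::UInt64
end
# Instead of xor-ing a salt into h (which needs h to be an integer),
# the salt is hashed into h, which works for any state type with an xhash method.
xhash(t::Tagged, h) = xhash(t.x, xhash(0x0000000001234567, h))

h0 = hashinit(FNV1a)
println(hashdigest(xhash("a", h0)) == 0xaf63dc4c8601ec8c)   # true (standard FNV-1a test vector)
```

The state is immutable here, so each method returns a new `FNV1a`; a mutable state that returns itself would fit the same signature.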
Motivation 1: cryptography in non-container context
Once in a while, I would like to get a cryptographic hash of some objects, to compare them, uniquify them or whatnot, without having to compare them all against each other. An ad-hoc `cryptohash` function could be written, but it would be great to be able to reuse the pre-existing `hash` implementations (although not all of them are compatible with cryptographic requirements, far from it).
One concrete example is testing the reproducibility of an RNG. Instead of having in your tests

    @test rand(MyRng(0), 100) == [... long list of numbers ...]

for different seeds, you could do

    @test hashdigest(hash(rand(MyRng(0), 1000), SHA1Hash())) == 0xfe9160330ac0a5c265517b6831a92414c6ec889f
Here is a little implementation of this idea (of course this would live in a package),
working atop this PR.
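As an illustration only (the PR's own implementation is not reproduced here), a `SHA1Hash` type along these lines could wrap the SHA-1 context from Julia's `SHA` standard library. The `xhash` spelling, the `feed!` helper and the byte encodings chosen below are assumptions of this sketch.

```julia
using SHA

# Hypothetical cryptographic hash type wrapping a mutable SHA-1 context.
struct SHA1Hash
    ctx::SHA.SHA1_CTX
end
SHA1Hash() = SHA1Hash(SHA.SHA1_CTX())

# Feed raw bytes into the context and return the same object,
# matching the proposed shape hash(x, h::H)::H.
function feed!(h::SHA1Hash, bytes::Vector{UInt8})
    SHA.update!(h.ctx, bytes)
    return h
end

xhash(s::AbstractString, h::SHA1Hash) = feed!(h, Vector{UInt8}(codeunits(s)))
# Fixed-width numbers are fed as their raw bytes (host byte order).
xhash(x::Union{Int64,UInt64,Float64}, h::SHA1Hash) = feed!(h, collect(reinterpret(UInt8, [x])))

function xhash(v::AbstractVector, h::SHA1Hash)
    h = xhash(Int64(length(v)), h)   # length prefix, so nested vectors are unambiguous
    for x in v
        h = xhash(x, h)
    end
    return h
end

# Finalise and return the 160-bit digest as an integer. This consumes the context.
hashdigest(h::SHA1Hash) = parse(BigInt, bytes2hex(SHA.digest!(h.ctx)); base = 16)

println(string(hashdigest(xhash("abc", SHA1Hash())); base = 16))
# prints "a9993e364706816aba3e25717850c26c9cd0d89d" (the standard SHA-1 test vector)
```

With these definitions the RNG check above would read something like `@test hashdigest(xhash(rand(rng, 1000), SHA1Hash())) == digest`, where `rng` and `digest` stand for a seeded generator and a value recorded once from a trusted run.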
Motivation 2: provide alternate `hash`/`isequal` couples to be used by `Dict`-like containers

I can think of three examples:
1) use faster hashing in some contexts; for example, in `Dict`, `isequal(x, y)` must imply `hash(x) == hash(y)`, and since numbers representing the same value are `isequal` across types (e.g. `1` and `1.0`), `hash` has to go out of its way to make them hash the same. This can have some performance implications.
2) use a different meaning of equality in `Dict`, e.g. `===` (which would behave like `IdDict`), or `(x, y) -> typeof(x) == typeof(y) && isequal(x, y)`;
3) related to 2), use a work-around hash when the default `hash` fails, e.g. …
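For contrast, with today's `Dict` the second example can only be approximated by wrapping keys. A minimal sketch; the `StrictKey` wrapper is hypothetical and not part of the PR.

```julia
# Base's contract: isequal(x, y) implies hash(x) == hash(y). Equal numbers
# of different types are isequal, so they must hash identically.
@assert isequal(1, 1.0) && hash(1) == hash(1.0)

# Hypothetical wrapper giving Dict the equality
# (x, y) -> typeof(x) == typeof(y) && isequal(x, y).
struct StrictKey{T}
    x::T
end
Base.isequal(a::StrictKey, b::StrictKey) = typeof(a.x) == typeof(b.x) && isequal(a.x, b.x)
# Mixing the type into the hash keeps the isequal/hash contract for the new equality.
Base.hash(k::StrictKey, h::UInt) = hash(typeof(k.x), hash(k.x, h))

plain = Dict{Any,String}(1 => "Int", 1.0 => "Float64")   # 1 and 1.0 are the same key
strict = Dict{StrictKey,String}(StrictKey(1) => "Int", StrictKey(1.0) => "Float64")
println(length(plain), " ", length(strict))   # prints "1 2"
```

The wrapper has to be applied at every insertion and lookup, which is the kind of boilerplate a hash-type parameter on the container would avoid.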
As a proof of concept, this PR includes a small modification of `Dict` to accommodate this: `Dict{K,V}` is replaced by `HashDict{K,V,H}`, where `H` is a type of hash, and `Dict{K,V}` is an alias for `HashDict{K,V,UInt}` (given how small this patch is, I would find value in having this in `Base` rather than in a package, which would involve a lot of duplication; but if this is taken seriously, this should move into another PR...).
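As a rough model of the idea (not the patch itself), the toy table below picks its hash function from a type parameter `H`, and a parametric alias makes the `UInt` case the default. Everything here is an assumption of the sketch: the `xhash` spelling, the fixed bucket count, and keeping `isequal` as the equality; how the PR selects the equality predicate for a given `H` is not stated above.

```julia
# Hypothetical hashing protocol, spelled xhash so Base.hash is left untouched.
# The UInt hash type simply delegates to today's Base.hash.
xhash(x, h::UInt) = hash(x, h)
hashinit(::Type{UInt}) = zero(UInt)
hashdigest(h::UInt) = h

# Toy separate-chaining table whose hashing is chosen by the type parameter H.
# Fixed 16 buckets, no resizing or deletion: for illustration only.
struct HashDict{K,V,H}
    buckets::Vector{Vector{Pair{K,V}}}
end
HashDict{K,V,H}() where {K,V,H} = HashDict{K,V,H}([Pair{K,V}[] for _ in 1:16])

# Parametric alias in the spirit of `Dict{K,V} = HashDict{K,V,UInt}`.
const ToyDict{K,V} = HashDict{K,V,UInt}

function bucket(d::HashDict{K,V,H}, k) where {K,V,H}
    code = hashdigest(xhash(k, hashinit(H)))
    return d.buckets[Int(code % UInt(length(d.buckets))) + 1]
end

function Base.setindex!(d::HashDict, v, k)
    b = bucket(d, k)
    for i in eachindex(b)
        if isequal(first(b[i]), k)   # equality stays isequal in this toy
            b[i] = k => v
            return d
        end
    end
    push!(b, k => v)
    return d
end

function Base.get(d::HashDict, k, default)
    for (key, val) in bucket(d, k)
        isequal(key, k) && return val
    end
    return default
end

Base.length(d::HashDict) = sum(length, d.buckets)

d = ToyDict{String,Int}()
d["a"] = 1; d["b"] = 2; d["a"] = 3
println(length(d), " ", get(d, "a", 0))   # prints "2 3"
```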
To illustrate 1), we define a `Context{T}` hash type which can hash only subtypes of `T`. We define corresponding `hash` methods for `BigInt`, `UnitRange` and `Set` to show how this can compose.
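The PR's `Context{T}` code is not reproduced here. A hedged reconstruction of the idea follows; the `xhash` spelling, the runtime type check, and the order-independent combination for sets are all assumptions of this sketch.

```julia
# Hypothetical hash type: a UInt state that only accepts values of type T.
struct Context{T}
    h::UInt
end
hashinit(::Type{Context{T}}) where {T} = Context{T}(zero(UInt))
hashdigest(c::Context) = c.h

# Leaf case: values of a subtype of T are folded in with Base.hash;
# anything else is rejected.
function xhash(x, c::Context{T}) where {T}
    x isa T || throw(ArgumentError("Context{$T} cannot hash a $(typeof(x))"))
    return Context{T}(hash(x, c.h))
end

# Containers compose: they can be hashed whenever their elements can.
xhash(r::UnitRange, c::Context) = xhash(last(r), xhash(first(r), c))

function xhash(s::AbstractSet, c::Context{T}) where {T}
    acc = zero(UInt)
    for x in s
        # order-independent: element digests are combined with a commutative sum
        acc += hashdigest(xhash(x, hashinit(Context{T})))
    end
    return Context{T}(hash(acc, c.h))
end

c = hashinit(Context{Integer})
xhash(big(2)^100, c)     # BigInt <: Integer: accepted
xhash(1:10, c)           # endpoints are Integers: accepted
xhash(Set([1, 2, 3]), c) # elements are Integers: accepted
# xhash(Set(["a"]), c) would throw an ArgumentError
```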
Demo:
As a last example, here is how to implement an `IdDict` clone via `HashDict`:
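As a hedged illustration (not the PR's code), here is the behaviour such a clone has to reproduce. Base's own `IdDict` compares keys with `===` and hashes them by `objectid`, so two `isequal` but distinct arrays stay separate keys; `idhash` and `ideq` are hypothetical names for the hash/equality couple a `HashDict` hash type would then encode.

```julia
# Two arrays that are isequal but are distinct objects.
a = [1, 2]
b = [1, 2]

# Dict uses isequal/hash: a and b are the same key, so b's value wins.
d = Dict(a => "a", b => "b")
# IdDict uses ===/objectid: a and b are different keys.
id = IdDict(a => "a", b => "b")

# Hypothetical (hash, equality) couple for an identity-based hash type.
idhash(x, h::UInt) = hash(objectid(x), h)
ideq(x, y) = x === y

println(length(d), " ", length(id))   # prints "1 2"
```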