Very Fast WString /Object Hash Table (Comments)
#1
One thing I learned from this, for absolute speed, you can’t use normal WString comparison. Also, Equal is faster than Comparison if that’s all you need, which is the case with a hash table. Comparison requires two comparisons to determine if it’s less than or greater than. For absolute speed, you can’t use; If A$ = B$ Then.
In the Equal callback, testing the lengths first immediately eliminates some values. Passing a pointer and the length to the callback eliminates that overhead in the callback.
I could get this faster, about 0.095 seconds to find 1,000,000 keys, by writing the code inline instead of using callbacks, but the complexity isn’t worth the benefit.

Code:
Very Fast WString /Object Hash Table
Reply
#2
Already on forum.powerbasic.com and in gbThreads?  Why repeat it here?
Reply
#3
(Today, 05:59 AM)Stuart McLachlan Wrote: Already on forum.powerbasic.com and in gbThreads?  Why repeat it here?

Because it isn’t on PB. This is an object hash table. It’s tweaked, extremely fast and simple to use. This is entirely new code, the result of much experimentation and testing.
It was a personal goal to build a hash table that can find 1,000,000 keys in less than a 10th of a second. This will almost do that. I did build one that will do that, but this is a lot simpler and only slightly slower.
It’s traditional that if you build a better version, you have an update.
Reply
#4
Hi Stanley,

The following code

Code:
Class Method LongFromObj(o As IUnknown) As Long
  If IsObject(o) Then
     o.AddRef()
     Method = Peek(Long, VarPtr(o))
  End If
End Method

wouldn't it be better to use ObjPtr?

Code:
Class Method LongFromObj(o As IUnknown) As Long
    If IsObject(o) Then
        o.AddRef()
       Method = ObjPtr(o) 'Peek(Long, VarPtr(o))
    End If
End Method
Reply
#5
There are no other options for PB users. It will find 10,000,000 random keys in 1.936 seconds.
Asking Google AI: “How long will it take the best hash tables to find 10,000,000 wide string keys” = one second
“This is a very rough estimate …”
Based on “rough estimate” about a second slower than the very best on 10,000,000 keys. Almost world class code.
It’s worthy of an update.

(10 hours ago)Steven Pringels Wrote: Hi Stanley,

The following code

Code:
Class Method LongFromObj(o As IUnknown) As Long
  If IsObject(o) Then
     o.AddRef()
     Method = Peek(Long, VarPtr(o))
  End If
End Method

wouldn't it be better to use ObjPtr?

Code:
Class Method LongFromObj(o As IUnknown) As Long
    If IsObject(o) Then
        o.AddRef()
       Method = ObjPtr(o) 'Peek(Long, VarPtr(o))
    End If
End Method

Probably. Will require some testing. It might make a big difference. This is fast of finding key but will be dramatically slower getting the object due to overhead.

Didn’t work, it crashed. Not sure why, but it could be that the reference count must be increased before calling that, it may not increase the count. In which case, it might only be slightly faster.
I’m afraid to mess with it.
Reply
#6
Asked Google AI: “How long will it take the best hash tables to find 1,000,000 string keys”
“A C++ benchmark showed highly optimized implementations like ClickHouse's HashMap performing 1 million lookups in about 200 milliseconds, significantly outperforming a standard library implementation.”
Asked Google AI: “how fast is ClickHouse HashMap on 1,000,000 strings”
“For one million strings, ClickHouse's highly optimized hash table can operate in as little as 0.201 seconds, making it extremely fast for aggregation and lookup tasks.”
“In a 2023 benchmark, ClickHouse's custom hash map implementation was faster than several well-known C++ alternatives: Google DenseMap: 0.261s, Abseil HashMap: 0.307s, std::unordered_map: 0.466s
 
This is faster than all those, 0.112 seconds, and I can get it down to 0.095 seconds with a lot of inline code.
This might be the world’s fastest hash table. I tried random keys 10 to 20 characters long, still faster, 0.127 seconds. And I can make it faster with inline code.
 
You can change the test count: Local testcount As Long : testcount = 1000000 'can change
You can change the string size: Function RandomString() As String, For i = 1 To Rnd(5, 8)

If i cheat and use this to build the keys:

Code:
    ReDim a(1 To testcount)
    For i = 1 To testcount
        'a(i) = RandomString()
        a(i) = Format$(i)
    Next i
instead of this: 
Code:
    ReDim a(1 To testcount)
    For i = 1 To testcount
        a(i) = RandomString()
        'a(i) = format$(i)
    Next i
then I can get it down to 0.035 seconds for 1,000,000 keys
Reply
#7
Phenomenal speed Stanley.

On my overclocked AMD Ryzen 9 3900X (4.25Ghz) which is quite old I get:

add 100,000 random key/value items
Time = 000.078
Count = 100,000 duplicate keys not allowed

find 100,000 random keys
Time = 000.010

Hi Stanley,

The reason why you have the crash is because of the ObjFromLong should not use ObjPtr because the object doesn't exist and therefore the pointer isn't allocated (null pointer).

This works

Code:
Class Method LongFromObj(o As IUnknown) As dword
            If IsObject(o) Then
                o.AddRef()
                Method = ObjPtr(o)
                'Method = Peek(Dword, VarPtr(o))
            End If
        End Method

        Class Method ObjFromLong(ByVal h As dword) As IUnknown
            Local o As IUnknown
            If h Then
                Poke Dword, VarPtr(o), h
                o.AddRef()
                Method = o
            End If
        End Method

1 million items gives me:

add 1,000,000 random key/value items
Time = 000.853
Count = 999,909 duplicate keys not allowed

find 1,000,000 random keys
Time = 000.172
Reply
#8
It works, but the reference count still has to be increased. It affects the add time. 
Can't tell the difference.
I made the change and will use it in the future but not worthy of an update.
Thanks.

Code:
        Class Method LongFromObj(o As IUnknown) As Long
            If IsObject(o) Then
                o.AddRef()
                'Method = Peek(Long, VarPtr(o))
                Method = ObjPtr(o)
            End If
        End Method
Reply


Forum Jump:


Users browsing this thread: José Roca, 2 Guest(s)