Ian’s April 2024 Meeting Summary

  • OpenTools API Demystified – Glenn Dufke
  • TMS WebCore with Stellards.io – Conrad Vermeulen
  • Attempting the “One Billion Row Challenge” challenge with my datasets – Ian Hamilton

Ian’s Summary

Welcome and News

There was a pre-meeting discussion on AI for those online before the official start, followed by the usual intro from Jason. There was a call for flecks, as it looked like there would be time for a few today. As usual, we are short of talks for next month.

Discussion covered meeting in London, who might be going to the conference in Amsterdam, Skia components and using Zoom AI for notes again. A GoFundMe page was suggested for Glenn, so that his components might become available sooner, and one for Conrad as well. Gustavo then talked about creating Pascal components to generate HTML, in conjunction with htmx. During the break, talk continued on forms, UI and UX, and how the web is different from Windows.

OpenTools API Demystified – Glenn Dufke

The first scheduled talk was by Glenn Dufke on the Open Tools API (OTAPI) for RAD Studio.

The aim was to show how to extend the IDE functionality by using the OTAPI. The current documentation is not up to date, but Glenn has some new documents to publish that will change that (Watch this space: https://github.com/code-kungfu/Delphi-Definitive-OTAPI-SDK). Tools can be compiled as either BPLs or DLLs, with the usual pros and cons.

Glenn then showed a demo of switching the IDE between light and dark styles. It registers a theme notifier, with an event listener. This is all interface-based and dates back to Delphi 3, with BorlandIDEServices acting as a kind of interface registry. The demo theme notifier descended from TInterfacedObject and implemented IOTANotifier and INTAIDEThemingServicesNotifier. IDEThemes.Utils was introduced in Delphi 12. The event notifier has a few lines of code to tell the IDE to switch style.

There was a second, simple demo that changed the form title bar. Glenn then showed a Delphi IDE explorer, installed on his Tools menu, which is open source and is managed by the same person who manages GExperts. The final demo was the use of a second instance of Delphi, opened from within the first instance, in order to debug the IDE extensions. The debug instance can see inside the call stack of the parent IDE instance.

TMS WebCore with Stellards.io – Conrad Vermeulen

The second talk was given by Conrad on the new Stellards.io from TMS.

We think it is Stellar DS, as in Data Store. It is a quick and easy Database as a Service (DBaaS), although still in private beta, that presents itself through a REST API discoverable through Swagger. It is particularly easy to use from Delphi, with new components for Delphi and WebCore, including CloudStellar in the FNC CloudPack.

Conrad showed the web-based management console and then a demo app. The management console allows for application config, authorisation and authentication, including with OAuth2, user management, including roles and permissions, and finally exposing the service via the Swagger UI. It is a bit basic, but has everything necessary to get started.

The database is defined as a set of tables, but with limited data types. It does not currently seem possible to add secondary indexes to the tables. Queries also seem to be limited at the moment, with limited joins, and the whole thing was described as rather clunky, but it is early days yet.

As a REST API, it can be accessed by any kind of client application, but the Delphi and WebCore components make client set-up very quick and easy. It was noted that a web client has to be hosted separately, as hosting it is not a part of the Stellards process.

Attempting the “One Billion Row Challenge” challenge with my datasets – Ian Hamilton

The final talk was a change to the intended schedule and was given by Ian Hamilton, about attempting the 1 billion row challenge with his datasets. This challenge came to the group via Gus (a guest at the meeting and one of EMBT’s latest MVPs): https://github.com/gcarreno/1brc-ObjectPascal

[Ed: Ian wrote this synopsis, and I have heavily edited his criticism of himself, which was not warranted, as it was a great talk].

The challenge is to read 1 billion rows from a text file, where each row contains a name:value pair, then to build a list of distinct names and build totals of values, along with minima and maxima. The source generator and entry projects are hosted on the web. Ian showed both a single-threaded attempt and a multi-threaded application.
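The aggregation step described above can be sketched in a few lines. This is only an illustration in Python, not Ian's dataset-based Delphi code: the names Stats and aggregate are hypothetical, a plain dictionary stands in for the indexed dataset, and the ":" separator is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Running totals for one distinct name (hypothetical helper type)."""
    count: int
    total: float
    minimum: float
    maximum: float


def aggregate(lines, sep=":"):
    """Build per-name count, total, minimum and maximum from 'name<sep>value' rows."""
    stats = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank rows
        name, _, raw = line.partition(sep)
        value = float(raw)
        entry = stats.get(name)
        if entry is None:
            # First sighting of this name: an "insert".
            stats[name] = Stats(1, value, value, value)
        else:
            # Every later sighting: an "update".
            entry.count += 1
            entry.total += value
            if value < entry.minimum:
                entry.minimum = value
            if value > entry.maximum:
                entry.maximum = value
    return stats
```

Any iterable of text lines can be passed in, including an open text file, so the same function works for a small test list and for the full input.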

Starting with the single-threaded attempt, he gave the time taken by the obvious approach: declaring a text file and reading it with the Readln statement. He then compared that with a FileStream and a StreamReader, which took only about a quarter of the time to read the file. Parsing the line showed a similar gain, with the standard Pos and Copy functions taking twice as long as treating the string as an array of char.
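As a rough illustration of the buffered-reading idea, the sketch below reads a stream in large blocks and splits the lines itself, carrying any incomplete last line over to the next block. It is written in Python rather than Delphi, it is not the code shown in the talk, and the function name iter_lines and the default block size are assumptions made for the example.

```python
import io


def iter_lines(stream, block_size=1 << 20):
    """Yield complete lines (as bytes, without the newline) from a binary stream.

    The stream is read block_size bytes at a time. A line that straddles two
    blocks is held back in `tail` until the rest of it arrives.
    """
    tail = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break  # end of stream
        data = tail + block
        lines = data.split(b"\n")
        tail = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line
    if tail:
        yield tail  # final line with no trailing newline


# Usage: a tiny block size forces lines to straddle block boundaries.
sample = io.BytesIO(b"a:1\nbb:22\nc:3")
rows = list(iter_lines(sample, block_size=4))
```

Fewer, larger reads mean fewer calls into the operating system, which is one plausible reason a stream-based reader beats a line-at-a-time read.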

On to the dataset part of the process, where the dataset was indexed on the name field. There would be 1 billion searches, with some 41,000 inserts and the rest updates. In the single-threaded test, to read a line, parse the name and value, then search on the name and add the value, it managed around half a million rows a second, or 30 minutes for the whole file. As far as the challenge is concerned, it needs to come down to 30 seconds.
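The figures in this paragraph can be checked with some simple arithmetic. The short Python sketch below uses only the numbers quoted above; the function names are made up for the example.

```python
ROWS = 1_000_000_000  # rows in the challenge file


def seconds_needed(rows_per_second):
    """Time to process the whole file at a given throughput."""
    return ROWS / rows_per_second


def rate_needed(seconds):
    """Throughput required to finish the whole file in the given time."""
    return ROWS / seconds


single_threaded = seconds_needed(500_000)  # 2000 seconds, a little over half an hour
target_rate = rate_needed(30)              # roughly 33 million rows a second
speed_up = target_rate / 500_000           # roughly a 67-fold improvement
```

In other words, getting from half a million rows a second to the 30-second target needs a speed-up of well over an order of magnitude.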

On the multi-threaded part, the process was constrained by the hardware. With only 16 GB of RAM and a 16 GB file, the file could not be loaded in one go, so some kind of just-in-time loading mechanism was required.

The loader thread fills the buffer list with file blocks and then sleeps, waking periodically to top up the buffer. The worker threads remove file blocks from the buffer and process them. Inserts, which fortunately all occur at the start, require the whole list to be locked, but updates only require record locks. Searches are concurrent and do not require locks, but do wait for the list to be unlocked.
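The loader and worker arrangement described above might look something like the following. This is a minimal sketch in Python, not Ian's implementation, and it makes several assumptions: a bounded queue replaces the sleep-and-top-up loop (a full queue simply blocks the loader), a dictionary stands in for the indexed list, and the names Record, loader, worker and run are all hypothetical. Python threads will not give a real speed-up on this CPU-bound work, so the sketch only shows the structure and the locking.

```python
import io
import queue
import threading


class Record:
    """Per-name totals, with its own lock for updates (a "record lock")."""
    def __init__(self):
        self.lock = threading.Lock()
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")


def loader(stream, blocks, block_size, worker_count):
    """Read the file in blocks cut at line boundaries and queue them."""
    tail = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        data = tail + block
        cut = data.rfind(b"\n") + 1  # end of the last complete line
        tail = data[cut:]
        if cut:
            blocks.put(data[:cut])   # blocks while the buffer is full
    if tail:
        blocks.put(tail)
    for _ in range(worker_count):
        blocks.put(None)             # one stop marker per worker


def worker(blocks, table, table_lock):
    """Take blocks off the queue and fold their rows into the shared table."""
    while True:
        block = blocks.get()
        if block is None:
            break
        for line in block.decode("utf-8").splitlines():
            if not line:
                continue
            name, _, raw = line.partition(":")
            value = float(raw)
            rec = table.get(name)    # search without taking a lock
            if rec is None:
                with table_lock:     # inserts lock the whole table
                    rec = table.setdefault(name, Record())
            with rec.lock:           # updates lock only the one record
                rec.count += 1
                rec.total += value
                rec.minimum = min(rec.minimum, value)
                rec.maximum = max(rec.maximum, value)


def run(stream, worker_count=4, block_size=1 << 20, max_blocks=8):
    """Start one loader and several workers, then wait for them to finish."""
    blocks = queue.Queue(maxsize=max_blocks)
    table, table_lock = {}, threading.Lock()
    threads = [threading.Thread(target=worker, args=(blocks, table, table_lock))
               for _ in range(worker_count)]
    threads.append(threading.Thread(
        target=loader, args=(stream, blocks, block_size, worker_count)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

The bounded queue keeps memory use at roughly max_blocks times block_size, which is the point of loading just in time when the file is as large as the available RAM.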

Some issues were found that still need to be dealt with, and the process seemed to max out at 2.8 million rows a second, bringing the total time down to 6 minutes. Discussion on how to approach and handle the problem continued for half an hour after the presentation had finished.