How do you ensure the quality of the products you are building? This is a common challenge for any growing software company. A vital step towards it is testing, ideally automated testing.
Challenges in manual testing
Testing can be done manually. However, this soon becomes challenging when code bases & team sizes grow.
The 1st challenge that arises is sufficient coverage. Testing is hard to do right. One cannot just test new code with a single input. You have to test with as many different inputs as possible to ensure that the feature works as intended in all possible scenarios.
A 2nd challenge lies in reproducibility. Often new features don’t just consist of new code; they also require modifying existing code. To ensure that all previous work still operates correctly, earlier tests have to be repeated. Doing so manually is not only unrealistically time consuming, it also demands an unrealistic memory. Who still remembers how they tested something a month later?
These challenges get considerably worse once you realize that teams change over time. Team members who understand the product and code base leave, and new members start with a blank slate. Who will still know how the software is supposed to function a year from now?
Solution: automation?
As a software company, we go all in on automation! At its core, Continuous Integration (CI) is about programming and regularly executing a test suite that addresses the above concerns of broad coverage and reproducibility. Automated tests run in a fraction of the time that the manual process would take.
Let’s take in1 as an example. The in1 service is an HTTP server that accepts Sentinel-2 satellite data, applies a GPU-accelerated neural network to super-resolve it to 4 times its original size, and returns the result to the user. The service also includes a database, which is used to authenticate customers. We want to guarantee that over time these features keep functioning correctly:
• The server is accessible on https://api.in1.ai/v1/sentinel-2
• The server loads the super resolution neural network
• Sending correct data returns data that is 4 times larger; sending wrong data fails gracefully
• The service is only usable with valid credentials. Invalid credentials also fail gracefully.
The Continuous Integration Pipeline
With a Continuous Integration pipeline, we can test the service for this behaviour every time new code is submitted. Such a pipeline typically consists of multiple phases:
• Setting up the environment: Installing dependencies, building the software, setting up and running docker containers
• Running linters or type checkers to do static analysis
• Running the test suite, which consists of unit, integration and system tests
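The three phases above can be sketched as a small runner that executes them in order and stops at the first failure. This is only an illustration: the phase names, the placeholder commands and the `run_pipeline` helper are hypothetical, and a real pipeline would normally be declared in the CI system's own configuration format.

```python
import subprocess
import sys

# Hypothetical phases; each command is a placeholder, not the actual pipeline.
PHASES = [
    ("setup", [sys.executable, "-c", "print('install dependencies, build, start containers')"]),
    ("static analysis", [sys.executable, "-c", "print('run linters and type checkers')"]),
    ("tests", [sys.executable, "-c", "print('run unit, integration and system tests')"]),
]


def run_pipeline(phases):
    """Run each phase in order and stop at the first non-zero exit code.

    Returns the name of the failing phase, or None if every phase passed.
    """
    for name, command in phases:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return name
    return None
```

Calling `run_pipeline(PHASES)` returns `None` when everything passes. Stopping early keeps feedback fast: there is little point in running a slow test suite when the build or the static analysis has already failed.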
The 1st step doesn’t need any explanation. The 2nd step is done using tools that analyze the code for issues without executing it. Their main advantage is that they enforce code correctness without developers needing to program tests. For example, mypy combined with type annotations checks every annotated call for type errors, which is roughly equivalent to basic tests with 100% coverage.
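As a minimal sketch of what this looks like in practice, consider an annotated helper. The function `mean_reflectance` is a hypothetical example, not part of the in1 code base.

```python
from typing import Sequence


def mean_reflectance(band: Sequence[float]) -> float:
    """Average the values of one band; empty input is rejected explicitly."""
    if not band:
        raise ValueError("band must contain at least one value")
    return sum(band) / len(band)


# mypy accepts this call: a list of floats is a Sequence[float].
print(mean_reflectance([0.12, 0.18, 0.15]))

# mypy would reject the call below without running it, because a str is
# not a Sequence[float]:
# mean_reflectance("0.12, 0.18")
```

The annotations cost one line of effort, yet every call site in the code base is checked against them on each pipeline run.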
Keep in mind that linters and type checkers are too general and unreliable to rely on exclusively. The best tests are the ones you write yourself and execute in step 3. All of the requirements for the server mentioned above can be tested programmatically. For example, we can attempt to send Sentinel-2 data to the URL without credentials and check that this correctly returns a 403 status code.
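A hedged sketch of such a credentials test follows. To stay self-contained it does not contact the real service; it starts a local stub server instead. The `StubHandler`, the `X-Api-Key` header and the `valid-key` value are assumptions made for illustration and say nothing about how the real API authenticates.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service: rejects requests without a key."""

    def do_POST(self):
        # Drain the request body so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # "X-Api-Key" is an assumed header name, not the real API's scheme.
        status = 200 if self.headers.get("X-Api-Key") == "valid-key" else 403
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def post_status(url, payload, headers=None):
    """POST payload to url and return the HTTP status code."""
    request = urllib.request.Request(url, data=payload, headers=headers or {}, method="POST")
    # Ignore any proxy settings so the request reaches the local stub directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def test_rejects_missing_credentials():
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/v1/sentinel-2"
    try:
        assert post_status(url, b"fake-tile") == 403
        assert post_status(url, b"fake-tile", {"X-Api-Key": "valid-key"}) == 200
    finally:
        server.shutdown()
        server.server_close()
```

In a real pipeline the same assertions would point at the service started during the setup phase rather than at a stub.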
Other benefits of CI
CI brings more to the table than just preventing regressions.
Living documentation. Developers can consult the type annotations to see what kind of data is valid to pass around. Because mypy enforces them, they don’t go out of date. The unit and integration tests document how the system really behaves. This makes it easier for developers to keep adhering to this behavior.
Greater changes with greater confidence. Developers can modify the internals as much as they want, as long as all tests keep passing. Passing tests mean the tested behavior is the same as before, regardless of how it is implemented.
From CI to CD. CI enables Continuous Delivery as a next step. If you know your software works well, why not simply deploy it automatically? Perhaps an interesting topic to discuss in another blog post.
Automating challenges in an imperfect world
Hardware limitations mean tests cannot always be executed in environments exactly like the production environment. In the case of the example, typical CI servers don’t have GPUs available and are limited in memory capacity. This means that the example’s full-size neural network cannot be tested there. Instead, we use a smaller network than the real one. In other cases, we test with smaller data samples.
Writing tests is a skill of its own. A proper test suite is extensive, and writing it requires time and diligence. The CI pipeline is only as good as the tests it contains. To address this, testing is part of the code contribution process inside Sobolt. New features need to be accompanied by tests that check the new feature for correctness. This way, the test suite can grow organically.
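One way to make this substitution easy is to pass the model into the code under test, so the CI run can inject a tiny CPU-only stand-in. The sketch below is an assumption-laden illustration: `tiny_upscaler` and `super_resolve` are hypothetical names, nearest-neighbour repetition stands in for a small network, and "4 times larger" is taken to mean 4 times along each spatial axis.

```python
from typing import Callable, List

Image = List[List[float]]  # one band as rows of pixel values
SCALE = 4  # assumed: 4x along each spatial axis


def tiny_upscaler(image: Image) -> Image:
    """CPU-only stand-in for the real network: nearest-neighbour repetition."""
    return [
        [pixel for pixel in row for _ in range(SCALE)]
        for row in image
        for _ in range(SCALE)
    ]


def super_resolve(image: Image, model: Callable[[Image], Image]) -> Image:
    """Validate the input, then delegate to whichever model was injected."""
    if not image or not image[0] or any(len(row) != len(image[0]) for row in image):
        raise ValueError("expected a non-empty rectangular image")
    return model(image)


def test_output_is_four_times_larger():
    result = super_resolve([[0.1, 0.2], [0.3, 0.4]], model=tiny_upscaler)
    assert len(result) == 2 * SCALE
    assert all(len(row) == 2 * SCALE for row in result)


def test_malformed_input_fails_gracefully():
    try:
        super_resolve([[0.1, 0.2], [0.3]], model=tiny_upscaler)
    except ValueError:
        return
    raise AssertionError("ragged input should be rejected")
```

The tests exercise the validation and the size contract, which do not depend on the quality of the model. The quality of the real network still has to be verified on hardware that can run it.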