Jul 8 | 3 min read
Product Update #8
Catch up on recently shipped code, and what we have coming
Hello community,
Welcome to another bi-weekly digest of what’s going on at Opstrace.
We posted a couple of articles to our blog recently. The first one walks through the new unified alerts UI, thanks to Grafana 8. Previously, you had to configure several different services separately, so this is a refreshing change.
We consider our docs part of our product, so we wanted to keep them all in the product repository alongside our product code for easy discovery and editing. So we wrote (and open sourced) some code to help us do that. Check out our post to read about the challenges we faced in making them work both in GitHub and on our website, and how we solved them. And if you’re using NextJS, feel free to try out our code to do this for yourself.
On the product front, we are working on a new capability that we currently call "custom DNS name". It lets you reach your Opstrace instance under a custom DNS name, using DNS infrastructure managed entirely by you rather than our shared DNS infrastructure for *.opstrace.io. This work goes hand in hand with making custom Auth0 integration a first-class feature. While quite a bit of code has landed in our main development branch, there is still quite a bit of testing and documentation work to be done. Stay tuned for further updates.
We upgraded Loki to the latest and greatest, which is 174 commits ahead of the v2.2.1 release. You can use opstrace upgrade to roll out this version as with any other.
As with any product, usability issues crop up. We recently fixed one in which the currently visible page in the UI was not maintained after switching tenants. For systems with many tenants, this proved toilsome.
Scale testing continues. We are still focused on testing metrics ingestion in Cortex, expanding the breadth of scenarios we test (for example, high cardinality metrics). We found several series caps and bandwidth limits that could be made configurable as needed, and we configured some reasonable default limits in Cortex that were found to greatly improve overall stability when the system was under high load:
- https://github.com/opstrace/opstrace/issues/964
- https://github.com/opstrace/opstrace/issues/965
- https://github.com/opstrace/opstrace/pull/986
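To illustrate why high-cardinality metrics put pressure on series caps, here is a minimal sketch in TypeScript. It assumes the worst case where every combination of label values occurs, so the number of series for one metric is the product of the distinct values per label. The function names, the label names, and the cap value are hypothetical and are not taken from Cortex or the Opstrace code base.

```typescript
// Hypothetical helper: worst-case number of time series for a single metric,
// assuming every combination of label values actually occurs.
function maxSeriesCount(labelCardinalities: Record<string, number>): number {
  return Object.values(labelCardinalities).reduce(
    (product, distinctValues) => product * distinctValues,
    1,
  );
}

// Hypothetical check against an illustrative per-tenant series cap.
function exceedsSeriesCap(
  labelCardinalities: Record<string, number>,
  seriesCap: number,
): boolean {
  return maxSeriesCount(labelCardinalities) > seriesCap;
}

// Illustrative label set: 5 HTTP methods, 10 status codes, 200 pods.
const exampleLabels = { method: 5, status: 10, pod: 200 };

console.log(maxSeriesCount(exampleLabels)); // 5 * 10 * 200 = 10000
console.log(exceedsSeriesCap(exampleLabels, 5000)); // true
```

Adding one more label with many distinct values (for example, a per-request ID) multiplies the total again, which is why a single unbounded label can exhaust a cap quickly.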
We also did significant work in the Looker code base. Looker is our in-house load-testing tool for Loki and Cortex. It can be used for benchmarking as well as for black-box storage testing (with strict read validation). We are also working to replace Avalanche (synthetic workload generation) with Looker for Cortex metrics.
And, finally, we are introducing, stabilizing, and extending our browser-based testing (using Playwright). We now regularly test against all the major browsers on both desktop and mobile. Failed tests save artifacts—screenshots, video, and tracing—to improve debuggability. Hopefully, this makes life easier for anyone who wants to contribute to Opstrace.
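For readers who want to try a similar setup, the following is a minimal sketch of a Playwright Test configuration that runs against desktop and mobile browser profiles and keeps artifacts only for failing tests. It is an assumption-laden example, not the configuration used in the Opstrace repository: the project names and the chosen device profiles are illustrative.

```typescript
// playwright.config.ts (illustrative sketch, not the Opstrace configuration)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  use: {
    // Keep artifacts only when a test fails, to help with debugging.
    screenshot: "only-on-failure",
    video: "retain-on-failure",
    trace: "retain-on-failure",
  },
  projects: [
    // Desktop browsers
    { name: "desktop-chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "desktop-firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "desktop-webkit", use: { ...devices["Desktop Safari"] } },
    // Mobile emulation profiles (device choice is an assumption)
    { name: "mobile-chrome", use: { ...devices["Pixel 5"] } },
    { name: "mobile-safari", use: { ...devices["iPhone 12"] } },
  ],
});
```

With a configuration like this, running `npx playwright test` executes each test once per project, and a saved trace can be inspected with `npx playwright show-trace`.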
We hope these updates are useful to you, and we welcome your feedback: just reply here or send us a message at hello@opstrace.com.
Cheers,
The Opstrace Team