SE Radio 556: Alex Boten on Open Telemetry : Software Engineering Radio


Software engineer Alex Boten, author of Cloud Native Observability with Open Telemetry, joins host Robert Blumen for a conversation about software telemetry and the OpenTelemetry project. After a brief review of the topic and the OpenTelemetry project’s origins, rooted in the need for interoperability between telemetry sources and back ends, they discuss the OpenTelemetry server and its features, including transforms, filtering, sampling, and rate limiting. They consider a range of topics, starting with alternative topologies with and without the telemetry server, server pipelines, and scaling out the server, as well as a detailed look at extension points and extensions; authentication; adoption; and migration.

Transcript brought to you by IEEE Software magazine. This transcript was automatically generated. To suggest improvements in the text, please contact content [email protected] and include the episode number and URL.

Robert Blumen 00:00:16 For Software Engineering Radio, this is Robert Blumen. Today I have with me Alex Boten. Alex is a senior staff software engineer at LightStep. Prior to that, he was at Cisco. He’s contributed to open-source projects in the telemetry area, including the OpenTelemetry project. He’s the author of the book, Cloud Native Observability with OpenTelemetry, and that will be the subject of our conversation today. Alex, welcome to Software Engineering Radio.

Alex Boten 00:00:50 Hello. Thanks for having me. It’s great to be here.

Robert Blumen 00:00:52 Would you like to add anything about your background that I didn’t mention?

Alex Boten 00:00:57 I think you captured most of it. I’ve been contributing to OpenTelemetry for a little bit over three years. I’ve worked on various components of the project as well as the specification, and I’m currently a maintainer on the OpenTelemetry Collector.

Robert Blumen 00:01:11 Great. Now on Software Engineering Radio, we have covered a number of telemetry-related issues, including Logging in episode 220, High Cardinality Monitoring, which was 429, Prometheus, Distributed Tracing, and episode 455, which was called Software Telemetry. So, listeners can definitely listen to some of those in our back catalog to get more general information. We’ll be focusing more in this conversation on what OpenTelemetry brings to the table that we have not already covered. Let’s start out with, in the telemetry space, where would you situate OpenTelemetry? What is it similar to? How is it different? What problem does it solve?

Alex Boten 00:02:02 That’s a great question. So, I think the problem that OpenTelemetry aims to solve (and we’ve already seen it happen in the industry today) is it changes how application developers instrument their application, how telemetry is generated, and how it’s collected, and then transmitted across systems. And if I were to think of what it’s similar to, the first thing that comes to mind are the projects that really caused it to emerge, which are OpenCensus and OpenTracing, which are two other open-source projects that were formed a little bit earlier. I think it started in maybe 2017, 2016, to provide a standard around producing distributed tracing. And then OpenCensus also addressed a little bit around metrics and log collection.

Robert Blumen 00:02:50 What was going on in the telemetry area prior to those projects that created the need for them, and what did they do?

Alex Boten 00:02:57 Yeah, so I think, if you think of telemetry as a domain in software, it’s been around for a really long time, right? Like, people as early as the earliest of computer scientists wanted to know what their computers were doing. And earlier, in the days of having a single machine, it was fairly easy to print some log statements and look at what your machine was doing. But as the industry grew, as the Internet of Things picked up, as systems became larger and larger to address the increasing demand, I think systems became inherently more complex. And we’ve seen an evolution of what software telemetry really became. So, if you think of earlier, we were able to log data on a single system. As people needed to deploy multiple systems, a need for centralized logging came along so you could aggregate and do aggregate searches on logs.

Alex Boten 00:03:54 And that became really costly. And then we saw an increase in folks wanting to capture more meaningful metrics from their systems where they could create dashboards and do queries, which was cheaper than going through and analyzing log data. And I think the thing that I’ve seen happen in the last 20 years is every time there was a new maybe paradigm around the type of telemetry that systems should emit, there was a chance for innovation to occur, which is great to see, but if you’re an end user who’s just trying to get telemetry out of a system, out of an application, it’s a really frustrating process to have to go and reinstrument your code every few months or every few years, depending on what the flavor of the day is. And I think what OpenCensus and OpenTracing and OpenTelemetry tried to capture is addressing the pain that users have when it comes to instrumenting their code.

Robert Blumen 00:04:49 What is the relationship of OpenTelemetry to other systems out there, such as Zipkin, Jaeger, Graylog, Prometheus?

Alex Boten 00:05:00 So the relationship that OpenTelemetry has with the Zipkins, the Jaegers and the Prometheuses of the world is really around providing interoperability between those systems. So, an application developer would instrument their code using OpenTelemetry, and then they can emit that telemetry data to whatever backend systems they want. So, if you wanted to continue using Jaeger, you could definitely do that with an application that’s instrumented with OpenTelemetry. The other thing that OpenTelemetry tries to do is provide a translation layer, so that folks that are maybe today emitting data to Zipkin or to Jaeger or to Prometheus can deploy a Collector within their environments and then translate the data from the specific format of those other systems into the OpenTelemetry format. They can then emit the data to whatever backend they choose by simply updating the configuration on their Collector, without having to go back to their applications, which may be legacy systems that nobody wants to modify anymore, and still be able to send their data to different destinations.

Robert Blumen 00:06:06 Is OpenTelemetry then an interoperability standard, a system, or both?

Alex Boten 00:06:13 It’s really the standard to instrument your applications and to provide the interoperability between the different systems. OpenTelemetry doesn’t offer a backend; there’s no log database or metrics database that OpenTelemetry provides. Maybe at some point in the future that will happen. We’re definitely seeing people that are supporting the OpenTelemetry format starting to provide those backend options for folks that are emitting only OpenTelemetry data. But that’s not something the project is interested in solving at this point. It’s really about the instrumentation piece and the collection and transmission of the data.

Robert Blumen 00:06:52 In reading about this, I came across discussion of a protocol called OTLP. Can you explain what that is?

Alex Boten 00:07:00 So the OpenTelemetry protocol is a protocol that’s generated from protobuf definitions. Every implementation of OpenTelemetry supports it. Its intention is to provide high performance data transmission in a format that’s standardized across all the implementations. It’s also supported by the OpenTelemetry Collector. And what it really means is that this format supports all the different signals that OpenTelemetry supports. So, logs, traces, metrics, and maybe down the road, events and profiling, which is currently being developed in the project. And the idea is if you support the OpenTelemetry protocol, this is the protocol that you would use to transmit the data, or if you’re a vendor or a backend provider, you would use that protocol to receive the data. And it’s actually been really good to see even projects like Prometheus starting to support the OTLP protocol for transmitting data.

Robert Blumen 00:07:56 So, let me summarize what we have so far, and you can tell me if I’ve understood. I’m building an application, and I could instrument it in a way that’s compatible with this standard. I might not even know where my logs or metrics are going to end up. And then whoever uses my system, which may be people in the same organization, or maybe I’m shipping an open-source project which has many users, can then plug in their backend of choice, and they are not necessarily tied to any decisions I made about how I think the telemetry will be collected. It creates the ability for users to plug and play between the applications and the backends. Is that more or less correct?

Alex Boten 00:08:42 Yeah, that’s exactly right. I think it really decouples the instrumentation piece, which historically has been the most expensive aspect of organizations gaining observability inside of their systems, from the decision of where am I going to send that data. And the nice thing about this is that it really frees the end users from the idea of vendor lock-in, which I think a lot of us who’ve worked in systems for a long time always found to be difficult. The conversation of trying out a new vendor, if you wanted to test some new feature that you wanted to have or whatever, usually would mean that you would have to go back and re-instrument your code. Whereas now with OpenTelemetry, if you have instrumented your application, hopefully this is the last time you have to worry about instrumenting your application because you can just point that data to different backends.

Robert Blumen 00:09:34 A little while ago you did mention the Collector, and we will be spending some time on that, but I want to understand what the possible configurations of the system are. What I think we’re talking about now is, if the code is instrumented with the OpenTelemetry standard, that it can talk directly to backends. The other option being you have a Collector in between them. Are those the two main configurations?

Alex Boten 00:10:02 Yeah, that’s right. It’s also possible to configure your instrumented application to send data to backends directly: if you wanted to choose to send the data to Jaeger, I think most implementations that support OpenTelemetry officially have a Jaeger exporter, for example. So there are options if you wanted to send data from your application to your backend, but ideally you would send that data in a protocol that you can then configure using an OpenTelemetry Collector later down the line.

Robert Blumen 00:10:31 Let’s come back to the Collector in a bit, but I want to talk about instrumentation. Often if I want to talk to a certain backend, I need to use their library to emit the telemetry. How does that change with OpenTelemetry?

Alex Boten 00:10:49 Yeah, so with the OpenTelemetry standard, you have two aspects of the instrumentation. So, there’s the OpenTelemetry API, which is really what most developers would interact with. There’s a very limited amount of surface area that the API covers. For example, for tracing, the API is essentially: you can start a span, you can end a span, and you can get a tracer. That’s roughly the surface area that’s trying to be covered there. And the idea we wanted to push forward with our limited API is to just reduce the cognitive load that users would have to take on to adopt OpenTelemetry. The other piece of the instrumentation that folks would have to interact with is the SDK, which really allows end users to configure how the telemetry is produced and where it’s sent to. If you’re thinking about this in the context of how it is different from a particular backend and its instrumentation, the difference is that with OpenTelemetry you would only ever use the OpenTelemetry API and configure the SDK to send data to the backend of choice.

Alex Boten 00:11:55 But the API that you would use for instrumenting the code wouldn’t be any different depending on which backend you send it to. And there’s that clear separation between the API and the SDK that allows you to really only instrument with that minimal interface and worry about the details of how and where that data is sent using the SDK configuration, which in my book I refer to as telemetry pipelines.
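To illustrate the API and SDK separation described above, here is a minimal Python sketch. It does not use the real OpenTelemetry packages: the `Tracer` class, its `start_span` method, and the list-based "backends" are all hypothetical stand-ins. The point is only that the instrumented function `handle_request` stays the same while the export destination is swapped through configuration.

```python
from contextlib import contextmanager


class Tracer:
    """Toy stand-in for a tracing API (hypothetical, not the OpenTelemetry API)."""

    def __init__(self, exporter):
        # The exporter plays the role of SDK configuration: it decides
        # where finished spans go. Application code never touches it.
        self._exporter = exporter

    @contextmanager
    def start_span(self, name):
        span = {"name": name, "ended": False}
        try:
            yield span
        finally:
            span["ended"] = True      # "end span"
            self._exporter(span)      # hand the finished span to the backend


def handle_request(tracer):
    """Instrumented application code: only uses the tracer interface."""
    with tracer.start_span("handle_request") as span:
        span["http.route"] = "/checkout"


# Two hypothetical backends, modelled as plain lists.
backend_a, backend_b = [], []

# Same instrumentation, different destination chosen by configuration.
handle_request(Tracer(backend_a.append))
handle_request(Tracer(backend_b.append))
```

Both lists end up holding an identical finished span, even though `handle_request` was never edited to know which backend was in use.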

Robert Blumen 00:12:17 In that discussion you mentioned tracing. I’ve seen a lot of logging systems where you can log whatever you want, and then it puts the burden on a Collector to pick up the logs and format them. And then for metrics, you may have to use a library. If I’m adopting OpenTelemetry, how does it handle logs and metrics?

Alex Boten 00:12:40 Yeah, so for metrics, there is an API that calls out specific instruments. So OpenTelemetry has a list of, I believe it’s six instruments currently that it supports, to more or less have the same functionality as that kind of library. And I think a lot of those instruments were developed in collaboration with both the OpenMetrics and the Prometheus communities to ensure that we’re compatible with those folks. So, for the logging library, that’s a little bit different in OpenTelemetry, or at least it was at the time of writing my book, which was written in 2021, mostly. The idea behind logging in OpenTelemetry was, we already were aware there were so many different APIs for logging in each language. Each language has like a dozen logging APIs and we didn’t necessarily want to create a new logging API that people would have to adopt. And so, the idea was to really hook into those existing APIs. It’s been an interesting transition though. I think in the past, maybe in the past six or eight months or so, there’s been almost an ask for an API and an SDK in the logging signal as well. That’s still currently in development. So, stay tuned for what’s going to happen there.

Robert Blumen 00:13:51 In what languages are the OpenTelemetry SDKs available?

Alex Boten 00:13:57 Yeah, so there are currently 11 officially supported languages. I’m probably going to forget some of them, but there’s definitely one in C++, in Go, in Rust, in Python, Ruby, PHP, Java, JavaScript; all these languages are covered officially by OpenTelemetry. And what this means is that the implementations have been reviewed by someone on the technical committee, and the implementations themselves live within the OpenTelemetry organization in GitHub and have the same process. We have maintainers and approvers for each one of those languages. There’s a couple of additional implementations that are not officially supported yet, but that’s really just because there haven’t been enough contributors to them yet. So, I think there’s one in Lua and maybe Julia is the other one?

Robert Blumen 00:14:46 I’ve found when instrumenting code I end up spending a lot of time doing things like writing a message that a certain method has been called, and here are the parameters: very boilerplate steps. I understand that OpenTelemetry can to some extent automate that? How does that work?

Alex Boten 00:15:08 Yeah, so one of the very first OTEPs (the OpenTelemetry Enhancement Proposals) that was created in the early stages of the project was to support auto instrumentation out of the box. So, the effort of auto instrumentation in different languages is at different stages. So, I know the Java and the Python auto instrumentation efforts are a little bit further along. I think .NET is coming along nicely, and I think JavaScript is, as well. But the idea behind auto instrumentation with OpenTelemetry specifically is very similar to what we’ve seen in other efforts before, where it really ties instrumentation to existing third-party open-source libraries or third-party libraries. Right? And the idea being, for example, if you’re using the Python SDK (I’m using that as an example because I spent a fair amount of time writing some code there).

Alex Boten 00:16:02 If you’re using the Python SDK and you wanted to use, for example, the Python Redis library, well you could use the instrumentation library that’s provided by OpenTelemetry, which allows you to call to this library, which monkey patches the Redis library that it then makes a call to. But, in that intermediate step, it acts as a middle layer that instruments the calls to the library that you would be making. So, if you were calling connect, for example, it would call connect on the instrumentation library, start a span, maybe record some kind of metric about the operation, make the call to the Redis library, and then on the return it would end the span and produce some telemetry there with some semantic convention attributes.

Robert Blumen 00:16:49 Explain the term monkey patching.

Alex Boten 00:16:52 So monkey patching is when a library intercepts a call and replaces the original call with its own. So, in the case of the Redis example I was using, the Redis instrumentation library intercepts the call to connect to Redis, and then it replaces it with its own connect call, which does the instrumentation, as well.
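The monkey patching described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OpenTelemetry Redis instrumentation: `FakeRedisClient`, `instrument`, and `recorded_spans` are hypothetical names invented for the example, and the "span" is just a dictionary with a name and a duration.

```python
import functools
import time


class FakeRedisClient:
    """Hypothetical stand-in for a third-party client (not the real redis API)."""

    def connect(self, host):
        return f"connected to {host}"


# Collected telemetry; a real setup would hand this to an exporter.
recorded_spans = []


def instrument(cls, method_name):
    """Monkey patch cls.method_name with a wrapper that records a span."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()                 # "start span"
        try:
            return original(self, *args, **kwargs)  # call the real method
        finally:
            recorded_spans.append({                 # "end span"
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    # Replace the method on the class, so every caller now goes
    # through the wrapper without changing its own code.
    setattr(cls, method_name, wrapper)


instrument(FakeRedisClient, "connect")
result = FakeRedisClient().connect("localhost")
```

The caller still writes `client.connect(...)` and gets the same return value; the only difference is that a span record is appended as a side effect.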

Robert Blumen 00:17:17 This I could see being very useful in that if you’ve got a library and something’s going wrong in the library, I don’t know where, then the previous option has been that I would have to get the source code of the library, and if I want logging, I would have to go and insert log statements or insert metrics or whatever kind of telemetry I’m trying to capture into someone else’s source code and rebuild it. So, does this enable you to get visibility of what’s going on inside third-party libraries that you’ve downloaded with your package manager and whose code you’re not interested in modifying?

Alex Boten 00:17:57 Right. I think that’s a key benefit of it: you’re finally able to see what these libraries are doing. Maybe you’re not familiar with the code or you’re not really sure of the path through the code, and you’re able to see all of the library calls that are instrumented underneath the original call of your application. A lot of the time you’ll find problems there, but it’s really hard to identify them because you don’t necessarily know what’s going on without reading the source code underneath.

Robert Blumen 00:18:24 I’ve used some of these languages in the 11. I’m aware that every language is different as far as what access it gives you to intercept things at runtime or maybe generate byte code and inject it into the library. I would think that the ability to do this is going to differ considerably based on the language, with maybe C++ being rather unfriendly to that. Do you expect to achieve parity with all the languages in the extent to which you can offer this feature? Or will it always work better on some than others?

Alex Boten 00:19:02 That’s a great question. I think, ideally, I consider that instrumentation libraries are a temporary fix. I really believe that’s what everybody’s hoping for within the community, and we’ve seen some open-source projects already reach out and start instrumenting their applications. We’re really hoping that those libraries will use the OpenTelemetry API to instrument themselves and remove the need for these instrumentation libraries altogether. For example, if an HTTP server framework were to instrument its calls to its endpoints using OpenTelemetry, the end user wouldn’t even need this instrumentation library. And we could achieve parity across all the languages because each one of those libraries would just use the standard rather than relying on either byte code manipulation or monkey patching, which works for what it is, but it’s not always the ideal option.

Alex Boten 00:20:01 With monkey patching, maybe the underlying library’s call changes parameters, and you have to keep track of those changes within those instrumentation libraries. And so that always poses a challenge. But ideally, like I said, these libraries will go away as the project continues to gain traction across the industry. And we’ve already seen, I think there were a few Python open-source projects that reached out. I know the Spring folks in Java had a project to instrument using OpenTelemetry. Envoy and a few other proxies have also started using OpenTelemetry. So definitely, I think instrumentation libraries are great for the short term, but in the long term it would be ideal if things were instrumented themselves.

Robert Blumen 00:20:45 That would be great. But there are always going to be some older libraries that are maybe not under as active development, where there’s not really anyone around to modify them. Then you always have this to fall back on in those cases. I wouldn’t see it going away.

Alex Boten 00:21:02 Right. Ideally the norm would become: instrument your libraries with OpenTelemetry, and for those libraries that are not being modified, absolutely continue to use the mechanisms that we have in place today.

Robert Blumen 00:21:16 Now I think it’s the time to start talking about the Collector. We’ve talked about the source and how this data gets published. A little while ago we talked about how you can send data directly from a publisher to a backend or you can have a Collector in between. What is the Collector, what does it do, why might I want one?

Alex Boten 00:21:36 Yeah, so the Collector is a separate process that would be running within your environment. It’s published as a separate binary, or as a Docker image if you’re interested in that. There’s also packages for, I think, Debian and RedHat. And the Collector is a destination for your telemetry that can then act as a router. So, it has a series of, I believe it’s over 100 receivers, which support different formats and also can scrape metric data from different systems. And it has exporters, and again, I lose track of it, but I think it’s over 100 formats of exporters that the OpenTelemetry Collector supports. So you can send data to it in one format and export it using a different format if you’re so inclined. You can also use processors within the Collector, which allow you to manipulate the data, whether it’s for things like redacting maybe PII that you might have, or if you wanted to enrich the data with some additional attributes, maybe about your environments that only the Collector would know about.

Alex Boten 00:22:44 And that’s the Collector in a nutshell. It’s available to deploy, as I said, as an image or as a package. You can also deploy using Helm charts. You can deploy using the OpenTelemetry operator if you’re using a Kubernetes environment.

Robert Blumen 00:22:59 I’m going to delve into some of those internal components. I want to talk first a little bit about the networking. It may be simpler if I have N sources and K backends: instead of an N cross K topology, an N cross 1 and 1 cross K. Do you have any thoughts on whether that is a motivator, to simplify your networking and everything that goes along with that? Is that a motivator for adopting a Collector?

Alex Boten 00:23:30 Yeah, I think so. I think the Collector makes it very appealing for a variety of reasons. One being that your egress out of your network may only be coming from one point. So, from a security auditing kind of perspective, you can see where all the data is really going out rather than having a bunch of different endpoints that have to be connected to some external systems. I think from that point alone, it’s definitely worth deploying a Collector within a network. I think the ability to throttle the data that’s going out is also important. If you have N endpoints that are sending data, it’s really difficult to throttle how much data is actually leaving your network, which could end up being costly. So, if you wanted to do things like sampling, you would probably want to have a Collector in place, so that you could really adjust it as needed.

Robert Blumen 00:24:22 How much telemetry can one instance of the Collector handle?

Alex Boten 00:24:30 Yeah, I mean I think that always depends on the size of the instance that you’re running. On the OpenTelemetry Collector repository, there’s a pretty comprehensive set of benchmarks that have been run against the Collector for traces, logs, and metrics. And I believe the instance sizes that were used, if memory serves right, they were using EC2 for the testing for the benchmarks. And I believe that’s all listed on the website there, for folks that are interested in finding out.

Robert Blumen 00:25:01 If I wanted to either run more workload than what I could put through one instance or, for high-availability reasons, have a clustered implementation with multiple Collectors, is it possible say to put a load balancer in front of it and distribute it? Or what are the options for a more clustered implementation?

Alex Boten 00:25:24 Yeah, so the way you would probably want to deploy this is: you would want to use some kind of load balancer. Depending on the telemetry you’re sending out, you may want to use like a routing processor that allows you to be more specific as to which data each one of the Collectors will be receiving. So for example, if you had maybe a bunch of Collectors that are deployed closer to your applications, that would then be routed through maybe a Collector as a gateway, and you wanted to send only a certain number of traces to the Collector as a gateway, you could fork it using the routing processor based on the trace IDs or something like that, if you wanted to.

Robert Blumen 00:26:06 So, with stateless servers you can set up a fairly dumb load balancer and every request would get routed essentially to a random instance. Are there any reasons I would want a bit more of a sharding or pinning of certain workloads in a clustered implementation?

Alex Boten 00:26:27 I think some of this depends on what you’re doing with the Collectors. So for example, if you’re doing sampling on traces, there’s no way to share that sampling decision across Collectors. And so, you would want to be able to make that decision on the same instance of the Collector, for example. And so you would really want all of the data for a particular trace to go to the same Collector to be able to make the sampling decision.
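One way to picture "all of the data for a particular trace goes to the same Collector" is hashing on the trace ID. The sketch below is an assumption-laden illustration, not how any particular load balancer or Collector component is implemented: `pick_collector`, the collector names, and the span dictionaries are hypothetical.

```python
import hashlib


def pick_collector(trace_id: str, collectors: list[str]) -> str:
    """Map a trace ID to one collector, deterministically."""
    # Use a stable hash (not Python's built-in hash(), which is salted
    # per process) so every sender picks the same collector for a trace.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


# Hypothetical collector instances and spans from two traces.
collectors = ["collector-a", "collector-b", "collector-c"]
spans = [
    {"trace_id": "trace-1", "name": "GET /cart"},
    {"trace_id": "trace-2", "name": "GET /login"},
    {"trace_id": "trace-1", "name": "redis.get"},
    {"trace_id": "trace-2", "name": "db.query"},
    {"trace_id": "trace-1", "name": "render"},
]

# Record which collector(s) each trace's spans were sent to.
assignments = {}
for span in spans:
    target = pick_collector(span["trace_id"], collectors)
    assignments.setdefault(span["trace_id"], set()).add(target)
```

Every trace ends up with exactly one collector in `assignments`, which is the property a per-trace sampling decision needs. Plain modulo hashing reshuffles most traces when the list of collectors changes; a consistent-hashing scheme would reduce that, at the cost of more code.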

Robert Blumen 00:26:56 You used the word gateway, which is a common word, but I understand it means something specific in OpenTelemetry, where you have a gateway model and an agent model. Explain these two models and the difference between them.

Alex Boten 00:27:11 Yeah, so in the agent deployment for the OpenTelemetry Collector, you would be running your OpenTelemetry Collector on the same host or the same node, maybe as part of a DaemonSet in Kubernetes. So, you would have a separate instance of the Collector for each one of the nodes that are running within your environment. And you would have your application sending data to the local agent before it would then send it up to wherever your destination is. In the gateway deployment model, you would have the Collector act as a standalone application, and it would have its own deployment. Maybe you would have one per data center or maybe one per region. And that would act as maybe the egress out of your network. And that’s kind of the gateway deployment.

Robert Blumen 00:28:02 What you described as an agent model sounds very similar to what I’ve seen called a sidecar with some other services. Is an agent the same as a sidecar?

Alex Boten 00:28:14 Yes and no. It can be like a sidecar. When I think of a sidecar, I would assume that it would be attached to every application that’s running with a sidecar alongside it, which would mean that you might end up with multiple instances of the Collector running on the same node, for example, which may be necessary in specific cases, or it may not be. It really depends on your use case, whether or not there’s accessibility from your application to the host at all. That depends on what your policies are, how your policies are defined. So, it could be the same as the sidecar, but it doesn’t necessarily have to be.

Robert Blumen 00:28:52 Delving more into the internals of the Collector and what you can do, you mentioned processors and exporters, and you’ve covered some of this before, but why don’t you start with what are some of the major types of processors that you might want to use?

Alex Boten 00:29:11 Yeah, so I think that the two processors recommended by the community are, first, the batch processor, which tries to take your data and batch it rather than sending it every time there’s telemetry coming in. This is trying to optimize some of the compression and reduce the amount of data that gets sent out. So that’s one of the recommended processors. The other one is the memory limiter processor, which limits kind of the upper bound of memory that you would allow a Collector to use. So you would probably want to use that in the case where you have a specific instance of some kind with some kind of memory defined; you would want to configure your memory limiter processor to be below that threshold so that when the Collector hits that memory limit, it can start returning error messages to all of its receivers, so that maybe the senders of the data can go ahead and back off on the amount of data that’s being sent or something like that.
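The batching idea described above can be sketched as a small buffer that forwards items in groups instead of one at a time. This is a simplified illustration, not the Collector’s batch processor: `BatchingBuffer`, its `max_batch_size` parameter, and the `sent_batches` list are hypothetical, and the sketch flushes only on size, whereas the real processor also flushes on a timeout.

```python
class BatchingBuffer:
    """Collects items and hands them to `export` in batches (toy sketch)."""

    def __init__(self, export, max_batch_size=3):
        self._export = export          # callable that receives a list of items
        self._max = max_batch_size
        self._pending = []

    def add(self, item):
        self._pending.append(item)
        # Flush as soon as a full batch has accumulated.
        if len(self._pending) >= self._max:
            self.flush()

    def flush(self):
        # Send whatever is pending (if anything) as a single batch.
        if self._pending:
            self._export(list(self._pending))
            self._pending.clear()


# Stand-in for a network exporter: just remember each batch sent.
sent_batches = []
buffer = BatchingBuffer(sent_batches.append, max_batch_size=3)

for i in range(7):
    buffer.add(f"span-{i}")
buffer.flush()  # push out the final partial batch
```

Seven spans result in three export calls rather than seven, which is the reduction in outgoing requests that batching is after.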

Alex Boten 00:30:02 One of the other processors that’s really interesting to many folks is the transform processor, which lets you use the OpenTelemetry Transformation Language to modify data. So, maybe you want to strip some particular attributes, or maybe you want to change some values inside your telemetry data, and you can do that with the transform processor, which is still currently under development. But I think in the early days there was a lot of excitement around what could be done with processors. And so, people started developing filtering processors and attribute processors for metrics and all these other kinds of processors, which made it a little bit complicated to know which processors folks should be using because there are so many of them. And sometimes, one may support one signal but not the other, whereas the transform processor really tries to unify this into a single processor that can be used to do all of that.

Robert Blumen 00:30:55 You said there’s a lot of excitement around this feature. What was it that people found so exciting about it?

Alex Boten 00:31:01 Yeah, I think from the maintainer and contributor standpoint, we were looking forward to deprecating some of the other processors that could be combined within a single one. Again, I think it reduces the cognitive load that people have to deal with when ramping up on OpenTelemetry. I think knowing that if you want to modify your telemetry, all you have to do is use this one processor and learn the language that you would need to transform the data, versus going through and searching the repository for five or six different processors. I think it’s generally nice to consolidate that a little bit.

Robert Blumen 00:31:39 Tell me more about the language that’s used to do those transforms.

Alex Boten 00:31:43 Yeah, so the OpenTelemetry Transformation Language, for folks that are interested in finding the full definition, is all available inside the OpenTelemetry Collector contrib repository, but it really allows folks to define, in a language that’s signal agnostic, what they would like to do with their data. So it allows you to get particular attributes, set particular attributes, and modify data inside your Collector.
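To make “signal agnostic” concrete, here is a small Python sketch of the get/set/delete idea. It is not the transformation language itself and does not use its syntax: the tuple-based statements, the `apply_statements` function, and the attribute names are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, ...]  # hypothetical: ("set", key, value) or ("delete", key)


def apply_statements(
    attributes: Dict[str, str], statements: List[Statement]
) -> Dict[str, str]:
    """Return a modified copy of the attributes; the input is left untouched."""
    result = dict(attributes)
    for stmt in statements:
        op = stmt[0]
        if op == "set":
            _, key, value = stmt
            result[key] = value
        elif op == "delete":
            _, key = stmt
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# The same statements apply whether the attributes came from a span or a log.
statements = [
    ("delete", "user.email"),
    ("set", "deployment.environment", "staging"),
]
span_attrs = {"http.method": "GET", "user.email": "a@example.com"}
log_attrs = {"user.email": "b@example.com"}

clean_span = apply_statements(span_attrs, statements)
clean_log = apply_statements(log_attrs, statements)
```

One set of statements covering every signal is the consolidation the conversation describes, as opposed to a separate processor per signal.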

Robert Blumen 00:32:09 The other internal component of Collectors I want to spend some time on is exporters. What do those do?

Alex Boten 00:32:17 Yeah, so the exporters take the data that’s been ingested by the OpenTelemetry Collector. So, the OpenTelemetry Collector uses receivers to receive the data in a format that’s specific to whichever receiver is configured. It then transforms the data to an internal data format inside the Collector and then it exports it using whichever exporter is configured. So, the exporter’s job is to take the data in the internal data format and format it to the specification of the destination of the exporter.
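The receive, convert, export flow can be pictured with a short Python sketch. The two wire formats here (“format A” and “format B”), their field names, and the `InternalSpan` type are invented for illustration and do not correspond to any real protocol or to the Collector’s actual internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InternalSpan:
    """Hypothetical internal representation shared by all components."""
    name: str
    duration_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


def receive_format_a(payload: dict) -> InternalSpan:
    """Receiver side: translate an incoming 'format A' payload to internal form."""
    return InternalSpan(
        name=payload["operation"],
        duration_ms=payload["duration_us"] / 1000.0,  # microseconds to ms
        attributes=dict(payload.get("tags", {})),
    )


def export_format_b(span: InternalSpan) -> dict:
    """Exporter side: translate internal form to the destination's 'format B'."""
    return {
        "span_name": span.name,
        "elapsed_ms": span.duration_ms,
        "labels": dict(span.attributes),
    }


incoming = {"operation": "GET /", "duration_us": 2500, "tags": {"env": "dev"}}
outgoing = export_format_b(receive_format_a(incoming))
```

Because every receiver converts to the same internal form, any receiver can be paired with any exporter without either knowing about the other’s format.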

Robert Blumen 00:32:50 Okay. So, what are some examples of different exporters that are available?

Alex Boten 00:32:54 Yeah, so there are a bunch of vendor-specific exporters that live in the repository today. Many of the open-source projects also have their own exporters. So, Jaeger has its own, Prometheus has its own exporter. There are a few different logging options as well.

Robert Blumen 00:33:12 So data comes in, it goes through some number of processors and then goes out through an exporter. Is there a concept of a pipeline that maps the path that data takes through the Collector?

Alex Boten 00:33:26 Yeah, so the best place to find this is really inside the Collector configuration. So, the Collector is configured using YAML and at the very essence of it, you would configure your exporters, your receivers, and your processors, and then you would define the path through those components in the pipeline section of the configuration, which allows you to specify what pipelines you want to configure for tracing, and for logs, and for metrics to go through the Collector. So, you would configure your receivers there, and then your processors, and then your exporters within each one of those definitions. And you can configure multiple pipelines for each signal, giving them individual names.

Robert Blumen 00:34:07 And how does incoming data select or get mapped onto a particular pipeline?

Alex Boten 00:34:14 Yeah, so the way that the data would be mapped onto each pipeline is through the specific receiver that’s used to receive the data. So for example, if you’ve configured a Jaeger receiver on one pipeline and a Zipkin receiver on a different pipeline and you’re sending data through Zipkin, then the pipeline that has the Zipkin endpoint would be the destination of that data, and then that’s the pipeline that the data would go through.
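The configuration layout and the receiver-based routing can be sketched as follows. The dictionary mirrors the general shape of a Collector YAML file (components declared at the top level, then wired together under `service.pipelines`), but the specific pipeline names and the `pipelines_for_receiver` helper are hypothetical and exist only for this illustration.

```python
from typing import List

# Python stand-in for a Collector YAML configuration (illustrative values).
config = {
    "receivers": {"jaeger": {}, "zipkin": {}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"otlp": {}},
    "service": {
        "pipelines": {
            # Two named pipelines for the same signal (traces).
            "traces/jaeger": {
                "receivers": ["jaeger"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["otlp"],
            },
            "traces/zipkin": {
                "receivers": ["zipkin"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            },
        }
    },
}


def pipelines_for_receiver(cfg: dict, receiver: str) -> List[str]:
    """Return the names of the pipelines that data from this receiver enters."""
    return [
        name
        for name, pipeline in cfg["service"]["pipelines"].items()
        if receiver in pipeline["receivers"]
    ]


zipkin_route = pipelines_for_receiver(config, "zipkin")
```

Data arriving on the Zipkin endpoint only ever enters the pipeline that lists the Zipkin receiver, so the choice of endpoint a source sends to is what selects the pipeline.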

Robert Blumen 00:34:40 So, does each endpoint listen on a different port or does it have a path or what’s the mapping?

Alex Boten 00:34:47 Yeah, so that depends on the specific receiver. So, some receivers have the ability to configure different paths; some only configure different ports. It also depends on the protocol that you’re using for the receiver and whether it supports it or not. And as I mentioned, there are also these things known as scrapers, which are receivers that can go out and scrape different endpoints for metrics, for example. And those can also be configured as receivers, which would then take their own path through the Collector.

Robert Blumen 00:35:17 I think we’ve been mostly talking under the assumption of a push model, but this scraper sounds like it also supports pull. Did I understand that correctly?

Alex Boten 00:35:28 Yeah, that’s correct. And, if you think of the Prometheus receiver, for example, the Prometheus receiver uses the pull model as well. So, you would define the targets that you would like to scrape, and then the data will be pulled into the Collector versus pushed to the Collector.

Robert Blumen 00:35:43 So to wrap this all up, then I would instrument or configure my sources to point them toward the OTel Collector or Collectors on my network. They would have a domain name or an IP address and a port and maybe a path that comes after that. They’re instrumented, they push data out, it goes to the Collector, the Collector will process it and then export it to the backend of choice. Is that a good description of the whole process?

Alex Boten 00:36:17 Yeah, that’s exactly right.

Robert Blumen 00:36:18 How do the sources authenticate themselves to the Collector?

Alex Boten 00:36:23 Yeah, so for authenticating to the OpenTelemetry Collector, there are a number of extensions that are available for authentication. So, there’s the OIDC authentication extension, there’s the bearer token authentication extension. You can also use the basic auth extension if you’d like. So, there are a few different available extensions for that.

Robert Blumen 00:36:43 Yeah, okay. Well, let’s talk about extensions. So, what are the extension points that are offered?

Alex Boten 00:36:49 Yeah, so extensions are essentially components in the Collector that don’t necessarily have anything to do with the pipeline of the telemetry going through the Collector. And so, some of the extensions that are available are the pprof extension, which allows you to get profiling data out of the Collector. There’s the health check extension, which allows you to run health checks against the Collector, and there are a few other ones that are all available in the Collector repositories.

Robert Blumen 00:37:20 Okay. So, we’ve pretty much covered most of what I had planned about what it does, how it works. Suppose you have a project that has not been built with this in mind and is interested in migrating. What’s a possible migration path to OTel from a project that might have been built several years ago before this was available?

Alex Boten 00:37:45 I would say the first path that I would recommend to folks is really to think about: is there a way that I can drop in a Collector and receive data in the format that’s already maybe being emitted by an application? That’s really the very first step that I would suggest taking. I know that there are a few different mechanisms for collecting telemetry that predate the Collector. So, Telegraf is an example of one of those. If you have Telegraf running in your environment and you’re interested in seeing if you can connect it to the Collector, maybe that’s a good place to start, to look at connecting the two. And I know Telegraf, for example, emits OTLP, so that’s already something that’s somewhat supported. So that’s really the first step I would take: can I just get away with dropping in a Collector and emitting a format that’s maybe already supported?

Alex Boten 00:38:30 One thing to note is if you have a format out there that’s not currently supported in the Collector, you can always go to the community and ask, ‘hey, is this a component that folks are interested in adopting?’ And that’s always a good avenue to take. If you’ve got commitment from your organization to maybe change the instrumentation libraries that you’re using inside your code, then great. I would start looking at resources. I know there are a few different use cases that have been documented, I think on OpenTelemetry.io, around migrating away from either OpenTracing or OpenCensus. So, I would definitely start looking for those resources.

Robert Blumen 00:39:07 So we’ve talked about the history and what it does. What’s on the roadmap?

Alex Boten 00:39:12 Yeah, so there is a roadmap for OpenTelemetry, which we actually very recently published. Up until earlier this year there wasn’t an official roadmap published by the community. But we’re finally starting to change the process a little bit to try to really focus the efforts of the community. So, currently on the roadmap we have five initiatives that are happening. Some of the work is being done around client-side instrumentation, so either web browser-based or mobile clients, and around profiling. So, this is profiling data being emitted using an existing format, but there’s some discussion around whether or not there’s going to be an additional signal called profiles in OpenTelemetry. There’s also a lot of effort being put into trying to stabilize semantic conventions. So, if you’ve seen the semantic conventions inside the OpenTelemetry specification, you’ll probably know that a lot of them are marked as experimental.

Alex Boten 00:40:10 And that’s just because we haven’t had the chance to really focus the community on trying to come to agreement on what stable semantic conventions should look like. So, there’s a lot of effort to bring in experts in each one of the domains to ensure that they make sense. The other effort that I’m excited about, because I’m part of the work, is to put together a configuration layer for OpenTelemetry as a whole so that users can configure using some kind of configuration file, take that configuration file across any implementation, and know that the same results will occur. So, for example, if you’re configuring your Jaeger exporter in Python, using this configuration format you’d be able to take that same configuration to your .NET implementation or Java and not have to write code manually to translate that configuration. And then, there’s some effort around function-as-a-service support from OpenTelemetry. So, the group is currently focused on Lambdas because that’s the first serverless or function-as-a-service model that’s come to us. But there’s also effort to bring in folks from Azure and GCP as well, to kind of round that out.

Robert Blumen 00:41:19 We’re at time, we’ve covered everything. Where can listeners find your book?

Alex Boten 00:41:25 Yeah, so you can find the book on Amazon. You can also buy directly from Packt Publishing. And yeah, it’s also available at your local bookstores.

Robert Blumen 00:41:35 If users would like to find your presence anywhere on the internet, where should they look?

Alex Boten 00:41:40 Yeah, so they can find me on LinkedIn, a little bit on Mastodon or on Twitter, though not as much anymore. And they can find me on the Slack channels for the CNCF Slack instance. I’m pretty active there.

Robert Blumen 00:41:55 Alex Boten, thank you very much for speaking to Software Engineering Radio.

Alex Boten 00:41:59 Yeah, thank you very much. It’s been great.

Robert Blumen 00:42:01 This has been Robert Blumen for Software Engineering Radio. Thank you for listening. [End of Audio]
