[MUSIC PLAYING] Hello. Welcome to this presentation on syslog-ng performance tuning. My name is Craig Finn. And I'm a pre-sales systems engineer with One Identity.
So the agenda today is as follows. We're going to talk about some configuration parameters, buffer sizes, window sizes, things you need to take care of when you're configuring syslog-ng for optimum performance. We're going to describe and talk about flow control, a very important feature of syslog-ng that you should always activate whenever possible.
We're going to talk about how you can set up your message processing pipelines for the most efficient processing possible, most performant. We're also going to introduce some tricks, not necessarily tricks, but things you should be aware of when you're crafting your log statements or your log processing pipelines, some options about their ordering and some of the other features you can implement to, again, optimize performance.
We'll also talk about what you can do in terms of name resolution to make sure if you need to do that, you can do it at the highest possible performance. We'll describe the differences between TCP and UDP primarily from a performance standpoint. There are some performance differences.
Normally the discussion regarding TCP and UDP revolves around reliability because UDP is naturally by definition a less reliable protocol than TCP. But there are some performance-related issues you should be aware of as well. And then we'll look at the relative impact of the various types of message processing that syslog-ng offers, things like parsing, rewrite rules, et cetera.
So one thing about syslog-ng that I'd like to emphasize right off the bat is that it's designed to be fast and scalable on a single node. A single instance of syslog-ng has extremely high performance, which simplifies your deployment. You don't have to have many, many different low-power nodes configured into a cluster or a pod that you have to continually scale as you need more performance.
Syslog-ng, again, in a single instance gives you very, very high performance. And it does that by having a core that consists of very highly optimized C code. Philosophically we don't have anything against non-compiled code, interpreted code. In fact, we have a very useful and very well-constructed Python binding built into syslog-ng. So you can use Python to make custom extensions if you need to.
But the core, again, is highly optimized C to give the maximum amount of performance for the syslog-ng engine. And it does run in multi-threaded mode, which allows you to scale to however many CPUs and cores you have on the platform on which it's installed. And it has an extremely efficient asynchronous I/O architecture, which is necessary because we're obviously handling a large amount of data coming in through your network interfaces, going out through your network interfaces, and also being written to local disk.
So when I talk about performance of a single instance, what am I talking about? Well, here's an idea. Based on a test done by the One Identity syslog-ng developers, a single instance with multiple connections writing to local files can accommodate roughly somewhere in the order of 635,000 events per second with a throughput of around 235 megabytes per second. So again, that's very high performance in just one instance of syslog-ng.
Now, there's a few-- well, a relatively small number, I'll say, of very important configuration options that you need to be aware of and you need to set correctly to get the best performance from syslog-ng. We're going to talk about them here. I'll give you obviously more detail later. But the ones that you need to consider very carefully on the input side, when you're defining your source options, your source blocks, in syslog-ng, are log-iw-size, log-fetch-limit, and the max-connections parameter.
And then on the output or destination side of your configuration, you're going to need to understand and size correctly what we call the log-fifo-size and also flush-lines. Now I'm going to go into each of these in a fair amount of detail. And I apologize for the text-heavy slide here. And I apologize in advance for reading some of this information to you. But it's kind of important to define exactly what these are.
Later on, I'm going to show you graphically some examples of what these things do and how they're set. But let me go through the definitions first. So log-iw-size here sets the size of what we call a control window for a syslog-ng source. And basically what this does is serve as a proxy for how much space is left available in your output buffer. The way it works is that every message syslog-ng reads in from the source reduces the available space in this window by one.
Conversely, every message that syslog-ng can get out from its output buffer to its destination increases the output buffer-- the space in the log-iw-size window by one. So it expands and contracts based on how well syslog-ng is moving messages from the input to its ultimate destination. And it's used in something we call flow control. So basically, if this window, governed by log-iw-size gets full and you also have the flow control flag set in a log statement that references that source that's using that log-iw-size parameter, what syslog-ng will do is stop reading messages temporarily from that source.
And it will wait until some of the messages get pushed out of the output buffer, get to their destination. And that means there's more space available and syslog-ng can open up the source again and read more messages coming in through its input sources. So it basically serves and provides for an on/off controller to modulate the rate at which messages come into syslog-ng.
And as we'll see later on, the whole purpose is to prevent that output buffer from getting full. Because if the syslog-ng output buffer ever fills completely, any additional incoming messages will have to get dropped. So it's a mechanism to ensure that syslog-ng will not drop any messages. Log-fetch-limit is a pretty simple thing to understand. This is basically how many messages are read by syslog-ng from every connection to a source during each of its poll loops.
So syslog-ng will continually poll all of the sources, all the connections within the source for messages. And you could set and tell syslog-ng how many messages it should pull from each of those sources each time it does a poll. So you can increase or decrease that. But the more you can pull, potentially, the better your throughput will be through syslog-ng.
And another important parameter that you'll see in a source block definition is called max-connections. And this represents the maximum number of TCP connections. Now these are actual TCP connections, not UDP. But it's the maximum number of TCP connections that the source can accept. So it's very important to get that right so you don't inadvertently shortchange yourself. And I'll talk about that again a little bit later.
So on the output side, there are some very important buffer sizes and parameters you need to consider as well. And the most important one here is called log-fifo-size. This is basically the size of your output buffer. So every destination that you define in syslog-ng will have an output buffer. It's an in-memory structure, a first in, first out queue, for each and every destination. So you can set the size of those separately based on the requirements of each destination. But you can, if you want, set just a single size for all of your destinations and put that in your global options section.
So it's your choice. In general, you're probably better off setting that individually, but you don't need to. We'll talk again about how you should size that and what would be an appropriate way to do that a little bit later on. And then the other output-side parameter that's important is called flush-lines. And from its name, you can probably gather what this does. It specifies how many lines, or how many messages, syslog-ng will accumulate before it sends that block out to the destination, whether that's a local file or a network destination.
So when you increase that number, you get higher throughput from syslog-ng, and in general, better performance. But you also have to consider the higher that number, the longer syslog-ng will hang on to a message before it reaches that batch and sends them on down to the destination. And that will increase the latency. So if you're sending messages to some downstream application and maybe it's important for that application to get the message as quickly as possible, you might need to modify your flush-lines so that you don't have too much latency in the delivery of those messages to that destination.
So there's a lot to consider there. And you might be wondering, OK, but how do I set these appropriately? Now you don't necessarily have to worry about it too much, because syslog-ng has intelligent defaults for all of these parameters, for the most part. I'll discuss that in a minute. There's one that you really have to worry about setting. Otherwise you can just use what comes out of the box and be pretty much assured that your system is going to operate correctly.
But it may not be operating at the highest level of performance. So these are the default values that you'll see. And again, they're pretty good. You could live with these. But the one that I need to point out is the max-connections parameter. That default value is 10. And if you don't set it, that tells syslog-ng that for this source, where the administrator has not set the max-connections parameter, it's only going to accept 10 TCP connections through this source.
Now in general, that's not going to be enough. If you create a source to accept messages from your TCP/IP endpoints, you're probably going to have more than 10 sending messages your way. And if you don't change that, if you don't increase that syslog-ng will accept messages from the first 10 that connect and then any others, it'll reject. And you'll wonder why you're not getting messages from the overwhelming majority of your TCP sources.
So keep that in mind. Make sure you set that in an appropriate manner. Now I mentioned the defaults. I mentioned what these things do. But you're probably still wondering, OK, well how do I intelligently set these based on my requirements? So what I'm going to do is give you my recommendations for a quick and easy, but effective way to set these parameters. And the first one is what we just talked about, the max-connections parameter on your source.
Well, make sure you make that greater than or equal to the number of connected TCP peers that are going to hit that source. Once you've done that, you've got to consider your log-fetch-limit. I would set that at 100. That's a pretty good value. You could think about increasing it later on, but if you set it at 100, you're going to be fine for the overwhelming majority of uses you have.
Now the next one is your log-iw-size. Again, this is the size of that initial control window that's going to be important when we talk about flow control. And for log-iw-size, it's a pretty simple calculation. You take your max connections and you multiply that by your log-fetch-limit and you're good. Now you're set on that. And then the other one you need to really consider carefully is your log-fifo-size.
This is the size, again, of your output buffer, the in-memory output buffer structure. And that's essentially got to be at least as large as your log-iw-size parameter, the size of your control window. But it really ought to be quite a bit more, because you want to be able to fit several polls of the sources by syslog-ng into your output buffer. So I would recommend that you make that at least 10 times the size, preferably even higher than that, another factor of 2.
So I would say take that log-iw-size that you've calculated above and multiply it by 20 for your log-fifo-size. And then you're going to be pretty good. So this is an easy button for setting these things in a way that's not necessarily optimal, but pretty close to it, without having to scratch your head too much about how to get there. I know if you read the documentation, there are different descriptions in different places in the admin guide and it can sometimes get confusing.
If you follow these guidelines here, you're going to be in good shape. And here's a simple example. A configuration file illustrating these parameters. So I have a source definition. It's a very simple one, accepting messages, using the original BSD or legacy protocol. So what I'm saying is I want my max-connections to be 1,000. Because I've got several hundred TCP sources that will probably be sending messages here. So 1,000 should suit.
I'm going to keep my log-fetch-limit at 100, per my previous guidance. And then my log-iw-size: I probably will be using flow control, so I'll need to set that control window. And again, I'm just going to multiply my log-fetch-limit times max-connections. And that gives me 100,000 for my log-iw-size. So we're set there. Now on our destination definition, in this case, I have a destination that's using the IETF RFC 5424 protocol, but that's actually not too relevant here. It doesn't matter. But it's a TCP outgoing connection.
And what I need to think about here is what my output buffer size is going to be. And again, following my previous guidance, I'm going to take my log-iw-size and multiply it by 20. And that gives me two million message slots for my output buffer. And then I'm done. And now I'm pretty much guaranteed that I'm going to have a near-optimal configuration of my source and my destination.
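As a minimal sketch of that example, with hypothetical source, destination and host names (the slide's actual names aren't shown here), the configuration could look roughly like this:

    source s_bsd {
        network(
            transport("tcp")
            port(514)
            max-connections(1000)    # >= number of connected TCP peers
            log-fetch-limit(100)     # messages read per connection per poll
            log-iw-size(100000)      # max-connections x log-fetch-limit
        );
    };

    destination d_syslog {
        syslog("collector.example.com"   # hypothetical RFC 5424 destination
            transport("tcp")
            log-fifo-size(2000000)       # roughly 20 x log-iw-size
            flush-lines(100)             # larger batches raise throughput but add latency
        );
    };

    log { source(s_bsd); destination(d_syslog); };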
OK. So we talked about flow control before and how instrumental sizing the control window is to making sure flow control works. So what I wanted to do here is give you a graphical view of what this is all about and bring a lot of these concepts together. I have a very simple configuration on top where I have two sources. One is accepting messages over the UDP network protocol, the other one over TCP.
And I have a very simple log statement, a log block, where I'm accepting messages through these two sources. They're all going to a common network-connected destination. And I have the flow-control flag set in my log statement. Flow control is something you set in your log statement, not in your source or destination definitions. Now I'm leaving out a lot of the parameters I just spoke about because they're not necessarily germane to this illustration.
So I'm just assuming I'm using the default values for those, just for clarity and to keep it a little briefer. But down below, you see a diagram of what we're talking about. We have the two sources over here. They're both feeding the same destination. Now there's an output buffer for the destination, again, sized via the log-fifo-size parameter. And we also have a control window for the source, and again, that's sized based on our log-iw-size parameter.
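A minimal sketch of that configuration, using default values for the parameters we just discussed and a hypothetical destination address, would look something like this:

    source s_udp { network(transport("udp") port(514)); };
    source s_tcp { network(transport("tcp") port(514)); };

    destination d_remote { network("192.0.2.10" transport("tcp")); };  # hypothetical address

    log {
        source(s_udp);
        source(s_tcp);
        destination(d_remote);
        flags(flow-control);   # pause reading from the sources when the control window fills
    };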
And again, the way these are sized, the control window will always be smaller than the output buffer. But what will happen is, if we go through this, let's say that we have some kind of a situation happening at our external TCP destination. So for some reason, syslog-ng is now having difficulty delivering messages to that external TCP destination. So what's going to happen here is messages are still coming in, but they can't get out of the output buffer quickly enough, or maybe not at all.
And the output buffer is going to slowly start to fill up. But before that fills up, you'll notice that the control window has already reached its maximum value. So what that's going to do, it's going to trigger flow control. So the control window says, look, I'm out of space, which means my output buffer is in danger of becoming out of space.
I don't want that to happen because if that happens, messages are going to get dropped. So I'm going to make syslog-ng turn off all the inputs from my sources temporarily. Now once communications are restored to the destination and the output buffer can drain, the control window will also shrink. That will open up space. And syslog-ng can open up that connection from the sources again. It can modulate the flow like an on/off valve feeding a tank. But for the moment, it's shutting that off so that our output buffer will not overflow and be forced to drop messages.
That's the whole idea behind flow control. With flow control, syslog-ng will not and cannot drop any messages that are coming into it. But there is a hidden issue, a hidden problem here, that I'm going to go into in the next slide. So let's step up a level and look at this from the overall network picture, where everything to the left of the dotted line you see is handled by the kernel of our operating system, the TCP stack, and everything to the right is handled by syslog-ng.
So let's go again and see what happens when flow control is triggered. So again, that's going to have syslog-ng stop accepting messages from the sources. Now let's look at what happens on the top level here for TCP connection sources. The sources that are sending messages to the s_tcp source, well, they're going to be OK because they're just going to pause. They're going to be in contact with their peer at the Linux OS, their TCP/IP peer connection.
And the connection is going to say, hey, wait a minute, my TCP window size is zero. The application cannot accept any more data. So hey, please hold off sending any additional TCP data segments until I inform you that the window is opened up. So those sources will wait. When the window does open, when flow control is turned off, they'll be able to send again. So they will honor this modulation that syslog-ng is initiating via flow control. So they're all good. Nothing's going to drop from the TCP senders.
But it's going to be a different story for UDP. Because what's going to happen is the UDP senders have no way of knowing that the s_udp source is no longer accepting messages and sending them on to be processed by syslog-ng. UDP is a connectionless protocol. There's no state, there's no feedback from the destination to say, hey, wait a minute, I don't have any room left. The only thing they can do is keep on sending.
And what happens is the buffer maintained by the Linux kernel is going to fill up. And once that fills up, because nothing's draining it, the kernel is just going to drop any additional incoming messages. So you're going to lose messages. Syslog-ng is not dropping them, but they're dropping nonetheless because the kernel has no place to put incoming UDP datagrams. So the lesson here, and the very important takeaway, is that if you're going to use flow control, and please do, it's a really good feature, it's only going to be effective for TCP-based protocols.
It's not going to be able to work, and cannot work, for stateless, connectionless protocols like UDP. That's what UDP is. So the bottom line here is don't mix TCP and UDP inputs in a source definition which is used in a log statement that has flags(flow-control). If you have flags(flow-control) in your log statement, you should only have sources that are accepting TCP messages in that log block.
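As a sketch, assuming the s_tcp and s_udp sources and the d_remote destination from the earlier example, the split would look like this:

    log {
        source(s_tcp);
        destination(d_remote);
        flags(flow-control);   # TCP senders honor the closed window, nothing is lost
    };

    log {
        source(s_udp);
        destination(d_remote);
        # no flow-control flag here: UDP senders cannot be throttled anyway
    };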
So that's a description of the important configuration parameters, buffer sizes and how they interact with things like flow control to get started. But there's obviously another big dimension to performance of syslog-ng. And that's related to the types of log message processing you're going to be doing within your log processing pipelines within your log statement. So that'll be what kind of filters are you using? How complex are they? How much resource will they take to be executed?
Same thing with parsers. Same thing with rewrite rules. You're going to be doing all these things. And all of these things that you're doing with the messages will have some impact on the performance, the overall throughput, of syslog-ng. There are efficient ways to do it and then there are other ways that may not be quite as efficient. So we're going to discuss that in the next few slides.
And what I'm going to do is, since we don't have time to discuss everything, talk about filters in some detail, because filters can be kind of a stand-in. A lot of the same concepts that relate to filtering can be adapted to all sorts of things like parsers and rewrite rules. So we'll talk about filters in a little bit of detail here. It won't be a full tutorial on filters, but I'll hit the high points and the things that I think are important to consider when we talk about the impact of filtering on overall performance.
So filters are very basic. They're very easy to understand in syslog-ng and very easy to configure. There are basically two ways to do it. One is by doing plain macro comparisons. Syslog-ng automatically parses a huge amount of information out of syslog messages. It's automatically parsing things like the value of the process ID, the host name, the priority level, the facility, the date, you name it. There's an extremely large number of these macros.
And you can always access the value of any macro by wrapping the name of the macro in a dollar sign and curly brackets, like ${HOST}. That way you reference the value of that macro. Once you've referenced the value of that macro, you can use Boolean comparisons to decide: does it equal this? Does it not equal that? Is it greater than or less than, if you're looking at something like the level number?
So it's a very simple way to evaluate the value of macros and see if they match your criterion in some Boolean-logic way. And if it does, that filter evaluates as true and that message will pass on through the pipeline. So it's very simple to do. But there's also another way, which maybe is conceptually even easier to grasp. And that's through the use of syslog-ng's built-in filter functions. The syntax of filter functions is very much like any function in any programming language.
So instead of having you go out, identify macros, reference a value of that macro, these have built-in logic to do that. And they have names that are either identical to or reminiscent of the names of those macros. So for instance, there is a filter function called Level. The parameter would be something like warning, or emergency, or error, or information. And it has some other nice features as well because this particular one, you can specify as your parameter, a range of levels.
And we have similar filter functions that automatically reference a program name or a host name or a facility name. So this is another way to perform filtering, which, again, in some cases is conceptually easier and also provides some additional options in the way you set these up and decide what your parameters should be to the functions. There's another advantage to using the inbuilt or the built-in filter functions, which I'll talk about in a little bit of a later slide.
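As a quick sketch of the two styles, assuming we're interested in messages from a hypothetical program called sshd and in messages of warning severity or worse:

    filter f_sshd_macro { "${PROGRAM}" == "sshd"; };        # macro comparison
    filter f_sshd_func  { program("sshd" type(string)); };  # equivalent built-in filter function
    filter f_warnings   { level(warning..emerg); };         # level() accepts a range of levels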
So there are some additional capabilities of the filter functions too that I haven't really touched on yet. And that is that some of the filter functions, in particular, the ones related to the program name, the host name and also one called Match can use regular expressions. So you can see in this particular example, I'm looking for messages where the host name, the host macro in that message can match DVRCR, dash, followed by anywhere between 2 and 5 numeric digits. So you can get much more general matches and sometimes very complex matches by using regular expressions in these filters.
And you can also, with the same filter functions, use shell-style glob matching. So in the final example there, you can use a host filter function and you can search for messages in which the host name is the literal string, my host, followed by anything. And to do that, you describe the host that way with the asterisk after the literal match and then you specify type glob. And that will do a glob search, as opposed to an actual regular expression search.
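Here's roughly what those two examples look like as a sketch:

    filter f_dvr_hosts { host("DVRCR-[0-9]{2,5}"); };   # regular expression match on the HOST macro
    filter f_my_hosts  { host("myhost*" type(glob)); }; # shell-style glob instead of a regex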
Another really important one and one that has some important performance characteristics is called in-list. So in many cases, you might end up with a filter statement like the second one here. You might need to say, hey, I need to filter and I want to match a very large number of separate host names. And one way you might do that, and you might think it's the best way to do, is to say, OK, I'll use a regular expression match. And I'll use the alternation operator, the vertical line.
And I'll just say, OK, my filter is going to be true if the host name matches host one, host three, host 12, host 50, whatever those are. And you might have a very, very long list of these alternations that you're trying to get your regular expression to match. Two things are going to present a problem there. One is the performance of that regular expression might be horrific. It might take a lot of resource to satisfy it.
And it's kind of hard to maintain because you've got this big, very long list in your configuration file. So it's going to be a maintenance headache to some degree. So what you could do is use a filter function called in-list. And what this does is say, OK, I'm going to allow you to create a file. Put that file anywhere you want and then put all the values you want to match in that file.
So I have an example here where, instead of the second regular expression, we use in-list. We point to the file that's going to have the values we want to match, and then we tell it we're trying to match the HOST macro, or that portion of the message body. And then what you do is maintain a separate, ordinary text file yourself, and you don't have to maintain that list inside syslog-ng.conf itself.
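A sketch of that, with an assumed file path, would be:

    filter f_important_hosts {
        in-list("/etc/syslog-ng/conf.d/hosts.list", value("HOST"));
    };

    # hosts.list is a plain text file you maintain separately, one entry per line, e.g.:
    #   host1
    #   host3
    #   host12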
And it's also going to be faster because you won't have to have syslog-ng evaluate a very complex, long regular expression. So you can very definitely make syslog-ng run much more efficiently if you have situations that can be addressed by the in-list filter function. So let's recap some of the things that we touched upon here on filtering. The first bullet here is obvious. You want to use the simplest filter when filtering incoming messages.
Sometimes it's not obvious what that is. But if you think about it, you can usually come down to the one that would be most efficient. One thing is you probably want to make sure that you're not resorting to a regular expression comparison when you don't have to. If you can avoid a regular expression match with an ordinary filter function or an ordinary filter match, you're better off doing it that way.
I mean, sometimes you can't. The second bullet is also important. We talked about different types of filter functions. Macro matches, macro evaluation against a match and also filter functions. Well, it turns out that the filter functions in syslog-ng are also slightly faster than the equivalent macro comparison definitions. So whenever you can, it behooves you to use filter functions instead of plain old macro comparisons.
Again, with regular expressions, the third bullet here, it's going to hurt you to some degree, not necessarily a lot. It's going to depend, obviously, on the complexity of the regular expression. But if you can avoid it, you should. Because again, going down to the next bullet, the regular filter functions, if you're able to exploit them, are going to have a much lower performance hit or performance degradation.
Another question that comes up a lot-- I deal with this quite a bit-- is does it matter in which order the filter definitions appear in syslog-ng.conf? And the answer is no. When you're defining filter functions or filter statements, it doesn't matter what order you put them in your file. You can put them anywhere you want, in any order. You can put them all at the end of the file if you want. But there are some cases within a filter definition itself where the order of things you put in that definition can make a big difference in the performance of a given filter.
So I'll give you an example here. We have a filter where we're using, again, a regular expression that looks like the one I showed you in a couple of previous slides ago, where I'm doing a match. And I'm looking at a match of a lot of different values separated by "or's." So does it have this expression or that or that or that or that? And that could be a very long compare if you're looking for what could be many, many different matches.
So what you can do in this case, if you can't possibly avoid this kind of filter, is the following: if you have some heuristics about which values are going to be matched in most cases, and you may not, but if you do, put them toward the front of your definition. That's going to make a huge difference in the evaluation of that filter. So I have an example here where someone has a situation where they're filtering against a whole number of these different ASA expressions from Cisco ASA devices.
It's a pretty long list of different options they want to match against. Perfectly doable, makes sense. You can use program() and a regular expression with this syntax to make this filter.
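A hypothetical sketch of that kind of filter, with placeholder ASA message IDs rather than the ones from the slide, would be:

    filter f_asa {
        # put the IDs you expect to match most often at the front of the alternation
        program("%ASA-6-302013|%ASA-6-302014|%ASA-4-106023|%ASA-5-111008");
    };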
But here's what's going to happen, and I'm going to illustrate this with some information I've gotten by going to regex101.com, which is a great website to help you evaluate these regular expressions. So if I look at that regex and I try to figure out how it performs based on where in that statement a particular match might land, I can see that if I try to match the last one, the very last entry in my definition, it matches.
But it took the regular expression parser 789 steps to do it. Now if I look at another one. Let me look at the first one. If I try to match the first one in my definition, that takes 21 steps. So in other words, that match happens much more quickly and uses a far lower amount of resource on my syslog-ng instance than the first situation. So again, if you have some way of knowing, and you may not.
It may not be possible. But in a lot of cases, I think you might. If you know that a certain value of the expression is going to be much more numerous in incoming messages or a group of them are much more numerous than others, if you can get the more numerous matches toward that front of the definition, you're going to be much, much better off in terms of making that match happen more quickly using fewer resources.
Let's talk about log statements as well, because here is a case where ordering does make a huge difference. This is the order in which these appear in syslog-ng.conf. And what happens in syslog-ng is that when you have a chain of log statements, syslog-ng will go through them in order from top to bottom. And what it will do, in this case, with this particular syntax, is look at the first log statement. Incidentally, these are all looking at the same syslog-ng input source, s_net.
So what it'll do is it'll look at all the messages coming in that source. It'll apply some filter. And maybe we need to have the matching messages sent to this destination. That'll happen. Then what syslog-ng will do, it'll take all the messages that came in to this source, again, and now run them through this. So every message that came in to s_net is going to be matched against filter f2. Those matching ones will go to destination, d2, et cetera, et cetera, et cetera, all the way down to the final one, or next to the final one.
But the key here is that all the way down to the bottom, every single message that came into s_net is going to be matched against whatever filters or parsers you have in subsequent log statements, which could be a lot of extra processing that you don't necessarily need to do. Because when you look at it, everything that met this filter has already gone to where it needs to go. Same here, same on down the line. This particular example has a really awful final log statement from a performance standpoint, where it says, OK, now I'm going to look at every message again and I'm going to filter it to see if it doesn't match any of my defined filters.
So hey, it's going to be a Boolean true match if it doesn't match F1 or F2 or F3, et cetera, et cetera. So now, in this case, not only am I looking at every message all over again, but I'm applying what really is an unnecessary filter redundantly here. So this is a killer. From a performance standpoint, an awful way to do things.
There is a slightly different way to do it that's orders of magnitude more efficient. And that's using two things. The first one is to use flags(final) in my log statements. This does the same logic as before, but here, when syslog-ng matches the messages coming into the first log statement, everything that matches f1 gets sent to destination d1.
But then the matching messages don't get cascaded down to the following log statements. So now a lower number of messages is going to go to my second, third, fourth and fifth statements. And each time it cascades further down, there's a lower and lower number of messages that have to be matched against the filters in the subsequent log statements, until I finally get down to the bottom. And here, the only messages reaching my last log statement are the ones that didn't match any of those filters.
So essentially, without doing any additional filtering, all my unmatched messages go to my unmatched destination. So I don't need that final catch-all filter at all. As you can plainly see, in this scenario syslog-ng is doing much, much less processing of messages. And of course, the throughput is going to be far higher than in the previous situation.
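A sketch of that arrangement, using the f1..fN and d1..dN naming from the slide, with a hypothetical d_unmatched catch-all destination, looks like this:

    log { source(s_net); filter(f1); destination(d1); flags(final); };
    log { source(s_net); filter(f2); destination(d2); flags(final); };
    log { source(s_net); filter(f3); destination(d3); flags(final); };

    # anything reaching this point matched none of the filters above,
    # so no "not f1 and not f2 ..." filter is needed for the catch-all
    log { source(s_net); destination(d_unmatched); };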
So again, use flags(final). And one last thing. In addition to using flags(final) in situations like this, you want to also consider the order in which you do this processing. So here, again, if you have some way to know which of these filters in this cascading set of log statements is going to be matched by the majority of messages coming into s_net, you should put that one first or near the top.
So let's say, as a kind of extreme example, you know that 90% of the messages are going to be the ones that match f1. What you want to do is make that your first log statement. That way, those will get matched there, and you'll take out 90% of the total load coming into s_net that has to be evaluated by the subsequent log statements. So again, it behooves you to put the log statements that will match more messages higher in your list of log statements.
Again, you may not know. You may not have a good idea, but usually there will be ways to find out that information. OK, so name resolution. One thing syslog-ng can do, as I'm sure you're all aware, is name resolution. It can resolve the host names of the clients based on IP address and then include the host names in the messages.
But that could be a problem in terms of performance. And you could see why. If it has to depend on an external DNS server, that DNS server might be slow. It might be temporarily inaccessible. So that'll definitely slow down the throughput in syslog-ng. So in the best situation, you should not use DNS. Don't use name resolution in syslog-ng if you can, but sometimes you need to do it. And there are ways to help.
The first thing is you can have a DNS cache within syslog-ng. So you can cache all of the important host names. Now there's a default size for the DNS cache. Like many other parameters, there's a default for you, but you can set the size of that yourself. So set that as large as is reasonable in your environment. The other thing you can do, if the IP addresses and host names of your clients really change only rarely, is set the expiration of that DNS cache to a very large value.
Again, make that as big as you want. So if it's kind of a static DNS environment, make that very big. That way, it won't need to be expired and rebuilt as often. So that will definitely help performance quite a bit in DNS name resolution. And then to go further, you can do all of your hostname resolution locally. So you can set an option that says, look, I'm going to do my own DNS lookups through a persist file that I'm going to maintain.
And that could be any file, but generally, you'll do that through the local Linux /etc/hosts file. And that way, you're going to get very rapid name resolution because you've essentially eliminated your dependence on an external DNS server.
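As a sketch, the relevant global options look something like this; the cache size and expiry values are assumptions you'd tune for your own environment:

    options {
        use-dns(persist_only);          # resolve names only from the local hosts file
        dns-cache-hosts("/etc/hosts");  # the persist file used for lookups
        dns-cache(yes);
        dns-cache-size(2000);           # enlarge the cache as needed
        dns-cache-expire(87600);        # long expiry (in seconds) for a mostly static environment
    };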
Let's talk a little bit about TCP and UDP. We've already touched on one issue regarding TCP and UDP in relation to flow control. But the big difference in the way these network protocols get handled within syslog-ng is that any TCP source can break incoming TCP connections out into multiple threads, so processing can go on in parallel for those incoming connections. Now unfortunately, with UDP it's a different story. All those UDP inputs have to go through just a single thread. So there's going to be one processing pipeline for every single UDP input. So clearly, UDP is, by definition, going to be at a disadvantage in terms of performance. And here's another diagram, similar to one I showed you before, where you have a standard UDP source. To the left of the dotted line is what's handled by Linux. To the right is what's handled by syslog-ng.
We have a standard syslog-ng pipeline that has any number of filtering and parsing and rewrites that are going on in that pipeline. And in this case, we're listening on a standard UDP Port 514. So those datagrams are coming in to Linux. They're going to hit a buffer. Linux is going to maintain a ring buffer to hold them until syslog-ng can pull them off the buffer and process them. So what's going to happen, basically, is the processing pipeline will have a certain number-- a certain message rate that it can comfortably handle and process and send to their destinations.
And as long as the input rate of the UDP datagrams carrying syslog data is lower than what the processing pipeline can handle, we're going to be in good shape. Syslog-ng is going to be able to pull messages off this buffer quickly enough that it never fills, and everything's going to run very happily. But if you do get a burst where the incoming rate exceeds the rate at which syslog-ng can handle that message input through its single processing pipeline, our old friend, the ring buffer in the kernel, is going to fill.
And the operating system is going to say, I'm sorry, I'm still seeing incoming datagrams, but I've got to drop them, there's no place to put them. So you've got lost events. So that's a problem. One thing you might want to do is see how bad your problem really is. And you could do that with a netstat command. You can find out from netstat how many packet receive errors you have on UDP, and that number of packet receive errors is, essentially, the number of UDP datagrams that the kernel is dropping because it has no place to put them.
And you can do certain things to help yourself out in that regard as well. First off, check what the size of that buffer is; here's the command to do that. And what you're going to want to do in almost every case is increase that buffer size. By default, your operating system is probably going to have a very low default for that buffer size. You can increase it by putting this setting into one of your startup files, some file in /etc/sysctl.d, to make it permanent. When you reboot, you'll have that larger buffer size.
You could also do that at the command line temporarily, using the command shown here, if you want to see whether it makes a difference. But again, you want to make that buffer as big as practical. That'll help out some, but it's not a panacea. It won't solve all of your problems. And if you do that, incidentally, you're also going to want to change the so-rcvbuf() parameter in your source definition for UDP inputs in syslog-ng.
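As a sketch, the kernel side and the syslog-ng side fit together like this; the 16 MB value is just an assumption, not a recommendation:

    # kernel side (persistent): put "net.core.rmem_max = 16777216" in a file under /etc/sysctl.d/
    # kernel side (temporary):  sysctl -w net.core.rmem_max=16777216

    source s_udp {
        network(
            transport("udp")
            port(514)
            so-rcvbuf(16777216)   # ask for the larger receive buffer on the UDP socket
        );
    };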
That way, you'll make sure that syslog-ng can take advantage of that larger incoming buffer size as well. But the real solution to this problem was introduced in syslog-ng a couple of years ago, and that's to use a new input source block definition called udp-balancer. And udp-balancer works, from the outside, pretty much the same as the standard network() source with transport("udp") does.
The one difference is it's got a parameter called listeners. And listeners, in this context, allows this particular source to use multiple backend sockets. In other words, it's a way to have multiple threads processing these UDP datagrams that are coming in on UDP port 514. So if we look at that same diagram, but now look at how the udp-balancer source handles things: syslog-ng leverages the extended Berkeley Packet Filter (eBPF) to do this. It's a very cleverly designed feature of syslog-ng.
What happens is it breaks things out into eight backend listeners that syslog-ng creates, if my listeners setting is eight; it could be larger than that. So it essentially creates eight additional threads. And the balancer input source makes sure that all of the incoming UDP datagrams carrying syslog get evenly balanced, absolutely evenly balanced, among all 8 or 16 or however many backend sockets you have, and then processed in parallel through the processing pipelines.
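A sketch of that source, based on my reading of the syslog-ng Premium Edition documentation (double-check the exact option names against your version):

    source s_udp_balanced {
        udp-balancer(
            port(514)
            listeners(8)   # eight backend sockets, each handled by its own thread
        );
    };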
So as you can see, this is going to be able to process the incoming messages much more quickly, which means it's going to be able to drain that operating system queue more quickly. In fact, now the ring buffer in the operating system will never fill. It'll always be drained quickly enough that it will never have to drop any incoming syslog UDP datagrams. Now you may still lose UDP datagrams elsewhere in your network, outside the control of syslog-ng. But at least at syslog-ng, you're not going to have that same problem that perhaps has been bedeviling you for quite some time.
So you really want to use the udp-balancer source to make these things fly. And I guess one thing we also need to discuss is disk buffers in syslog-ng, and I'm going to discuss them here. Disk buffers are primarily a reliability feature in syslog-ng. They're something to make sure that you have a place to queue messages up on disk semi-permanently in case you have a problem getting messages downstream.
So it's not by itself a performance parameter, but it has a big impact on performance and throughput. I won't go into how these are defined and designed in detail, but basically, you have two ways to set up these disk queues, these disk buffers, to store messages in syslog-ng. One is called reliable(no) and one is called reliable(yes), a normal disk queue and a reliable disk queue. And the names are a little bit misleading, because if you set up a normal disk queue, defining a disk buffer with reliable(no), that doesn't mean it's an unreliable disk buffer.
What it means is in this case, the incoming messages, if they can't be delivered to an external destination, they'll be stored on a queue on disk. But there will be with the so-called normal disk buffer, an in-memory component, a very small one that means that if, for some reason, say, the operating system upon which syslog-ng is running crashes, there might be a small number of messages or events that are lost. The overwhelming number will be written to disk and it'll be fine. But there still is a slight chance that a few messages might be lost in that unusual case.
If you set up a so-called reliable disk buffer queue, that cannot happen, because it does not depend in any way on any in-memory buffer. Everything is written to disk first, so that even if the operating system crashes, you are sure that the incoming messages have been written to disk and there's nothing sitting in some memory data structure somewhere that might get lost. But here's the kicker. Using either of these will have a very significant effect on the performance and throughput of syslog-ng.
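A sketch of both variants on a TCP destination; the address and the buffer sizes here are assumptions, not recommendations:

    destination d_reliable {
        network("192.0.2.10" transport("tcp")
            disk-buffer(reliable(yes) disk-buf-size(2147483648) mem-buf-size(163840000))
        );
    };

    destination d_normal {
        network("192.0.2.10" transport("tcp")
            disk-buffer(reliable(no) disk-buf-size(2147483648) mem-buf-length(10000))
        );
    };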
And unfortunately, it's a very negative effect. It will decrease your throughput very, very significantly. And here's a comparison. This is based on some testing, again, done by the developers of syslog-ng. And you can see here, in the first column, where the test was done without using a disk buffer, the number of messages processed per second was roughly 350,000 events per second.
With a so-called reliable disk queue, that drops down to 40,000 events per second. With a normal disk queue, a little bit better, but still far off the performance without a disk buffer. The normal gets up to 60,000. So as you can see, you're going to have a big hit in terms of throughput whenever you use these disk buffer and disk queue features.
So you might want to consider when is a good time or a good situation to use them and when am I better off not using them? So I'll talk about the second one first, when not to use disk buffers. First of all, don't even bother using a disk buffer if you're using UDP as your destination network protocol. This goes back to the fact that UDP is not a connection-oriented protocol.
So if you're using UDP to send messages downstream, syslog-ng will have no idea whether that downstream connection is up, down, or slow; no idea at all. So it'll never know whether it should start queuing messages to disk or not. So don't even think about it. It's not even a consideration. And please don't use UDP for that anyway; use TCP. And also, maybe you have a very reliable destination.
You know that, hey, this destination never goes down. We have a rock solid network. Maybe the network sometimes gets a little bit slow, but it's always up and running, I can always depend upon that destination being there for me. So in that case, you don't really necessarily have to have a disk buffer. What you can do is use a very large standard in-memory buffer, your log-fifo-size. Make that much bigger. And then depend on flow control to make sure that you won't get any message loss.
Again, if for some reason the rate at which you get messages down to that reliable destination slows down, you can have flow control turn off sources temporarily until it can catch up. That way, you can avoid having to use disk buffers at all. Now we'll look at the other side of the coin, when to use disk buffers. Well, the reason there is pretty obvious. Whenever the destination maybe is sometimes unavailable, it might go down periodically, maybe it has to be frequently brought down for an upgrade. Maybe it's on an unreliable machine, maybe it gets very slow sometimes.
Maybe it slows to a crawl. Maybe the network's a little dodgy and goes up and down. So in those cases, you really are going to have to have, or think very seriously about having, a disk queue to handle those problems. And also whenever you have a source that could be subject to very large message bursts, and they come in so hot and heavy that the downstream destination just can't handle it, and maybe flow control is not an option.
A good example is, again, if you have UDP sources. Remember, flow control is not going to work on UDP sources. So if you've got a lot of UDP sources with bursty traffic that can exceed the rate at which that downstream application can accept messages, again, a disk buffer is probably something you'll have to depend on to make sure that you get reliable message transfer and you're not dropping anything.
OK. So we talked a lot, primarily about filtering, when we talked about the impact of different types of processing on syslog-ng performance. And we talked about certain things you can do with filter functions, how you choose them, more optimal ways to do filtering. And we also talked a little bit about what you can do with your log statement definitions and their order. But what we didn't do is talk much about some of the other processing features like parsing or rewriting.
And now I'm going to do that a little bit, without going into a lot of detail. First off, you'll hear a lot of times that people will strongly discourage you from using regular expressions in filters and parsers and other aspects of syslog-ng like rewrite rules. And that's probably overblown. I mean, regular expressions can be very resource intensive in certain cases. But overall, they have a relatively slight impact.
So I wouldn't shy away from them, but you need to be careful with regular expressions. And another thing you might want to consider, I'll talk about this in a subsequent slide, but there's a very, very effective parser called patternDB in syslog-ng. And you got to watch out for patternDB because if you have a very robust multi-threaded capable environment, the patternDB is going to have a very large negative impact on performance. Again, I'm going to talk about that in a subsequent slide in a little more detail. But keep that in mind.
And again, this is an obvious point, the bottom point. When you're combining different types of processing, you might have a log statement that does filtering, parsing and rewrite rules in its processing pipeline. What will obviously be the case is that the processing rate through that pipeline, through that log statement, is going to be controlled by the rate of the slowest component, the one that has the biggest impact on performance.
So if you do have that kind of a bottleneck, that's where you're going to want to focus all of your attention to improve it. And to give you an idea here of the relative impact of different types of processing, here's a table from lightest to heaviest, I guess, or best to worst, in terms of the impact these various types of processing have on syslog-ng, starting at the top with no pre-parsing at all.
And again, these are not exact figures. These are just relative figures from a specific test case. But you can see that most filters are not too bad. They're not going to really limit your throughput by much. And again, even simple regular expressions are not that bad. So you don't necessarily have to be fearful of using regular expressions in matches or other filter functions. Same thing with doing ordinary matching against a facility number or a priority number.
Here again, simple rewrites. If you're just rewriting a host name or rewriting some other piece of a message, that's not going to be too bad. KV parsers are a bit of a step below. If you're using the key-value parser, now you're going to be using a little more processing. You can see the drop there to roughly half of the simple rewrite rules. And then with patternDB, you see a big drop. I'm going to talk about that in a minute.
And the ones following that are also much more intensive. Not necessarily performance intensive, but they just take a lot more time and will reduce the throughput that syslog-ng is able to accommodate. And trailing all of them is the XML parser, because it really does take a lot of resources to parse XML. And of course, XML is something that will be used with Windows event messages. These come in as raw XML, which means that if you're going to be ingesting Windows events, you'll be forced to depend to a large extent upon the XML parser.
But again, this gives you a rough idea of the relative impact of different types of processing. And if you can try to stay toward the top of the list if at all possible. So I'm going to make some comments about patternDB because I discussed that a little bit earlier, not necessarily in a positive light in terms of its effect on throughput. But patternDB is actually a great feature. It's an extremely flexible parser that can do a lot of things for you.
First off, it performs message classification, which can be very important in a lot of environments. But the really neat thing about it is you can parse and extract name-value pairs from either a partially structured message or something that's completely unstructured. Most of the other parsers depend on some level of structure that you can exploit in your parsing. With patternDB, a message can have absolutely no structure whatsoever. It can be really free-form. And you can go in and find the information you want and create name-value pairs based on your requirements.
And what it also has is built-in pattern parsers that are far easier to use than regular expressions, by orders of magnitude easier to use. So it's really easy to make matches and define those name-value pairs that you want to pick out of a completely unstructured message. And it does this using a very fast radix-tree data structure and matching algorithm.
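As a sketch of wiring a pattern database into a pipeline; the file path, the source and destination names, and the sample pattern are assumptions:

    parser p_patterndb {
        db-parser(file("/etc/syslog-ng/conf.d/patterndb.xml"));
    };

    # inside the XML, a pattern such as
    #   Accepted password for @ESTRING:ssh.user: @from @IPvANY:ssh.client_ip@
    # extracts ssh.user and ssh.client_ip without writing a regular expression

    log { source(s_net); parser(p_patterndb); destination(d1); };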
So by itself, it's very fast. It does what it does very quickly. Another big advantage is it provides for message correlation. So it can identify related messages. And this happens in quite a few different scenarios: some software products will create a multitude of separate messages, but they all relate to a single event. What patternDB can do is identify the context of these messages, associate separate messages with one event context, and allow you to combine them into a single event when the context ends.
So that's a huge advantage that the other parsers just can't provide. So it's a great feature and a great product. The thing you've got to be careful of, and here's where the negative throughput impact comes in, is that because it's used for message correlation, it has to run in a single thread. It just has to, because there's no other way to keep track of the context of different messages if they're all running in separate threads.
So if you have a pure parsing need, where you don't have to do correlation and you have a highly multi-threaded, multi-threadable environment, you might want to consider using other syslog-ng parsers. For instance, maybe you could use a regular expression match. Now they may be individually slower than patternDB and probably will be, but you do have the ability to run them in multiple threads. Because they can run in a multi-threaded fashion.
And they can still result in higher throughput than you would get with patternDB. So again, if you don't really need the features that patternDB provides for you, you may actually get better throughput with slower ways to parse, which provide better throughput by virtue of the fact that they can run in multiple threads versus patternDB's one thread.
So that's pretty much what I had to cover today. I'm going to leave you with some links. You can always go to syslog-ng.com. That's a One Identity page that's dedicated to syslog-ng. And you'll have access to all of the technical documentation, the knowledge base, all the code as well for the syslog-ng and store box products.
And you will find in the documentation section, a white paper, the performance guidelines for syslog-ng Premium Edition, which will have some of the information provided here in a different format. But you can refer to that as well. And of course, always go to the administration guide because that has comprehensive information about all things syslog-ng Premium Edition.
It'll describe in further detail a lot of the topics I presented today, but it may not give you the same kind of outlook or color that I added. So hopefully you found some of my information helpful in better understanding how some of these things fit together. But that's pretty much all I had. So thank you very much for your attention, and thank you from One Identity for joining.
[MUSIC PLAYING]