GeoEvent Gateway: Modifying the Kafka Properties

11-07-2019 12:35 PM
EricIronside
Esri Regular Contributor

DISCLAIMER: The content of this blog is unofficial and should be considered advanced configuration advice for anyone who is familiar with Kafka configuration.  I do not recommend changing the default settings in your kafka.properties file without significant research and testing.

The kafka.properties file

The settings file to modify is located here on Windows:

   C:\Program Files\ArcGIS\Server\GeoEvent\gateway\etc\kafka.properties

The original settings in this file were chosen to optimize performance (at the expense of potentially large disk usage).  I recommend you start by adding the log.roll settings at the bottom before you change any of the existing settings.

Note: The default Kafka settings create a large set of consumer offset partitions, each with a minimum size of 20 MB. This large number of partitions is what gives the system such good performance (parallelism).  A new/empty installation of GeoEvent Gateway requires at least 1 GB of disk space just to get started. Each input/output you add after that will require a minimum of 360 MB of additional disk space before you process any events.  Please note that all of these sizes are minimums and are likely to grow as you use GeoEvent.
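To make the arithmetic in the note concrete, here is a minimal sketch of a lower-bound disk estimate. The function name and the 1 GB = 1024 MB conversion are my own assumptions; the 1 GB and 360 MB figures come from the note above and are minimums, not measurements.

```python
# Figures from the note above; both are stated minimums, not measurements.
BASE_MB = 1024          # "at least 1 GB" for an empty install (assumes 1 GB = 1024 MB)
PER_ENDPOINT_MB = 360   # per input/output, before any events are processed


def min_gateway_disk_mb(num_inputs_outputs: int) -> int:
    """Return a rough lower bound, in MB, for GeoEvent Gateway disk usage.

    Hypothetical helper for illustration only; real usage will grow with traffic.
    """
    if num_inputs_outputs < 0:
        raise ValueError("num_inputs_outputs must be non-negative")
    return BASE_MB + PER_ENDPOINT_MB * num_inputs_outputs


# Ten inputs/outputs: 1024 + 360 * 10 = 4624 MB before any events arrive.
print(min_gateway_disk_mb(10))
```

Treat the result as a floor for capacity planning rather than a prediction.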

You can UPDATE the following settings:

log.retention.bytes - This determines a minimum amount of disk space for a single partition.  The default is 100 MB.  This is not a maximum size, and I've seen some partitions grow to more than 3x the setting for this property. For a high-velocity data stream, the size of each partition will probably never go below this number.  Also, this is per-partition, so multiply this size by 3 for each input/output you have. If you don't have a lot of high-velocity data, I don't think it will harm anything to reduce this number by 1/2 to 1/3.
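As a rough illustration of the "multiply by 3" rule above, the sketch below estimates the disk space that retention alone will hold onto. The helper name is hypothetical, and the factor of 3 partitions per input/output is taken from the text rather than read from any configuration.

```python
# From the text: multiply the per-partition size by 3 for each input/output.
PARTITIONS_PER_ENDPOINT = 3


def retention_floor_mb(retention_mb: int, num_inputs_outputs: int) -> int:
    """Return the approximate disk space (MB) retained across all partitions.

    Hypothetical helper; partitions can grow well past this figure.
    """
    return retention_mb * PARTITIONS_PER_ENDPOINT * num_inputs_outputs


# Five inputs/outputs at the 100 MB default versus a halved setting.
print(retention_floor_mb(100, 5))  # 100 * 3 * 5 = 1500
print(retention_floor_mb(50, 5))   # 50 * 3 * 5 = 750
```

Since partitions have been seen at more than 3x the setting, leave generous headroom above this number.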

log.retention.hours - This is the number of hours to leave logs on disk before considering deleting them.  The default is 1 hour.  You can make this shorter by removing this property and replacing it with log.retention.minutes. I have considered, but not tested, replacing log.retention.hours=1 with log.retention.minutes=30.

log.segment.bytes - This is the maximum size of a log file (the actual file on disk) before Kafka rolls over to another file.  Default is 100MB.  If you have high-velocity data, you might end up with a lot of these; if not, you might only have one.  I would consider updating this to 50MB, 25MB, or even 10MB. The lower the velocity of your data, the smaller you can make this.  The higher the velocity of your data, the larger this should be. If you set this too small, Kafka will continually be rolling over files. If you set it too big, Kafka will never roll over, and you'll keep log files (and old events in the queue) around forever (see the log.roll.ms property below to avoid this).
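The property value is written in bytes, so the sizes above need converting before they go into the file. This small sketch assumes binary megabytes (1 MB = 1,048,576 bytes); I have not confirmed which convention the shipped file uses, so compare against the existing value before editing.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes."""
    return megabytes * 1024 * 1024


# Candidate sizes mentioned above, printed as ready-to-paste property lines.
for mb in (100, 50, 25, 10):
    print(f"{mb} MB -> log.segment.bytes={mb_to_bytes(mb)}")
```

For example, 50 MB works out to log.segment.bytes=52428800 under this assumption.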

You can ADD the following settings:

log.roll.ms - This determines a life span for any specific file on disk; after this many milliseconds, Kafka will roll over to a new file regardless of file size.  I would set this somewhere around 1/2 to 1/10 of your log.retention.hours/log.retention.minutes setting (adjusting for ms).  If you stick to 1/2 and your log files are not rolling over based on size, then you'll rarely have more than 3 log files on disk.  Once again, this totally depends on the velocity of your data.  I would avoid setting this value to anything less than about 5 or 10 minutes (just a gut feeling).
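Here is one way the rule of thumb above could be turned into a number. It is only a sketch: the function name and the `divisor` parameter are made up for illustration, and the 5 minute floor is the gut-feeling lower limit mentioned above, not a Kafka requirement.

```python
# Gut-feeling floor from the text: avoid rolling more often than about 5 minutes.
MIN_ROLL_MS = 5 * 60 * 1000


def log_roll_ms(retention_minutes: int, divisor: int = 2) -> int:
    """Suggest a log.roll.ms value as 1/divisor of the retention period.

    divisor=2 gives 1/2 of retention, divisor=10 gives 1/10.
    """
    if not 2 <= divisor <= 10:
        raise ValueError("divisor should be between 2 and 10")
    retention_ms = retention_minutes * 60 * 1000
    return max(retention_ms // divisor, MIN_ROLL_MS)


print(log_roll_ms(60))       # 1 hour retention, 1/2  -> 1800000 (30 min)
print(log_roll_ms(60, 10))   # 1 hour retention, 1/10 -> 360000 (6 min)
print(log_roll_ms(30, 10))   # 3 min would be too short -> floor of 300000 (5 min)
```

Integer division keeps the result a whole number of milliseconds, which is what the property expects.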

log.roll.jitter.ms - This is a fudge factor that allows the system to be less punctual when rolling to a new log file. If you are rolling files every 20 minutes and your log file is 20 min old, then Kafka has plus or minus the log.roll.jitter.ms to roll the file over.  My gut says this should be no less than 30 seconds, and generally I would suggest setting it at 1/10 of log.roll.ms (if log.roll.ms=1200000 [20min] then log.roll.jitter.ms=120000 [2min]).
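Putting the two added settings together, the sketch below derives the jitter from a chosen log.roll.ms and prints the lines you would append to the bottom of kafka.properties. The helper name is hypothetical, and the 1/10 ratio and 30 second floor are the suggestions above, not Kafka defaults.

```python
# Suggested floor from the text: no less than 30 seconds of jitter.
MIN_JITTER_MS = 30 * 1000


def log_roll_jitter_ms(roll_ms: int) -> int:
    """Suggest log.roll.jitter.ms as 1/10 of log.roll.ms, with a 30 s floor."""
    return max(roll_ms // 10, MIN_JITTER_MS)


roll_ms = 20 * 60 * 1000  # roll every 20 minutes, as in the example above

# Lines to append to the bottom of kafka.properties.
print(f"log.roll.ms={roll_ms}")
print(f"log.roll.jitter.ms={log_roll_jitter_ms(roll_ms)}")
```

With a 20 minute roll this reproduces the pairing above: log.roll.ms=1200000 and log.roll.jitter.ms=120000.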

Hope this helps!
Eric

About the Author
Esri Professional Services Real-Time GIS Team GeoEvent Sr. Product Engineer