Garmin + ELK Fun

The quality of reports on Garmin Connect is disappointing to say the least. Some things I wanted:

  • Daily/weekly aggregates over time. Garmin has fixed presets; at 6 months or a year, you only get monthly counts.
  • Control the date range. Garmin has fixed week, month, 6 month, and year views, with fixed buckets in each.
  • Exclude my commute rides. They skew the averages. What is my cadence/pace on only >20 km bike rides?
  • It would be fun to do.

Unfortunately, Garmin charges(!) to use their API. Disappointing. The simplest way I found to get the data was to manually export it (20 events at a time..) from Garmin Connect. This gave me a bunch of .csv files to work with.

Initially I tried to just use a file input in Logstash with the csv filter. Unfortunately, Garmin is not consistent with its data formats across events. The biggest issue was the pace/speed fields varying: "1:50 min/100 m" (swimming), "15.6" (kph, cycling/rowing/some swimming), or "4:30" (4:30 min/km, running/walking).
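
The parser has to normalize those into one numeric representation. Here is a minimal sketch of one way to do it, converting everything to km/h; the helper name and the km/h choice are my own illustration, not necessarily what the actual script does:

def normalize_speed(raw):
    """Normalize Garmin's mixed pace/speed strings to km/h (sketch only)."""
    if raw in ("--", "", None):
        return None
    if "min/100 m" in raw:                     # swim pace, e.g. "1:50 min/100 m"
        minutes, seconds = raw.split()[0].split(":")
        sec_per_100m = int(minutes) * 60 + int(seconds)
        return round(360.0 / sec_per_100m, 2)  # 100 m every N seconds -> km/h
    if ":" in raw:                             # run/walk pace, e.g. "4:30" min/km
        minutes, seconds = raw.split(":")
        sec_per_km = int(minutes) * 60 + int(seconds)
        return round(3600.0 / sec_per_km, 2)
    return float(raw)                          # already a speed in kph, e.g. "15.6"

normalize_speed("1:50 min/100 m")  # -> 3.27
normalize_speed("4:30")            # -> 13.33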

I ended up writing a Python parser to do the following (a rough sketch follows the list):

  • Read a conf file describing the CSV layout and the important fields for each activity type
  • Convert fields into formats usable for Kibana graphing
  • Generate JSON and send it to Logstash 'ready to go' for ES/Kibana.
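
Roughly, the glue looks like the sketch below. This is a simplification rather than the actual script: the function names are made up, the per-activity field selection and int/float casting are left out (they're sketched further down), and the exact JSON envelope the 'oldlogstashjson' codec expects is glossed over.

import configparser
import csv
import json
import socket

def load_config(path="garmin.conf"):
    """Read the [config]/[activities] file shown below (sketch)."""
    cp = configparser.ConfigParser()
    cp.read(path)
    order = [f.strip() for f in cp["config"]["csv_order"].split(",")]
    # option names are lowercased by configparser, e.g. "cycling_fields"
    fields = {name[:-len("_fields")]: [f.strip() for f in value.split(",")]
              for name, value in cp["activities"].items() if name.endswith("_fields")}
    return order, fields

def send_to_logstash(event, host="localhost", port=6400):
    """Ship one activity as JSON to the Logstash UDP input shown further down."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(json.dumps(event).encode("utf-8"), (host, port))

def import_csv(path, order, fields):
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            record = dict(zip(order, row))
            activity = record.get("Activity Type", "").lower()
            if activity in fields:        # skip activity types not in the conf
                send_to_logstash(record)  # cleanup/casting omitted in this sketch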

The current version of the parser is here. It's rough, but it works. It imported all 377 activities from my Garmin account going back 2.5 years, including some that were originally imported from RunKeeper to Garmin and had very odd formatting.

Now that I have the initial import working, I'll add a check against ES to prevent sending duplicate activities during imports. Ideally I'll find a way to automate fetching the 'last 20' events csv daily, but worst case I'll feed the script a fresh csv once a week.
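
For the duplicate check, counting existing documents with the same start time and activity name would probably be enough. A hedged sketch against the garmin-* indices created by the Logstash output below; the field names, and the assumption that they are stored not_analyzed, are mine and depend on the actual template:

import json
import urllib.request

def already_indexed(start, name, es="http://localhost:9200"):
    """Return True if an activity with this Start/Activity Name is already in ES (sketch)."""
    query = {"query": {"bool": {"must": [
        {"term": {"Start": start}},           # assumes not_analyzed mapping
        {"term": {"Activity Name": name}},    # assumes not_analyzed mapping
    ]}}}
    req = urllib.request.Request(
        es + "/garmin-*/_search?size=0",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["hits"]["total"] > 0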

Example .csv from Garmin Connect:

Untitled,Lap Swimming,"Wed, Jun 10, 2015 4:44",18:51,"1,000 m",240,--,--,1:53 min/100 m,1:33 min/100 m,--,--,--,--,480,--,12,--,--,40,
Stockholm Running,Running,"Wed, Jul 15, 2015 6:43",37:20,8.07,615,28,5.0,4:37,3:18,177,194,--,--,--,84,--,--,--,--,
SATs,Treadmill Running,"Tue, Jan 26, 2016 17:41",25:01,5.51,466,0,5.0,4:32,4:20,174,194,--,--,--,81,--,--,--,--,
Stockholm Cycling,Cycling,"Thu, Jul 16, 2015 6:33",3:07,1.15,36,6,--,22.1,25.5,--,--,--,--,--,--,--,--,--,--,
Untitled,Walking,"Sat, Sep 20, 2014 8:19",23:07,2.29,165,28,--,10:35,2:25,--,--,--,--,--,--,--,--,--,--,
Untitled,Rowing,"Tue, Sep 16, 2014 15:23",--,2.60,154,--,--,14.2,--,--,--,--,--,--,--,--,--,--,--,
Untitled,Strength Training,"Tue, Jul 14, 2015 6:42",22:11,0.00,0,0,--,0.0,--,--,--,--,--,--,--,--,--,--,--,
Open Water Swim -Tri,Open Water Swimming,"Sun, Aug 23, 2015 9:00",29:41,1.65,383,--,--,3.3,--,--,--,--,--,--,--,--,--,--,--,

Configuration for the parser:

[config]
csv_order: Activity Name,Activity Type,Start,Time,Distance,Calories,Elevation Gain,Training Effect,Avg Speed(Avg Pace),Max Speed(Best Pace),Avg HR,Max HR,Avg Bike Cadence,Max Bike Cadence,sumStrokes,Avg Run Cadence,Avg Strokes,Min Strokes,Best SWOLF,Avg SWOLF

csv_integers: Calories,Elevation Gain,Avg HR,Max HR,Avg Bike Cadence,Max Bike Cadence,sumStrokes,Best SWOLF,Avg SWOLF
csv_floats: Distance,Training Effect,Avg Speed(Avg Pace),Max Speed(Best Pace)

[activities]
Cycling_fields: Activity Name,Activity Type,Start,Time,Distance,Calories,Elevation Gain,Training Effect,Avg Speed(Avg Pace),Max Speed(Best Pace),Avg HR,Max HR,Avg Bike Cadence,Max Bike Cadence,sumStrokes

Lap Swimming_fields: Activity Name,Activity Type,Start,Time,Distance,Calories,Avg Speed(Avg Pace),Max Speed(Best Pace),Avg HR,Max HR,sumStrokes,Avg Strokes,Min Strokes,Best SWOLF,Avg SWOLF

Open Water Swimming_fields: Activity Name,Activity Type,Start,Time,Distance,Calories,Avg Speed(Avg Pace),Max Speed(Best Pace),Avg HR,Max HR,sumStrokes,Avg Strokes,Min Strokes,Best SWOLF,Avg SWOLF

Strength Training_fields: Activity Name,Activity Type,Start,Time,Calories,Avg HR,Max HR

Running_fields: Activity Name,Activity Type,Start,Time,Distance,Calories,Elevation Gain,Training Effect,Avg Speed(Avg Pace),Max Speed(Best Pace),Avg HR,Max HR,Avg Run Cadence

Key things:

  • csv_order: Since you can choose and reorder fields in Garmin Connect, the parser needs to be told the order they will appear in.
  • csv_integers/csv_floats: Fields from the csv that should become integers/floats; otherwise you can't do math on them in Kibana/ES.
  • *_fields: The fields relevant to each activity type. The csv puts '--' in any field that has no data or is irrelevant; to prevent extra fields and cast issues in ES, I omit the fields that aren't relevant to each activity (see the sketch below).
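
A sketch of what that per-activity cleanup might look like, using the csv_integers/csv_floats lists from the config above (the function name is illustrative, and pace/speed values are assumed to have been normalized to plain numbers already):

def clean_record(record, activity_fields, int_fields, float_fields):
    """Keep only this activity's fields, drop '--' placeholders, cast numerics (sketch)."""
    cleaned = {}
    for field in activity_fields:
        value = record.get(field, "--")
        if value == "--":              # Garmin's marker for empty/irrelevant fields
            continue
        if field in int_fields:
            cleaned[field] = int(value)
        elif field in float_fields:
            cleaned[field] = float(value)
        else:
            cleaned[field] = value
    return cleaned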

I will add host/port options for the UDP logger and some other parameters when I go back to clean up the initial script.

The relevant logstash configuration was:

input {
  udp {
    port => 6400
    codec => "oldlogstashjson"
    type => "garmin"
    workers => 5
  } 
}
output {
  if [type] == "garmin" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "garmin-%{+YYYY.MM.dd}"
      template => "/opt/logstash-custom/templates/garmin.json"
      template_name => "garmin"
    }
  } 
}

The result is what I wanted. I can exclude all of the < 15km bike rides from my overall stats, specify my own ranges/buckets, etc:

Also created views for each activity.

Cycling:

Running:

Swimming:

more ...

RabbitMQ and Logstash

My current setup is a pretty common Logstash stack. It consists of various inputs coming into a non-filtering Logstash, which dumps the messages into RabbitMQ. From RabbitMQ I can have multiple Logstash indexers slurp the messages, apply filtering, and output to Elasticsearch, pass them on to further processing, etc.

The only articles I found when attempting this referenced the AMQP input/output plugins, which is not how one integrates with RabbitMQ these days. There also wasn't much information around tuning. Here is my current configuration, along with some comments on the settings I tried and the results I saw.

The flow:

Log/Data inputs -> LS(No filters) -> RabbitMQ -> LS Indexers(filtering) -> Elasticsearch/Other.

RabbitMQ output configuration:

rabbitmq {
    exchange => "logstash-rabbitmq"
    exchange_type => "direct"
    key => "logstash-key"
    host => ""
    workers => 4
    durable => true
    persistent => true
}

RabbitMQ input configuration:

rabbitmq {
    host => ""
    queue => "logstash-queue"
    durable => true
    key => "logstash-key"
    exchange => "logstash-rabbitmq"
    threads => 3
    prefetch_count => 50
    port => 5672
    user => ""
    password => ""
}

Key values:

  • exchange - Match on output and input plugin
  • key - Match on output and input plugin
  • queue - Must be set on the input plugin and should match on all indexers. Otherwise, a 'random' queue is created each time LS connects.
  • durable - Set to true so the queue persists through a crash or restart of RabbitMQ.
  • persistent - Set to true so messages are written to disk and persist through a RabbitMQ restart.

Initially I had a hard time achieving more than 1500 msg/sec on the indexer side. This would cause the queue to fill up, indicating my Logstash indexers had fallen behind.

RabbitMQ Tuning

At first it would appear the bottleneck is on the RabbitMQ server. Checking iotop, one would find a lot of disk I/O. RabbitMQ has some great writeups about why and when it writes to disk here. In short, since message acknowledgement is enabled, every message is written to disk. This does not mean every message is read from disk, which you can observe from iotop: lots of writing, no reading. RabbitMQ will eventually start reading from disk if the queue gets large enough and RabbitMQ consumes enough of the system's memory. I added the following configuration to discourage RabbitMQ from this behaviour:

{vm_memory_high_watermark, 0.8},
{vm_memory_high_watermark_paging_ratio, 0.8}
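
For reference, those two lines live inside the rabbit section of rabbitmq.config; a minimal file would look something like this (values as above, everything else left at defaults):

[
  {rabbit, [
    {vm_memory_high_watermark, 0.8},
    {vm_memory_high_watermark_paging_ratio, 0.8}
  ]}
].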

This is the only configuration I've done thus far with RabbitMQ, and after the Logstash tuning below it may not even be needed. RabbitMQ is fast, and I don't expect it to be my bottleneck anytime soon.

Logstash Tuning

The setting no_ack => true greatly improved the throughput, or so it appeared at first. By default Logstash sends an acknowledgement after each message is processed. Turning off this setting means RabbitMQ will not wait for the 'ack' and will keep sending messages to Logstash. This resulted in the queue being empty even under the highest load, and what appeared to be better throughput: 7-9k events/sec, yet nothing in the queue. Unfortunately this was just an illusion.

Logstash uses a 'SizedQueue', hardcoded to 20, to throttle messages from input -> filter. The intent is that if the filters are busy, this blocks the inputs, which in theory stops them receiving new events until resources free up.

This works with no_ack=>false because the RabbitMQ plugin sends an ack after it puts the event in the SizedQueue. If the SizedQueue is full, it blocks until there is space, which blocks the ack. This in turn prevents more events from coming from RabbitMQ, and thus the queue on RabbitMQ begins filling up.

With no_ack=>true, RabbitMQ ignores the prefetch count and sends all of the events in the queue straight into Logstash. Once the SizedQueue is full, the events continue to pile up behind the ones waiting for resources. Memory consumption grows and eventually OOM exceptions are thrown. On top of this, you lose every message that had been sent from RabbitMQ -> Logstash, since no_ack=>true means RabbitMQ forgets about a message the moment it sends it. You also lose messages every time you restart an indexer with this configuration.

Sending the ack may well cost some performance, though I don't know how much, but it certainly is not worth the risk of losing messages.

The way I've been able to increase throughput is by increasing the number of filter workers Logstash uses. According to this Jira issue, setting worker threads in sysconfig does not work, so one must use '--filterworkers'. You can set this in the LS_OPTS value, or however you wish to pass it at startup. This allows faster processing of events from input -> filter, preventing RabbitMQ from blocking while waiting for a spot in the queue. Unfortunately this limits you to using only filters that are thread-safe.
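
On a sysconfig-style install that ends up being something like the following; the path and worker count here are only illustrative, adjust for your distro and hardware:

# /etc/sysconfig/logstash
LS_OPTS="--filterworkers 4"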

With these settings we've reached 6-9k/sec per indexer at peak with no hesitation. This performance is greatly dependent on your filters, of which we have a considerable number.

more ...

Merge Filter for Logstash

The merge filter lets you combine two events that occur within a period into a new single event. This lets you graph/analyze the combination of certain events. The filter is available on GitHub here.

Options:

  • key => Unique identifier, used to match the two events you want to merge.
  • order => 'first' or 'last', the order in which the two events should arrive
  • merge_tag => Tag(s) to add on the new event.
  • period => Max length of time between events (seconds).

There is an existing bug in Logstash that impacts filters which create new events. The GitHub issue is here. This bug causes the new event to hit every filter below it in the config, regardless of the conditionals. This can result in additional tags being added, or other filters being applied to the event if they match, not to mention a performance hit.

Fortunately there is a workaround, albeit a crude/cumbersome one. The deprecated 'tags =>' and 'type =>' conditionals within a filter statement still work. My current method of minimizing the impact of this bug is a three-step process:

  • Write the conditional I'd normally use for the merge filter, but instead mutate and add a unique tag to that event. This can be done anywhere in the config.
  • At the very bottom of my filter configuration, list out the merges, using the 'tags =>' option with the unique tag added above.
  • Use mutate after the merges to remove these tags so they do not end up in Elasticsearch.

In my setup I also group all my merges in a single conditional at the bottom of the config, to limit the number of events that are compared against the 'tags =>' conditional. If you have any filters/conditionals without a 'tags =>' option after your merges, the event will be run through them. This same issue applies to the metric, alert, clone, and any other filter that creates new Logstash events.

Here is an example configuration doing the above.

input {
  file {
    type => example
    path => "/tmp/exampleData.log"
  }
}

filter {

#work around for event bug, add tags to events I want to track.
  if [type] == "example" {
    if "a" == [message] {
      mutate {
        add_tag => "A"
      }
    }
    if "b" == [message] {
      mutate {
        add_tag => "B"
      }
    }

###Other filters/configuration ###

#merges
    merge {
      tags => "A"
      key => "key-A"
      order => "first"
      merge_tag => "mergeTagA"
      period => 5 
    }
    merge {
      tags => "B"
      key => "key-A"
      merge_tag => "mergeTagB"
      order => "last"
    }
  }

#Remove the temp tags    
  mutate {
    remove_tag => ["A","B"] 
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Sending this data:

[dopey@dopey ~]# cat a
a
a
b
c
d
[dopey@dopey ~]#

Here is the result:

{
       "message" => "a",
      "@version" => "1",
    "@timestamp" => "2014-08-01T13:45:36.729Z",
          "type" => "example",
          "host" => "dopey.io",
          "path" => "/tmp/exampleData.log",
          "tags" => []
}
{
       "message" => "a",
      "@version" => "1",
    "@timestamp" => "2014-08-01T13:45:36.747Z",
          "type" => "example",
          "host" => "dopey.io",
          "path" => "/tmp/exampleData.log",
          "tags" => []
}
{
       "message" => "b",
      "@version" => "1",
    "@timestamp" => "2014-08-01T13:45:36.750Z",
          "type" => "example",
          "host" => "dopey.io",
          "path" => "/tmp/exampleData.log",
          "tags" => []
}
{
       "message" => "c",
      "@version" => "1",
    "@timestamp" => "2014-08-01T13:45:36.753Z",
          "type" => "example",
          "host" => "dopey.io",
          "path" => "/tmp/exampleData.log"
}
{
       "message" => "d",
      "@version" => "1",
    "@timestamp" => "2014-08-01T13:45:36.758Z",
          "type" => "example",
          "host" => "dopey.io",
          "path" => "/tmp/exampleData.log"
}
#New merged event below.
{
            "@version" => "1",
          "@timestamp" => "2014-08-01T13:45:36.750Z",
              "source" => "dopey.io",
           "merge.key" => "key-A",
    "merge.time_delta" => 0.003,
             "message" => "a",
                "type" => "example",
                "host" => "dopey.io",
                "path" => "/tmp/exampleData.log",
                "tags" => [
        [0] "mergeTagA",
        [1] "mergeTagB"
    ],
           "message-2" => "b"
}

Some future improvements before I submit it to the logstash-contrib:

  • Allow more than 2 events to be tracked/merged. Working on how I'd configure/manage this.
  • Prevent 'add_tag' from being added to the new event -- only merge_tag. When I tried to prevent this behaviour, the results were unpredictable. For now, if you have add_tag on the merge filter, the tag will be added to both the original trigger event and the new event; merge_tag will only be added to the new merged event.
  • Evaluate time on incoming events before overriding the existing value. Right now, if your events come out of order, the last one seen by Logstash will be stored.

I've used this for a variety of use cases so far. It lets me see how often action A was followed by B, or A by C, etc. Hopefully others find a need for it as well.

more ...

'Alert' Filter for Logstash

I've attempted to write a filter for Logstash. It's called 'Alert', and I've submitted a pull request on GitHub for it here. It is a work in progress, but the basic functionality I wanted appears to work. The initial requirements were:

  • Generate a single alert when something goes wrong. Don't spam.
  • Be able to alert on both the volume of a particular message and the lack thereof
  • If alerting on lack thereof, alert even if there were 0 messages during the period

I started with the throttle filter, but felt my use case was a bit different from what that filter was intended to do. I used its code as a base for this filter. Disclaimer: this is my first attempt at Ruby, so I am sure it is quite rough code-wise. :D

For all the examples below we will use the typical httpd access logs we all know and love.

input {
  file {
    path => "/var/log/httpd/access_log"
    type => "httpd"
  }
}

filter {
  grok {
    match => ["message", "%{COMBINEDAPACHELOG}" ]
  }
  date {
  #14/Jun/2014:23:31:16 -0400
    match => ["timestamp", "dd/MMM/YYYY:HH:mm:ss Z"]
  }
}

  • Generate an alert if there are more than 10 404s in a 5 minute period. We will use the [response] field created by %{COMBINEDAPACHELOG} and generate an alert for each message we see with a response code of '404'. This is done using the max_threshold option; the snippet below goes inside the filter block above:
  #only call alert if response == 404
  if [response] == "404" {
    #generate an alert
    alert {
      max_threshold => 10
      max_threshold_msg => '404 Issues'
      period => 300
      key => "404Alert"
    }
  }

The above uses the key "404Alert" to track these events within the filter. For each trigger of this filter the count is incremented. If 11 or more occurrences are counted within 300 seconds, a single event is created for the entire 300-second period.

  • Generate an alert if you do not see at least 10 '202's in a 5 minute period, using the same [response] field as above. You have two options when using min_threshold. Both options require the config below -- the actual event you want to track (again, inside the filter block):
  #only call alert if response == 202
  if [response] == "202" {
    #generate an alert
    alert {
      min_threshold => 10
      min_threshold_msg => 'Traffic to site below threshold'
      period => 300
      key => "202Alert"
    }
  }

This configuration will generate an alert if there are fewer than 10 events in a 5 minute period. However, the alert event can only be created when the alert filter is triggered. This means if you have 3 events in the first 5 minutes and the next event doesn't arrive until 90 minutes later, you will not receive the alert until the 95 minute mark.

To get around this limitation there is a 'heartbeat' flag you can use. This is the second option for min_threshold. Add the alert either at the root of filter{} (so it triggers on every event) or in another conditional that sees a high enough volume of events to ensure it triggers regularly. It would look like this:

  #Place in a 'high traffic' area. Either root of filter{} or somewhere to ensure the filter is regularly called
  alert {
    heartbeat => true
    min_threshold => 10
    min_threshold_msg => 'Traffic to site below threshold'
    period => 300
    key => "202Alert"
  }

For now you have to mirror your configuration in both alert filters, especially the key, as that is how the filter tracks your events. The heartbeat flag causes the filter to only check two things (sketched below):

  • Does an entry for this key exist? If not, make one and exit
  • If the key does exist, has my time expired? If so, evaluate the min_threshold. Alert if needed, delete key, exit.
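
In rough pseudocode (Python here; the plugin itself is Ruby), the heartbeat path boils down to something like this:

import time

# state shared with the 'real' alert filter, keyed the same way ("202Alert", etc.)
tracked = {}   # key -> {"start": epoch seconds, "count": events seen}

def heartbeat_check(key, period, min_threshold):
    """Sketch of the heartbeat-only path: create the entry, or evaluate it once time is up."""
    now = time.time()
    entry = tracked.get(key)
    if entry is None:                      # no entry for this key: make one and exit
        tracked[key] = {"start": now, "count": 0}
        return None
    if now - entry["start"] >= period:     # period expired: evaluate min_threshold
        below = entry["count"] < min_threshold
        del tracked[key]                   # delete key, alert if needed
        return "min threshold alert" if below else None
    return None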

The goal is to let you monitor the presence of certain logs to confirm stuff and things are working. If all the things die you still want to receive an alert! Once you've created your filters, you can then leverage the results in output{}. When a threshold alert is triggered, a new event is created. The message and tag are configurable; by default the tag is 'alert_filter'. Here is how you could send an email for the 202 alert above.

output {
  if [alert_filter.key] == "202Alert"  {
    email {
      from => "logstash@dopey.io"
      subject => "%{message}"
      to => "dopey@dopey.io"
      via => "sendmail"
      body => "Min Threshold:%{min_threshold}\n\nNum of 202s:%{alert_filter.count}"
      options => { "location" => "/sbin/sendmail" }
    }
  }
}

Possible Options in logstash.conf:

#key - Unique identifier to track/count events
config :key, :validate => :string,  :required => true
#All fields created are formatted as "alert_filter.$tag" by default. Can change with this setting.
#Also a tag will be created for the event with this value.
config :filter_tag, :validate => :string, :default => "alert_filter", :required => false

#Min threshold. Alert is created if # of events is < this number.
config :min_threshold, :validate => :number, :default => -1, :required => false
#Value that will be [message] for the created event. 
config :min_threshold_msg, :validate => :string, :default => "Min Threshold Alert", :required => false

#Max threshold. Alert is created if # of events is > this number.
config :max_threshold, :validate => :number, :default => -1, :required => false
#Value that will be [message] for the created event. 
config :max_threshold_msg, :validate => :string, :default => "Max Threshold Alert", :required => false

#Used with min_threshold to generate alerts independent on the event being tracked. 
config :heartbeat, :validate => :boolean, :default => false

#Time period to track, in seconds.
config :period, :validate => :string, :required => true

Some future enhancements I'd like to make:

  • Not require duplication of configuration when leveraging heartbeat.
  • Configure alarms only during certain times.
  • Improve performance, code is rough.
  • Make various tags created optional/configurable.

Hopefully someone else finds a use for this filter. We are just starting to implement this ourselves so I will likely be tweaking it as I find problem areas.

more ...

Snapshots in ElasticSearch

I'm in the process of building out a Logstash setup at work. We have a considerable amount of logs from a number of sources. One thing I wanted to figure out was how to back up/restore the various dashboards I am making along the way. Doing them one at a time from Kibana seemed inefficient. Elasticsearch's snapshot feature is the answer. In general it lets you back up/restore any or all of your indices; your dashboards are stored in the kibana-int index.

The steps are listed out here. Here is how it went for me. There isn't much more here than what is on that page; their instructions are quite good :)

Create the local repo:

curl -XPUT 'http://localhost:9200/_snapshot/dopey_backup' -d '{
    "type": "fs",
    "settings": {
        "compress" : true,
        "location": "/var/data/elasticsearch/backup"
    }
}'

The directory used for backups needs to be writable by the elasticsearch user:

[dopey@dopey-logstash ~]#  curl -XPUT "localhost:9200/_snapshot/dopey_backup/snapshot_1?wait_for_completion=false"
{"error":"SnapshotCreationException[[dopey_backup:snapshot_1] failed to create snapshot]; nested: FileNotFoundException[/var/data/elasticsearch/backup/snapshot-snapshot_1 (Permission denied)]; ","status":500}
[dopey@dopey-logstash ~]# chown -R elasticsearch:elasticsearch /var/data/elasticsearch/backup/
[dopey@dopey-logstash ~]#  curl -XPUT "localhost:9200/_snapshot/dopey_backup/snapshot_1?wait_for_completion=false"
{"accepted":true}

You can see the backup being built in your set location:

[dopey@dopey-logstash ~]# ls -lh /var/data/elasticsearch/backup/
total 16K
-rw-r--r--. 1 elasticsearch elasticsearch   31 Jul  2 13:27 index
drwxr-xr-x. 5 elasticsearch elasticsearch 4.0K Jul  2 13:27 indices
-rw-r--r--. 1 elasticsearch elasticsearch  522 Jul  2 13:27 metadata-snapshot_1
-rw-r--r--. 1 elasticsearch elasticsearch  256 Jul  2 13:27 snapshot-snapshot_1
[dopey@dopey-logstash ~]#

Check on status of the backups:

[dopey@dopey-logstash ~]# curl -s -XGET "localhost:9200/_snapshot/dopey_backup/_all"| python2 -mjson.tool
{
    "snapshots": [
        {
            "duration_in_millis": 7397, 
            "end_time": "2014-07-02T11:39:08.893Z", 
            "end_time_in_millis": 1404301148893, 
            "failures": [], 
            "indices": [
                "logstash-2014.07.02", 
                "kibana-int"
            ], 
            "shards": {
                "failed": 0, 
                "successful": 10, 
                "total": 10
            }, 
            "snapshot": "snapshot_1", 
            "start_time": "2014-07-02T11:39:01.496Z", 
            "start_time_in_millis": 1404301141496, 
            "state": "SUCCESS"
        }
    ]
}
[dopey@dopey-logstash ~]#

The kibana-int index is what contains your dashboards. If I ever wanted to restore just those, I'd do:

curl -XPOST "localhost:9200/_snapshot/dopey_backup/snapshot_1/_restore" -d '{
    "indices": "kibana-int",
    "ignore_unavailable": "true",
    "include_global_state": false

}'
more ...

Learnings from Logstash

I've been a user of Splunk before, and can speak first-hand to the value of being able to visualize large sets of data. You can simply identify issues/patterns/stuff that you'd never otherwise see. I've started working with Logstash at my new company, and it's been a great experience. I am notorious for skimming docs and experimenting to figure things out.. so here are various things I've learned/bashed my head against along the way. I am sure many, if not all, are explained in the docs :)

Useful tools:

  • Grok Debugger -- Simply amazing. Once you get the hang of logstash/filters, it makes building new filters a breeze.
  • #logstash @ freenode - Friendly and helpful
  • Logstash.net and elasticsearch.org have top notch documentation.

Logstash leverages grok, which has many prebuilt patterns that make standardized logs a breeze to import. I have a great deal of custom logs, which left me building (with much still left to do) custom grok patterns. A few tips:

Start off with the raw logs going into Logstash. Find the field you'd like to build a pattern for to better tag/break down, e.g. the 'message' field. Take that field to the Grok Debugger and build your pattern. Once you have your pattern, build your config.

input {
  syslog {
    type => syslog
    port => 2514
  }
}

filter {

  if [type] == "syslog" {
    grok {
      break_on_match => "true"
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:junk} took '%{NUMBER:response_time:int}' ms%{GREEDYDATA:junk}Stuff Made = %{NUMBER:stuff_made:int}%{GREEDYDATA:junk}Things made = %{NUMBER:things_made:int}%{GREEDYDATA:junk}"]
      add_tag => [ "stuff", "things" ]
      tag_on_failure => []
    }
  }
}

Pretty simple config, things I bashed my head against:

  • Building a grok for the entire log entry as it came from syslog. The match line only matches against 'message', not the entire line. Logstash is smart enough to pull out the timestamp, logsource, program name, etc. and leaves the rest as 'message'.
  • All entries were receiving the '_grokparsefailure' tag, even when they also got the tags I set. I believe this is due to lines being run through grok filters that fail to match (hence the tag). The end goal is to only apply a filter to a line I expect to match, but until then, using 'tag_on_failure => []' will prevent this duplicate tag from being created.
  • 'type' can be anything you want. 'syslog' is an input plugin, and 'message' is a field of data -- in this case created automatically by the syslog input plugin.
  • Many tutorials use type => 'syslog' in the grok statement. This is deprecated; the above is the current method.
  • If you want to graph on a number you extract from a log, you need to include the :int after the semantic name, as above. Otherwise you will get a class exception when building the graph.

Handy commands to use with ElasticSearch:

curl -XDELETE 'http://localhost:9200/dopeysIndex*/'

I initially set Logstash up a few months ago: configured the syslog plugin and pumped everything into it. The result? A ton of logs, but not very usable. I could search over a large set, but I couldn't graph response time, error codes, or anything of real use. It was a very good method of consuming large amounts of disk space though :). The above command lets you delete indexes: either rotate out older indexes or wipe everything. Be careful though, kibana-int holds your dashboards and you can just as easily delete it ;). I wrote another page on using snapshots to back up data in general, or specific indexes such as kibana-int, here.

Queries in Kibana:

  • Lucene query article from ElasticSearch
  • You can do things like _exists_:$tag, or tag:(query1 query2). This lets you create graphs that may cross multiple tags, but still limit data how you want. This also goes back to how you build the dataset, which is something I am still going back and forth on.
  • AND, OR, and NOT are how you string queries together (a couple of combined examples follow this list). All of this is in the above link, but I know my colleagues and I felt a bit unimpressed by Kibana's search initially, until we learned the Lucene query syntax. Much better now.
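
A couple of combined examples of the sort of queries I mean; the field names are illustrative and depend entirely on your grok patterns:

_exists_:response_time AND response_time:[500 TO *]
response:(404 OR 500) AND NOT message:healthcheck

The first only matches events that actually have a response_time field and are slow; the second catches error responses while ignoring health checks.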

Things I am still wrestling with:

  • How to lay out logstash.conf. With the amount of custom logs I am parsing, it can get messy in a hurry.
  • Best use of tags/fields. By role, vendor, both? Many queries may cross roles/vendors but I'd still want to filter down when I can to improve performance.
  • How to deal with logs I am not tagging. Store them in a different index I rotate more frequently and keep tagged logs in another? Discard them instantly?
  • Whether to use logstash-forwarder or not; currently not.
more ...