
Creating a Splunk Javascript View


One of the best things about Splunk is the ability to customize it. Splunk allows you to make your own Javascript views without imposing many limitations on you. This means you can make apps that include things such as:

  • Custom editors or management interfaces (e.g. lookup editing, slide-show creation)
  • Custom visualizations (though modular visualizations are likely what you will want to use from now on)
  • etc.

That said, getting started on creating a Splunk Javascript view can appear a little daunting at first. It really isn’t that hard though. Keep reading and I’ll explain how to do it.

Parts of a Splunk Javascript View

Before we get started, let's outline the basic parts of a custom Javascript view:

  • Javascript view file (appserver/static/js/views/HelloSplunkJSView.js): the main view Javascript file
  • HTML template file (appserver/static/js/templates/HelloSplunkJSView.html): where you put the HTML you want rendered
  • Stylesheet file (appserver/static/css/HelloSplunkJSView.css): where you put your custom CSS
  • Third-party libraries (appserver/static/contrib/text.js): any third-party libraries, kept under a contrib directory
  • Splunk view (default/data/ui/views/hellosplunkjs.xml): the dashboard view in which the custom view will appear
  • View Javascript file (appserver/static/hellosplunkjs.js): the Javascript that puts your custom view on the dashboard
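For orientation, here is roughly how the finished app's directory layout will look once all of the steps below are complete (this also includes app.conf and the navigation file, which are created along the way):

hello-splunkjs/
    default/
        app.conf
        data/ui/nav/default.xml
        data/ui/views/hellosplunkjs.xml
    appserver/static/
        hellosplunkjs.js
        css/HelloSplunkJSView.css
        contrib/text.js
        js/templates/HelloSplunkJSView.html
        js/views/HelloSplunkJSView.js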

Let's get started

We are going to make a very simple app that will allow you to run a search and get the most recent event. The completed app is available on Github for your reference.

Step 1: make a basic Splunk app

We are going to put our content into a basic Splunk app. To do this, we will make a few files inside of a Splunk install.

Step 1.1: make app directory

First, make a directory in your Splunk install under the /etc/apps directory for the app you are going to create. I’m going to name the app “hello-splunkjs”. Thus, I will make the following directory: /etc/apps/hello-splunkjs

Step 1.2: make app.conf

Now, let's make the app.conf file. It goes here in your Splunk install: /etc/apps/hello-splunkjs/default/app.conf

The contents will just be this:

[launcher]
version = 0.5
description = Example of writing basic SplunkJS views

[ui]
is_visible = true
label = Hello SplunkJS 

If you restart Splunk, you should see the app.
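If you prefer the command line, Splunk can also be restarted with the splunk CLI (path shown for a typical Linux install; adjust $SPLUNK_HOME as needed):

$SPLUNK_HOME/bin/splunk restart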

Step 2: make the basic view

Now let’s get started making the view.

Step 2.1: deploy the template

Now, let’s make the basic Javascript view file. To make things a little easier, you can start with the template available on Github. The Javascript views ought to be placed in your app in the /appserver/static/js/views/ directory. In our case, the view will be in /etc/apps/hello-splunkjs/appserver/static/js/views/HelloSplunkJSView.js.

All of the places in the template with a CHANGEME will be replaced in the next few steps.

Step 2.2: update the app references

We need to update the references in the require.js statement to point to your app. This is done by changing /app/CHANGEME to /app/hello-splunkjs (since our app directory is hello-splunkjs). This results in the following at the top of /etc/apps/hello-splunkjs/appserver/static/js/views/HelloSplunkJSView.js:

define([
    "underscore",
    "backbone",
    "splunkjs/mvc",
    "jquery",
    "splunkjs/mvc/simplesplunkview",
    'text!../app/hello-splunkjs/js/templates/CHANGEMEView.html', // CHANGE_THIS: app path updated to hello-splunkjs
    "css!../app/hello-splunkjs/css/CHANGEMEView.css" // CHANGE_THIS: app path updated to hello-splunkjs
], function(

See the diff in Github.

Step 2.3: add the template and stylesheet files

We now need to add the files that will contain the stylesheet and the HTML. For the stylesheet, make an empty file in /etc/apps/hello-splunkjs/appserver/static/css/HelloSplunkJSView.css. We will leave it empty for now.

For the HTML template, make a file /etc/apps/hello-splunkjs/appserver/static/js/templates/HelloSplunkJSView.html with the following:

Hellll-lllo Splunk!

See the diff in Github.

Step 2.4: set the view name

Now, let's change the other CHANGEMEs in the view's Javascript (/etc/apps/hello-splunkjs/appserver/static/js/views/HelloSplunkJSView.js). We are naming the view "HelloSplunkJSView", so change the rest of the CHANGEMEs accordingly.

This will result in:

define([
    "underscore",
    "backbone",
    "splunkjs/mvc",
    "jquery",
    "splunkjs/mvc/simplesplunkview",
    'text!../app/hello-splunkjs/js/templates/HelloSplunkJSView.html',
    "css!../app/hello-splunkjs/css/HelloSplunkJSView.css"
], function(
    _,
    Backbone,
    mvc,
    $,
    SimpleSplunkView,
    Template
){
    // Define the custom view class
    var HelloSplunkJSView = SimpleSplunkView.extend({
        className: "HelloSplunkJSView",
        
        defaults: {
        	
        },
        
        initialize: function() {
        	this.options = _.extend({}, this.defaults, this.options);
        	
        	//this.some_option = this.options.some_option;
        },
        
        render: function () {
        	
        	this.$el.html(_.template(Template, {
        		//'some_option' : some_option
        	}));
        	
        }
    });
    
    return HelloSplunkJSView;
});

See the diff in Github.

Step 2.5: insert the text.js third party library

To make things easy, we are going to use a third-party library called text.js. The nice thing about Splunk views is that you can use the plethora of third-party Javascript libraries in your apps. It is best to keep third-party libraries in a dedicated directory so that you can easily determine which parts were made by someone else. Let's put them under /appserver/static/contrib. In the case of our app, the path will be /etc/apps/hello-splunkjs/appserver/static/contrib.

text.js is available from https://github.com/requirejs/text. Put it in the app at the path /etc/apps/hello-splunkjs/appserver/static/contrib/text.js. Next, we need to tell our view where to find text.js by adding an entry to require.js's paths configuration. Put the following at the top of /etc/apps/hello-splunkjs/appserver/static/js/views/HelloSplunkJSView.js:

require.config({
    paths: {
    	text: "../app/hello-splunkjs/contrib/text"
    }
});

See the diff in Github.

Step 3: add to a dashboard

Step 3.1: make the view

We need to make a view that will host the Javascript view we just created. To do this, we will create a simple view that includes a placeholder where the view will render.

To do this, create the following view in /etc/apps/hello-splunkjs/default/data/ui/views/hellosplunkjs.xml:

<?xml version='1.0' encoding='utf-8'?>

<form script="hellosplunkjs.js" >
	<label>Hello SplunkJS</label>
	
	<row>
		<html>
			<div id="placeholder_for_view">This placeholder should be replaced with the content of the view</div>
		</html>
	</row>
</form>

See the diff in Github.

Step 3.2: put the view in the app’s navigation file

Next, make the nav.xml to register the view by making the following file in /etc/apps/hello-splunkjs/default/data/ui/nav/default.xml:

<nav color="#3863A0">
  <view name="hellosplunkjs" default="true" />
</nav>

See the diff in Github.

Restart Splunk and navigate to the view; you should see it with the text “This placeholder should be replaced with the content of the view”.

Step 3.3: wire up the Javascript view to dashboard

Now, we need to wire up the Javascript view to the dashboard. To do so, make the following file at /etc/apps/hello-splunkjs/appserver/static/hellosplunkjs.js:

require.config({
    paths: {
        hello_splunk_js_view: '../app/hello-splunkjs/js/views/HelloSplunkJSView'
    }
});

require(['jquery','underscore','splunkjs/mvc', 'hello_splunk_js_view', 'splunkjs/mvc/simplexml/ready!'],
		function($, _, mvc, HelloSplunkJSView){
	
    // Render the view on the page
    var helloSplunkJSView = new HelloSplunkJSView({
        el: $('#placeholder_for_view')
    });
    
    // Render the view
    helloSplunkJSView.render();
	
});

This script creates an instance of HelloSplunkJSView and tells it to render in the "placeholder_for_view" element (which was declared in the hellosplunkjs.xml view).

See the diff in Github.

Step 4: add click handlers

Now, let's make something in the view interactive (something that takes input from the user).

Step 4.1: create an HTML element that is clickable

We need to change the template file to include a clickable element. To do this, modify the file /etc/apps/hello-splunkjs/appserver/static/js/templates/HelloSplunkJSView.html with the following:

Hellll-lllo Splunk!
<div class="get-most-recent-event">Get the most recent event in Splunk</div>
<textarea id="most-recent-event"></textarea>

See the diff in Github.

Step 4.2: wire up the clickable element to a function

Next, wire up a click handler along with a function that will fire when the user clicks the "get-most-recent-event" element. We do this by adding an events attribute that connects the HTML to a function called doGetMostRecentEvent(), which we will create in the next step:

define([
    "underscore",
    "backbone",
    "splunkjs/mvc",
    "jquery",
    "splunkjs/mvc/simplesplunkview",
    'text!../app/hello-splunkjs/js/templates/HelloSplunkJSView.html',
    "css!../app/hello-splunkjs/css/HelloSplunkJSView.css"
], function(
    _,
    Backbone,
    mvc,
    $,
    SimpleSplunkView,
    Template
){
    // Define the custom view class
    var HelloSplunkJSView = SimpleSplunkView.extend({
        className: "HelloSplunkJSView",
        
        events: {
        	"click .get-most-recent-event" : "doGetMostRecentEvent"
        },

See the diff in Github.

Step 4.3: run a search from Javascript

Now, let's add a require statement to import the SearchManager so that we can kick off a search. We do this by adding "splunkjs/mvc/searchmanager" to the define statement and assigning the resulting object to "SearchManager" in the function:

define([
    "underscore",
    "backbone",
    "splunkjs/mvc",
    "jquery",
    "splunkjs/mvc/simplesplunkview",
    "splunkjs/mvc/searchmanager",
    'text!../app/hello-splunkjs/js/templates/HelloSplunkJSView.html',
    "css!../app/hello-splunkjs/css/HelloSplunkJSView.css"
], function(
    _,
    Backbone,
    mvc,
    $,
    SimpleSplunkView,
    SearchManager,
    Template
){

See the diff in Github.

Now, let's add the doGetMostRecentEvent() function, which will kick off a search and put the most recent event in the view. See below for the completed function:

        doGetMostRecentEvent: function(){ 
        	
           // Make a search
            var search = new SearchManager({
                "id": "get-most-recent-event-search",
                "earliest_time": "-1h@h",
                "latest_time": "now",
                "search":'index=_internal OR index=main | head 1 | fields _raw',
                "cancelOnUnload": true,
                "autostart": false,
                "auto_cancel": 90,
                "preview": false
            }, {tokens: true});
            
        	
            search.on('search:failed', function() {
                alert("Search failed");
            }.bind(this));
            
            search.on("search:start", function() {
                console.log("Search started");
            }.bind(this));
            
            search.on("search:done", function() {
                console.log("Search completed");
            }.bind(this));
        	
            // Get a reference to the search results
            var searchResults = search.data("results");
            
            // Process the results of the search when they become available
            searchResults.on("data", function() {
            	$("#most-recent-event", this.$el).val(searchResults.data().rows[0][0]);
            }.bind(this));
            
            // Start the search
            search.startSearch();
            
        },

See the diff in Github.

Restart Splunk and click the "Get the most recent event in Splunk" text; the most recent event should show up in the view:

[Screenshot: the most recent event displayed in the view]

Step 4.4: customize styling

The link we made for getting the raw event doesn’t look like a link. Let’s deploy some styling to make it look clickable. To do this, edit the file /etc/apps/hello-splunkjs/appserver/static/css/HelloSplunkJSView.css with the following:

.get-most-recent-event{
	color: blue;
	text-decoration: underline;
}

This will style the link such that it looks like this:

[Screenshot: the "Get the most recent event in Splunk" text styled as a blue, underlined link]

See the diff in Github.

Conclusion

There are many more things you can do with Javascript in Splunk; this is just the start. See dev.splunk.com and Splunk Answers if you need more help.

 


Smart AnSwerS #61


Hey there community and welcome to the 61st installment of Smart AnSwerS.

I just had the pleasure of joining over 60 Splunk users for the April SplunkTrust Virtual .conf session on Best Practices for Splunk SSL by dwaddle and starcher. You can find the recording and slides for this and previous presentations on the Virtual .conf wiki page in case you missed out. For those of you in the San Francisco Bay Area that want to continue getting your Splunk clue on, come out to the SFBA Splunk User Group meeting at Splunk HQ next Wednesday, May 4th @ 6:30PM PDT. Becky Burwell from Yahoo!/Flickr will give a talk on batch search parallelization, and Sasha Velednitsky from Netflow Logic will present on using NetFlow Integrator with Splunk. Visit the SFBA user group event page to RSVP!

Check out this week’s featured Splunk Answers posts by Splunkers that wanted to share some useful information with the community:

For Splunk Enterprise, Splunk Light, and Hunk pre 6.3, default root certificates expire on July 21, 2016 – Recommendations?

Ellen from Splunk Support posted this question and answer as a product advisory for users with Splunk instances running versions older than 6.3 using default root certificates. She covers the impact, what deployments this affects, and several recommendations for moving forward. There is also a lot of helpful input worth reading from other Splunkers, partners, and customers, including reference to the very relevant talk from today’s SplunkTrust Virtual .conf session.
https://answers.splunk.com/answers/395886/for-splunk-enterprise-splunk-light-and-hunk-pre-63.html

Are there any online collections of Splunk search examples?

Sometimes users need some inspiration looking at prior well-crafted searches to discover the possibilities of Splunk’s search processing language. ChrisG, Senior Director of Documentation, shares two awesome sites with various search examples created by members in the Splunk community. These free online resources are definitely bookmark worthy.
https://answers.splunk.com/answers/372126/are-there-any-other-online-collections-of-splunk-s.html

PowerShell sample for HTTP Event Collector

gmartins_splunk noticed there was no published PowerShell example for the HTTP Event Collector, and since he had already created one, he decided to post it on Answers for other users to access. halr9000 joined in to add some constructive feedback and finishing touches to the example. Great teamwork!
https://answers.splunk.com/answers/373010/powershell-sample-for-http-event-collector.html

Thanks for reading!

Missed out on the first sixty Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Protected: Enriching threat feeds with WHOIS information


This content is password protected.

Splunk 6.4 – Using CORS and SSL settings with HTTP Event Collector


Summary

In Splunk 6.4.x and beyond, CORS and SSL settings for HTTP Event Collector are dedicated settings, no longer shared with the rest of Splunk. To use CORS and SSL with HEC in 6.4, you must configure the new settings, which are located in the [http] stanza of inputs.conf.

Details

In Splunk 6.3.x, CORS and SSL settings for HTTP Event Collector are shared with Splunk’s REST API, and are set in server.conf in the [httpServer] and [sslConfig] stanzas.

In Splunk 6.4.x we’ve introduced dedicated settings for HEC. This means you can now have more fine-grained control of your HEC endpoint.

It also means if you were relying on CORS and SSL prior to 6.4, then you must configure the new settings in 6.4. They do not automatically migrate over.

The settings are located in the [http] stanza of inputs.conf, located in %SPLUNK_HOME%/etc/apps/splunk_httpinput/local. Start at the sslKeysFile setting in the inputs.conf spec and you will see the new settings. Make sure you restart Splunk after updating the settings. Below, for example, is the spec entry for the setting that enables CORS.

crossOriginSharingPolicy = <origin_acl> ...
* List of the HTTP Origins for which to return Access-Control-Allow-* (CORS)
  headers.
* These headers tell browsers that we trust web applications at those sites
  to make requests to the REST interface.
* The origin is passed as a URL without a path component (for example
  "https://app.example.com:8000").
* This setting can take a list of acceptable origins, separated
  by spaces and/or commas.
* Each origin can also contain wildcards for any part.  Examples:
    *://app.example.com:*  (either HTTP or HTTPS on any port)
    https://*.example.com  (any host under example.com, including example.com itself).
* An address can be prefixed with a '!' to negate the match, with
  the first matching origin taking precedence.  For example,
  "!*://evil.example.com:* *://*.example.com:*" matches every host
  under example.com except evil.example.com.
* A single "*" can also be used to match all origins.
* By default, the list is empty.
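As an illustration only (not a drop-in configuration), a local inputs.conf that turns on CORS for a single trusted origin might look something like the sketch below. Certificate and password settings are intentionally omitted because their names vary between 6.x releases; check the inputs.conf spec shipped with your version.

# %SPLUNK_HOME%/etc/apps/splunk_httpinput/local/inputs.conf -- illustrative sketch
[http]
disabled = 0
# let a browser app at this origin call HEC directly
crossOriginSharingPolicy = https://app.example.com:8000
# HEC-specific SSL switch (certificate settings not shown here)
enableSSL = 1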

 

 

 

Tracing your TCP IPv4 connections with eBPF and BCC from the Linux kernel JIT-VM to Splunk


Starting with Linux kernel 4.1, an interesting feature got merged: eBPF. For anyone playing with networks, BPF should sound familiar: it is a filtering system available to user-space tools such as tcpdump or wireshark to filter and display only the wanted (filtered) packets. The "e" in eBPF means extended: it takes BPF beyond just network traffic and allows tracing various things from the kernel, such as syscall capture, kprobes, tracepoints, etc.

eBPF runs a piece of C code compiled to bytecode, which is executed by the in-kernel BPF interpreter or its Just-In-Time compiler. In short, eBPF uses a virtual machine that interprets code inside the Linux kernel. In the current git tree, BPF offers 89 instructions from which the bytecode buffer making up an eBPF program is built.

It is an amazing tool for tracing, but in this post I would like to share how we can list TCP IPv4 connections and send them to Splunk using the HTTP Event Collector (HEC), all that kernel side!

We will cover the Linux kernel configuration that you need, as well as the Splunk dashboard which monitors those events.

Step 1: Getting the latest Linux Kernel

These steps were done on a Debian distribution and should also work on Ubuntu. If you have another distribution, adjust accordingly or find another way to get a kernel > 4.1.
We first grab the freshest Linux source code from the Linus tree by running the git clone command:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Because we are on a Debian distribution, we would like to use the standardized tools provided by the Debian Kernel Package (more information here: https://wiki.debian.org/BuildADebianKernelPackage)
We need to install the following packages to automate the building and packaging creation of this kernel:
$ sudo apt-get install kernel-package build-essential libncurses5-dev fakeroot
Now we can configure options we need for our kernel by running the ncurses frontend, menuconfig:
$ make ARCH=x86_64 menuconfig
If you want to play with the new bpf() syscall, activate the "Enable bpf() system call" item under "General Setup":
[Screenshot: kernel menuconfig with "Enable bpf() system call" selected under General Setup]
We save the ".config" file and make sure the kernel configuration builds BPF:
$ grep BPF .config
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m
Now we can use the Debian kernel package builder, make-kpkg:
$ make-kpkg --initrd --rootcmd fakeroot kernel_image
exec make kpkg_version=12.036+nmu3 -f /usr/share/kernel-package/ruleset/minimal.mk debian ROOT_CMD=fakeroot
====== making target debian/stamp/conf/minimal_debian [new prereqs: ]======
...
dpkg --build                   ~/git/linux/debian/linux-image-4.6.0-rc6+ ..
dpkg-deb: building package `linux-image-4.6.0-rc6+' in `../linux-image-4.6.0-rc6+_4.6.0-rc6+-10.00.Custom_amd64.deb'.
make[1]: Leaving directory '~/git/linux'
It builds the kernel bzImage, as well as the modules.
We install the package like this:
$ sudo dpkg -i ../linux-image-4.6.0-rc6+_4.6.0-rc6+-10.00.Custom_amd64.deb
Now it is time to reboot on your new kernel. You can then check the version by typing:
$ uname -a | grep 4.6.0-rc6
4.6.0-rc6
$ echo $?
0
If the echo command returns 1, you have booted the wrong kernel; in that case, check that GRUB is set to start the new one.
This is all good from the Linux kernel point of view, we can now move on to the userspace tools, with BCC.

Step 2: Building BCC

Once our kernel is set up, we are now going to install and use BCC (BPF Compiler Collection), which offers a Python API in which you include the C code that will be compiled to BPF bytecode, and you get the results directly from the Linux kernel… in Python!
You can get BCC from the latest git repository:
$ git clone http://github.com/iovisor/bcc
Simply follow the BCC building instructions:
We also install the tools iperf and netperf:
$ sudo apt-get install iperf netperf
To test that BCC built fine, you can run the provided hello_world.py program:
$ sudo python /usr/share/bcc/examples/hello_world.py
          tpvmlp-1636  [000] d...  2633.342396: : Hello, World!
          tpvmlp-1636  [000] d...  2648.547213: : Hello, World!
And also trace_fields.py, a program only a few lines longer:
$ sudo python /usr/share/bcc/examples/tracing/trace_fields.py
PID MESSAGE
1636 Hello, World!
1636 Hello, World!
3182 Hello, World!
3182 Hello, World!
1636 Hello, World!
Working? Now let’s go to the next step, setting up the Splunk HTTP Event Collector!

Step 3: Splunk HTTP Event Collector

Recently, Splunk introduced the HTTP Event Collector, which allows us to craft any type of event to be ingested by Splunk. The event must be formatted in JSON and sent to the listening socket on the Splunk side.
I recommend you go and read “Set up and use HTTP Event Collector”, before continuing.
We create a new HEC input by going to Settings > Data Inputs.
On the left side, select HTTP Event Collector.
In the upper-right corner, click Global Settings.
In the window that pops up, click "Enabled" for All Tokens and deactivate SSL. We only disable SSL here to keep the code in this blog article simple; outside of experimentation, deactivating it is obviously strongly discouraged! Leave the port number at the default and click Save.
Back on the previous page, click "New Token" in the upper-right corner.
We give the name "bcc" to this token along with a brief description, then click "Next".
We leave the input settings at the defaults and click "Review".
We can now Submit.
Upon completion, our token has been created successfully.
Copy the token value; you will need it in your Python code!
We test that events can be sent using the program curl:
$ curl -k  http://localhost:8088/services/collector/event -H "Authorization: Splunk 652AE968-58E4-4304-A1FE-C4AB7A5CF327" -d '{"event": "hello world"}'
{"text":"Success","code":0}
And can check in Splunk the event was emitted:

Step 4: BCC + HEC = \m/

We are going to modify an example provided by the BCC project team, which simply lists connected TCP IPv4 sockets:
$ wget https://raw.githubusercontent.com/iovisor/bcc/master/examples/tracing/tcpv4connect.py
We can test the tool, by running it:
$ sudo python tcpv4connect.py
PID    COMM         SADDR            DADDR            DPORT
And in another terminal, make a connection using wget:
$ wget google.com/index.html
Now back to where we started the program:
$ sudo python tcpv4connect.py
PID    COMM         SADDR            DADDR            DPORT
4367   wget         172.16.99.163    216.58.194.73    80
4367   wget         172.16.99.163    74.125.21.105    80
We can now send a Splunk event every time there is a new connection. We need to modify the code a little bit; there is no need to touch the C part, just the Python side.
Copy tcpv4connect.py to tcp2splunk.py:
$ cp tcpv4connect.py tcp2splunk.py
Now edit tcp2splunk.py with your favorite editor (emacs!) and go to line 20 to add imports for the httplib, os and json libraries:
from bcc import BPF
import os
import httplib
import json
# define BPF program
Now go to line 92 and initialize everything before the while loop starts:
headers = {"Authorization": "Splunk 652AE968-58E4-4304-A1FE-C4AB7A5CF327", "Content-Type": "application/json"}
conn = httplib.HTTPConnection("172.16.99.1:8088")
# filter and format output
while 1:
And finally, in the loop, we post the received data to Splunk. We add a pid check to make sure we do not send to Splunk the connections this process itself creates; otherwise we end up in a nice infinite loop!
        # Ignore messages from other tracers
        if _tag != "trace_tcp4connect":
            continue
        if os.getpid() != pid:
                message = {"event": {"pid": pid, "task": task, "saddr": inet_ntoa(int(saddr_hs, 16)),
                                     "daadr": inet_ntoa(int(daddr_hs, 16)), "dport": dport_s}}
                conn.request("POST", "/services/collector/event", json.dumps(message), headers)
                res = conn.getresponse()
We can now enjoy seeing our wget, as well as other python processes:

Conclusion

As you have seen, you can use the latest features of the Linux kernel to send to Splunk anything the kernel sees, all from the kernel side, using the glue offered by BCC so that we can simply write and prototype the code in Python. I hope you will find creative ways to use the new eBPF feature, and I would be more than happy to hear about the amazing things you are doing with it and Splunk!

High Performance syslogging for Splunk using syslog-ng – Part 1


Today I am going to discuss a subject that I consider to be extremely critical to any successful Splunk deployment: what is the best method of capturing syslog events into Splunk? As you probably already know, there is no lack of articles on the topic of syslog on the Internet, which is fantastic because it enriches the knowledge of our community. This blog is broken into two parts. In part one, I will cover three scenarios for implementing syslog with Splunk. In part two, I will share my own experience running a large Splunk/syslog environment and what you can do to increase performance and ease management.

When given the choice between using a syslog agent (ex: http://sflanders.net/2013/10/25/syslog-agents-windows/ ) or the UF (Universal Forwarder), the UF should always win. The UF/indexer pairs are designed from the ground up to work with each other. There are a lot of advantages to using the Splunk Universal Forwarder (aka Splunk agent) to push events into Splunk indexers. Here are a few reasons:

  1. Reliability
  2. Performance
  3. Ease of management
  4. Better traffic throttling and buffering
  5. Ability to drop events at the source (new to 6.x)
  6. In-transit encryption (SSL)
  7. Intelligent event distribution across the indexers
  8. Automatic indexer discovery (in clustered environments)

Getting back to syslogging, I have observed three scenarios utilized by Splunk’s customers for capturing syslog events:

  • Scenario 1: Using network inputs on the Indexer(s).
  • Scenario 2: Running syslog & IDX on the same server.
  • Scenario 3: Separate server(s) running syslog & HF/UF.

 

Scenario #1: Using network inputs on the Indexer(s)

As a Splunk ninja, you already know that it is possible to configure inputs.conf to accept traffic on any TCP or UDP port (http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Monitornetworkports). While this mechanism is a workable solution, it is not ideal in high volume environments. The indexer's main job is to write ingested events to disk and to answer incoming queries from the search heads. So yes, you can enable network inputs, and yes, it will work; but if your indexer is not sized appropriately, it may not be able to keep up with the amount of incoming events. Here is a list of challenges with this approach (a sample input stanza follows the list):

  1. There is no error checking of any sort, because the indexers and the clients are using a generic network connection. With TCP inputs you get transport-layer error checking, but nothing at the application layer. Unlike Universal Forwarders, network inputs do not have full awareness of the clients.
  2. In large implementations, restarting splunkd is slower than restarting syslog, so you risk longer periods of service interruption. This issue may not be a big deal in load-balanced environments.
  3. Indexers normally get restarted more frequently than your syslog engines, which will result in frequent service interruption.
  4. Setting up source types from a network input is less efficient than setting a source type from a file input. In some cases it can also be complicated.
  5. If you use port numbers under 1024 (i.e. TCP/UDP 514), you will need elevated privileges, which means you may need to run splunkd as root. This goes against security best practices.
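For reference, the kind of network input described in this scenario is just a stanza in inputs.conf on the indexer, something like the illustrative sketch below (not a recommendation):

# inputs.conf on the indexer -- scenario 1, illustrative only
[udp://514]
sourcetype = syslog
connection_host = ip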

 

Scenario #2: Running syslog & IDX on the same server:

Next, a Splunk ninja may investigate running syslog alongside splunkd (on the same server). This solution is also not a good fit for high volume environments. Here is why:

  1. Syslog daemons and indexers are both I/O intensive applications, and they will compete for valuable resources. Syslog will capture, filter and write to disk; then splunkd comes along and repeats a "similar" process consuming "similar" resources. Effectively you are doubling your hard drive reads/writes and using more CPU cycles with all the compression/decompression activities.
  2. Some Splunk ninjas may choose to limit where syslog is installed (maybe on one or two indexers), thinking that will reduce the negative impact on Splunk performance. However, dragging down a single indexer is all it takes to slow down your overall search speed. As a rule of thumb, search speed (especially for searches that depend on aggregation) will only be as fast as the slowest indexer. This architectural "flaw" is more prevalent than I would like to see.
  3. A side effect of scenario #2 is that the indexer running syslog will have more data than its peers, which means it will need to work harder than its peers answering queries for the syslog data it is holding. Data imbalance has a negative impact on storage utilization (if using built-in HD) and search performance.
  4. The performance of this design approach gets worse in virtual environments where over-provisioning is a common thing.

Scenario #3: Separate server(s) running syslog & HF/UF

A better design is to implement syslog engine(s) on their own hardware and run Universal Forwarders (or Heavy Forwarders) to pick up the events and forward them to the indexing tier. In this configuration, syslog acts as a file-based queuing mechanism, which gives splunkd some "breathing room" to process events whenever it has the cycles to do so. Customers who have made the transition from scenario one or two to scenario three noticed a significant improvement in search speed and fewer UDP/514 packet drops.

In part two of this blog I will focus the discussion on syslog-ng because it is a tool I am very familiar with. The stock, generic syslogd SHOULD NOT be used; it is old and lacks the flexibility and the speed of modern syslog engines like SYSLOG-NG or RSYSLOG. I will cover some performance tuning tips and some management tips. The goal is to help you better manage and improve syslog event capture.

 

Next :  http://blogs.splunk.com/2016/05/05/high-performance-syslogging-for-splunk-using-syslog-ng-part-2/

Smart AnSwerS #62


Hey there community and welcome to the 62nd installment of Smart AnSwerS.

There’s a lot of hustle and bustle going on at Splunk today as we will be expanding HQ with a brand new building next door! Construction has been ongoing for the past two years, but the big day is finally here with more than half of the Splunkers in our current building moving over. Folks are packing up their desks for the rest of the morning because in a little less than two hours, we’ll be celebrating the opening of the new building in true Splunk style with a Cinco de Mayo party. There is just no other way :)

Check out this week’s featured Splunk Answers posts:

Script to automate uploading diags to box.com

jgarland_gap wrote up a nifty bash script to automate the process of creating a diag and uploading it to a folder on box.com. He decided to post this question and answer out of the kindness of his heart to share with the rest of the Splunk community in case anyone else wanted to make their lives a little easier when a diag is requested.
https://answers.splunk.com/answers/390809/script-to-automate-uploading-diags-to-boxcom.html

Why is my Distributed Management Console trying to push a bundle to a newly added search peer which happens to be a standalone indexer?

lycollicott noticed the distributed management console was reporting a warning about being unable to distribute a bundle to a standalone indexer, but he didn’t understand why this was happening. ykou clarifies the difference between app, configuration, and search knowledge bundles, explaining how the DMC specifically sends search knowledge bundles to search peers in order to run ad-hoc searches against indexers being monitored.
https://answers.splunk.com/answers/368723/why-is-my-distributed-management-console-trying-to.html

Where can I find detailed use cases/scenarios for using the HTTP event collector in Splunk?

mcnamara had read a couple articles on the HTTP event collector, but wanted to see if there were any other resources that go into a little more depth on how it works and its advantages. renjith.nair provides a Splunk blog post that gives a comprehensive overview with examples on how to set it up. gblock joins in the discussion to help renjith.nair answer all of mcnamara’s follow up questions.
https://answers.splunk.com/answers/368531/where-can-i-find-detailed-use-casesscenarios-for-u.html

Thanks for reading!

Missed out on the first sixty-one Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

High Performance syslogging for Splunk using syslog-ng – Part 2


As I mentioned in part one of this blog, I managed a sizable deployment of Splunk/syslog servers (2.5TB/day). I had 8 syslog-ng engines in 3 geographically separate data centers: Hong Kong, London and St. Louis. Each group of syslog-ng servers was load balanced with F5, and each group was sending traffic to its own regional indexers. Some of the syslog servers processed upward of 40,000 EPS (burst traffic). The recommendations I am about to describe here are what worked for me; your mileage may vary, of course. I tried optimizing the syslog-ng engines to get as much performance as possible out of them. If you feel, however, that this is overkill, or if you don't have the manpower to go through the tuning process, it may be easier to just add additional hardware and use the default settings.

 

SYSLOG-NG MANAGING TIPS

Modular configurations:

With syslog-ng release 3.x, a new feature was introduced that allows you to dynamically include configuration files in the body of the main syslog-ng.conf. This is similar to the C language "include" or Python "import".

To use this feature just add a line like this to syslog-ng.conf

@include "/etc/syslog-ng/buckets.d"

This feature enables you to create a main syslog-ng.conf file and then move all source-related configurations to a directory (let's call it buckets.d). By doing so you have effectively split your syslog-ng configuration into two parts: the static part, which contains the syslog-ng server specific configuration (i.e. IP address, listening ports, socket conditioning, etc.); and the dynamic part, which is related to the source devices (hostnames, permissions, locations, filter rules, etc.). The static part does not change from server to server; once configured you probably don't need to change it. The dynamic part (the buckets.d files) is constantly changing every time you add or remove a source host.

Sample buckets.d filter-file:

destination d_firewalls { file ("/syslog/FIREWALLS/$SOURCEIP/$SOURCEIP.log"
          owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes));
};
filter f_firewalls {  match("%ASA-"  value ("MSG"))
                   or match("%ASA-"  value ("MSGHDR"))
                   or match("%FWSM-" value ("MSG"))
                   or match("%FWSM-" value ("MSGHDR"))
                   or match("%PIX"   value ("MSG"))
                   or match("%PIX-"  value ("MSGHDR"))
                   and not netmask("10.96.50.13/32");
};
log {source(s_network); filter(f_firewalls);    destination(d_firewalls);
};

 

To make the syslog-ng configuration modular, create as many filter-files as you want. Each filter-file should contain the configuration for an individual group of sources. Then periodically sync the "buckets.d" directory across all of your syslog-ng servers.

My source devices (the dynamic part) are not the same across all these data centers. So why am I syncing filter-files I wouldn't use, you ask? Good question. The answer is ease of administration. By synchronizing buckets.d you don't need to worry about which source lives where. My intention was to create a universal set of filter-files that would work in any data center; the simplicity of management superseded the clutter in this case. From that point on, every time you restart syslog-ng, the entire contents of buckets.d along with the main syslog-ng.conf will appear to the syslog-ng daemon as one single conf file.

 

Keywords naming convention:

As with Hungarian notation https://en.wikipedia.org/wiki/Hungarian_notation , I strongly recommend using the following naming convention to make your configuration easy to read and follow:

d_     for destinations
f_     for filters
s_     for sources

destination     d_damballa {file ("/syslog/DAMBALLA/$SOURCEIP/$SOURCEIP.log" ); };
 
filter  f_damballa { netmask ("10.63.1.1/32") ;}; 

log {source(s_network); filter(f_damballa); destination(d_damballa); };

 

Turn on statistical gathering:

Turn on statistical gathering in syslog-ng. It will give you visibility into the engine's operation: you will be able to see how many events per source are being collected. This information is critical for capacity planning and performance tuning.

destination d_logstats { file("/home/syslog-ng/logstats/logstats.log"
          owner(syslog-ng) group(splunk) perm(0644) dir_perm(0750) create_dirs(yes));};

filter f_logstats { match("Log statistics;" value ("MSGHDR"))
                 and match("d_windows" value ("MSGHDR")); };

log { source(s_local ); filter (f_logstats); destination(d_logstats); };

 

Watch for file permissions:

Make sure the syslog-ng process can READ the buckets.d directory and can READ/WRITE the log directories. Also make sure that the splunkd daemon has full READ access to the log files (and their parent directories):

file("/syslog/MSSQL/$SOURCEIP/$SOURCEIP.log" 
owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes));};

 

 Watch for UDP packets drops on the syslog server:

Many new syslog-ng admins don't pay attention to this item; they simply assume things will work. For the most part that is true, but in a high volume environment UDP traffic drops are unavoidable. You will start to hear some users complaining about "missing events", and they will probably blame Splunk for it. So do yourself a favor and monitor UDP packet drops on the interface. Use whatever tool you are comfortable with (one simple option is sketched below). And yes, there is a Splunk app for that: https://splunkbase.splunk.com/app/2975/#/overview
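If you want a quick way to eyeball those drops on Linux, a few lines of Python (my own sketch, not from the original post) can read the kernel's UDP counters from /proc/net/snmp; rising InErrors or RcvbufErrors values mean datagrams are being dropped:

#!/usr/bin/env python
# Minimal sketch: print Linux UDP counters (InErrors, RcvbufErrors) from /proc/net/snmp.
# Assumes a Linux host; field positions are read from the header row, not hard-coded.
with open("/proc/net/snmp") as f:
    udp_rows = [line.split() for line in f if line.startswith("Udp:")]

# The kernel prints a header row followed by a value row for each protocol.
names, values = udp_rows[0][1:], udp_rows[1][1:]
counters = dict(zip(names, [int(v) for v in values]))

for key in ("InDatagrams", "InErrors", "RcvbufErrors"):
    print("%s=%d" % (key, counters.get(key, 0)))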

 

 

 SYSLOG-NG TUNING TIPS

Syslog-ng has several tuning parameters to achieve higher "ingestion" or "capture" rates. Please be aware that if you use large values, some of these configurations may require adjustments to your kernel (/etc/sysctl.conf). For full details on all available syslog-ng options, please consult https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/index.html?_ga=1.115117060.204896635.1456724547

Here are some configuration options that you can use:

 Set the receiving buffer size

You can control the size of the receiving buffer using rcvbuf(). Incoming events will be queued in memory before they are written to disk. While a large buffer will improve the capture speed, it may also result in an undesirable side effect: timestamp skewing. Syslog-ng timestamps events when they are written to disk, not when the network card receives them. In my environment I managed to get over 5 minutes of timestamp skew just by creating too large a buffer. If your events have multiple timestamps (one added by syslog-ng and one added by the source device), you can probably instruct Splunk to use the second timestamp. Again, use with caution!

udp ( ip(10.16.128.93) port(2514) so_rcvbuf (805306368) so_sndbuf(8096) time_zone(GMT) keep_timestamp(no) );

 

Use multiple sockets:

The term "network socket" refers to the combination of port number and IP address. One way of enhancing syslog performance is splitting your incoming log traffic across multiple sockets (or channels). For example, in my environment I configured syslog-ng to listen on UDP/2514 for the firewalls, TCP/2515 for VMware logs, and so on. You can also utilize multiple IPs if you have them. The idea here is to distribute the load among multiple channels. However, before you rush into opening multiple sockets, make sure you have exhausted the existing one(s); there is no need to complicate your design just because you can. Simple is always elegant! A rough sketch of a multi-socket source block follows.
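As an illustration only (reusing the example IP and ports above, not a copy of my production configuration), such a source block could look like this:

# illustrative sketch: one source listening on several sockets
source s_network {
    udp ( ip(10.16.128.93) port(2514) );   # firewall traffic
    tcp ( ip(10.16.128.93) port(2515) );   # VMware logs
};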

 

Allow TCP logging:

Many network devices can only be configured to use UDP/514 for logging, but try to enable TCP logging whenever possible. The advantage is the reliability of the transport protocol.

tcp ( ip(10.16.128.93) port(2514) ) ;

 

Set max-connections the socket can handle:

The objective here is to prevent a single source that has "gone wild" from overwhelming the channel. Start with a large number, then tune it down based on your environment's "normal" activity.

tcp ( ip(10.16.128.93) port(514) max-connections(5000) ) ;

 

Turn off DNS name resolution:

Unless you really need it, I recommend filtering by IP address and not attempting to look up DNS hostnames. However, if you must have it, then look into running DNS cache-only servers: http://www.tecmint.com/install-caching-only-dns-server-in-centos/ . Please remember that syslog-ng can do DNS caching of its own, so again, do not rush into enabling DNS caching in the OS or syslog-ng unless you really need it. In my experience, enabling DNS caching in syslog-ng.conf is sufficient to reduce DNS traffic (out of the server).

use_dns(no);
dns_cache(no);

 

Explicitly define logging templates:

The advantage is that you will be able to control how the log message is formatted, in case you need to forward the events to other syslog servers or third-party tools. The syslog-ng engine, much like Splunk, can act as a log router (aka syslog hub). There are a few things you need to worry about when you configure your syslog-ng as a hub; watch for chain_hostname() and keep_hostname().

template t_default { template("${DATE} ${HOST} ${MSGHDR}${MSG}\n"); };

 

Fast Filtering:

Creating filters by IP address rather than filtering by keywords in the message body (MSG) or header (MSGHDR) is faster. If you're trying to achieve higher capture capacity, you should look into this option.

Having said this, there might be a need to use keyword filtering. In my case I had an environment with 500+ firewalls, and it was very easy to identify all possible unique Cisco ASA keywords found in ASA logs. The alternative would have been listing every single IP in the configuration file, so I opted for ease of management this round. Additionally, avoid using regex in your syslog-ng configuration; Splunk is better suited for that task.

  

TESTING YOUR SYSLOG-NG INSTANCE

There are many ways to test your configuration. The best way is to use a traffic generator like IXIA, since you can really push massive amounts of traffic. My next favorite tool is loggen by Balabit (which is part of the syslog-ng distribution). With this tool you can stress test your syslog-ng install by specifying the rate of syslog messages, and you can test using either TCP or UDP. For a more realistic simulated test, you can use a sample input file (e.g., a Cisco ASA log). Using a real-world sample file is also very useful for testing your filtering rules.

https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html

From the Balabit documentation: when loggen finishes sending the messages, it displays the following statistics:

  • average rate: Average rate the messages were sent in messages/second.
  • count: The total number of messages sent.
  • time: The time required to send the messages in seconds.
  • average message size: The average size of the sent messages in bytes.
  • bandwidth: The average bandwidth used for sending the messages in kilobytes/second.

 

Example loggen commands:

You can send data to your syslog server using an input file to generate more realistic data:

loggen 10.0.0.1 514 cisco_asa.log

 

The following command generates 1000 messages per second for ten minutes and sends them to port TCP/514 on host 10.0.0.1. Each message is 500 bytes long.

loggen --size 500 --rate 1000 --interval 600 10.0.0.1 514

 

 

In conclusion, as you can see, syslog-ng is a very flexible and well-designed open source tool. It can be a critical part of your Splunk deployment. You need to decide how far you want to take it. Weigh all your options, as every environment is different. Seek simplicity as much as possible, but don't shy away from being on the bleeding edge if it makes sense. And finally, don't assume anything: test your configuration and monitor your deployment. As always, I welcome your comments and feedback!

 

Back to part 1:  http://blogs.splunk.com/2016/05/05/high-performance-syslogging-for-splunk-using-syslog-ng-part-1/

 

 


What size should my Splunk license be?


This is a pretty common question in Splunkland. Maybe you’re an admin wondering how much license you’ll need to handle this new data source you have in mind for a great new use case. Or you’re a Splunker trying to answer this question for a customer. Or a partner doing the same. Given how often this comes up, I thought I’d put together an overview of all the ways you can approximate how big a license you need, based on a set of data sources. This post brings together the accumulated wisdom of many of my fellow sales engineers, so before I begin, I’d like to thank them all for the insights and code they shared so willingly. Thank you for your help, everyone!

All right. So, broadly, you have these options:
  1. Ask someone how much data there is
  2. Measure the data
    • At the original source
    • Outside the original data source
  3. Estimate the size of the data
    • Bottom-up, based on “samples” of data from various sources
    • Top-down, based on total data volumes of similar organizations

Let’s take a look at each of these in turn.


Ask someone how much data there is

Doesn’t hurt, right? Sometimes admins will actually have some of this information. Many times they will not. If they do, one approach is to take these rough estimates, add a buffer, and use that for your license size. The buffer helps account for inaccurate measurements (people make mistakes), out-of-date measurements, incomplete measurements and changing environments. This is actually a viable option if your environment is relatively static, homogeneous across sites, and can’t be measured for some reason – technical or logistical. The buffer also gives you some room to grow as you start to get comfortable with analyzing machine data. Typically, when people see what they can start to do with Splunk, they like to do more!

Where we have seen this start to break down is in some very large organizations with data all over the world. If you are looking at massive amounts of data (terabytes unevenly distributed between different sites), rough numbers from one site will not cut it; they might not be representative of the others. Add to that the data that hasn't been factored into the old numbers because only file sizes, the easiest thing, were measured, yet there's more that's collected via APIs (i.e. it doesn't exist until you do the pull), and more that isn't in a log file at all (performance data constantly streaming in, for example). Rough numbers don't quite work in this situation.

If you do need to go beyond the rough numbers or have none, you get into the options below – measuring the data properly, or estimating it.


Measure the data

If you can do it economically and well, measuring is the best option you have. It gives you the best chance of coming up with a license that is neither oversized nor undersized. Some of the variables in the Estimation section below should be kept in mind, however, as you want to be sure you’re getting a measurement that’s truly representative.

So how do you actually do this measurement?

Measuring at the source

If you're measuring at the original source, the exact way you'll do this depends on the source itself. Considering the vast universe of data sources that Splunk can take in, there really isn't a quick trick that works for everything, unfortunately. That isn't to say it can't be done at the source (far from it), as there are often easy-to-access ways to find this information. Native tools work just fine for files on disk; just make sure you're measuring uncompressed data. If you happen to have a collection of HOWTOs for different platforms, I'd love to see them.

Measuring at the source can have several advantages. It can be very accurate (read the estimation section below to see why I say “can”). You don’t have to set up a way to pull or push the data to something else. You don’t have to think about discarding the data after it’s been measured. You do have to measure it individually everywhere, however, which can get out of hand if you have a lot of sources of similar types, say desktops. In those cases, you can do some representative measurements, then extrapolate to the number of systems you have. This extrapolation can be less accurate than a true measurement obtained by collecting it all, but gets you closer.
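For the files-on-disk case, even a small script can do the totaling. The sketch below is my own illustration (the log directory path is a placeholder, and compressed files are skipped so their on-disk size doesn't understate the uncompressed volume):

#!/usr/bin/env python
# Minimal sketch: total the on-disk size of uncompressed log files under a directory.
import os

LOG_DIR = "/var/log/myapp"          # hypothetical example path
SKIP_EXT = (".gz", ".bz2", ".zip")  # compressed files would understate the real volume

total_bytes = 0
for root, dirs, files in os.walk(LOG_DIR):
    for name in files:
        if name.lower().endswith(SKIP_EXT):
            continue
        total_bytes += os.path.getsize(os.path.join(root, name))

print("Uncompressed log volume: %.2f MB" % (total_bytes / 1024.0 / 1024.0))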

Measuring outside the source

Since you’re usually trying to measure more than one data source, you can also consider sending them all to a central point where they’ll be collected, and you’ll get an idea of how different sources compare to each other. (You might find some surprises here). As you’d expect, Splunk can do these calculations easily. I’ll go into how shortly. It’s also something a syslog aggregator could do if your data is all coming in over syslog, for example.
If you decide to use a temporary Splunk install for this, there are a few things to keep in mind first.

Get a trial license

Splunk sales can generate a trial license for the duration of your test that can handle a large volume thrown at it. If you don’t have that option, a free Splunk license will also work, but you have to jump through a few hoops there that probably aren’t worth the effort. Since the trial option is so much easier, that’s by far the recommended approach.

Don’t touch it on the way in

Since we’re simply estimating the size of the data, there’s no reason to transform or parse it in any way before bringing it in. No props and transforms magic. At the very least, keep it to a bare minimum and just pull it in. If you’re doing this for a customer (you’re a Splunker or partner), it’s important to explain the process so everyone understands that this isn’t the start of a production install. When you get to production, you might want to do any number of things to your data on the way in – filter out some, mask portions of it, route it to certain indices, extract fields ahead of time. None of this is necessary when you’re simply counting bytes.

Make sure that retention on the “_internal” indexes is long enough for the estimation period.

The _internal indexes contain the information we’ll be looking at, so they have to stay around.

Throw it away afterwards

There's no reason to keep the data around any longer than necessary. This isn't production, and throwing it away also helps you get away with a lower class of hardware for this exercise. It won't need to support consistently writing large volumes of data to disk and searching by multiple users, after all. To discard the data, route it to an index that doesn't keep data around very long, i.e. one that rolls it to frozen quickly. Say you're routing it all to main and want to keep the data around no longer than a day (86400 seconds). Set this in your indexes.conf:
[main]
frozenTimePeriodInSecs = 86400
This will roll the data to frozen after 86400 seconds. If you haven’t specified a path for the frozen data, it will be deleted.

The maxTotalDataSizeMB parameter also controls when data is rolled to frozen, and does it based on the size of the data. Set it to a very large value to control the roll strictly by time. Learn about this behavior under setting a retirement and archiving policy.
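For illustration (the size value is a placeholder, not a recommendation), the two settings can live in the same stanza:

[main]
frozenTimePeriodInSecs = 86400
# set very large so the roll to frozen is governed by time rather than size
maxTotalDataSizeMB = 5000000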

So, assuming you’ve done all this, you’re ready to start measuring. Use the searches at the bottom of this page to do that. Lots of goodies there…modify as you need to.


Estimate the size of the data

As a last resort, you can fall back to estimating the data. Warning – this method can be inaccurate and make people unhappy if the resulting license sizes turn out to be too small or big. When relaying the information, please communicate that though you’re doing your *very best* to estimate, the practice comes with caveats they should keep in mind. Seeing hard numbers can lead people to think there was more science behind them than there was. If you don’t believe that (but you probably do), go read How to Lie with Statistics by Darrell Huff. It’ll be a bit of an eye-opener.
There are many good reasons to fall back on estimates. Maybe you’re in a time crunch and don’t have the time to measure things. Or there are no resources to do so. It could also be a question of budgets – this is how much I can afford at this time. And sometimes, the data is sensitive (or you aren’t sure) and there’s no will to temporarily pull it out to a centralized collector like Splunk to measure – even if you’re doing it on-premise. In these situations, balance it out against the cost of estimating wrongly, and if you still want to go ahead, you have a couple options to estimate.

Estimating based on data samples

These samples can come from previous sample sizes Splunk has collected (which we have), ones the actual environment has collected (but not measured yet), from mobile apps like LogCaliper, or from vendors of the data sources themselves. To use the Splunk-collected samples, Splunkers will already be aware of the options here. There are various spreadsheets and an app based upon the best of these. Partners – reach out. We also highly encourage submitting anonymized samples to improve the value of these samples, and we make it easy!

So why are samples something to be careful of? Simply put, they might not reflect your environment since they’re based on another one. Log volumes, even with the same product family, can vary widely depending on a number of factors:
  • Logging severity levels. Say you're talking about Cisco devices. There's a big difference in the amount of information between the lowest (0 – emergency) and highest (7 – debugging, or more likely 6 – informational) levels. Are you logging simply which flows are allowed/denied, or going up the chain to user activity?
  • What the device is doing. Logs are a reflection of work performed. If this is a firewall, how much traffic is it seeing? The more it does, the more it will log, even if the severity level stays the same. Simply knowing the number of firewalls you have isn’t enough, though that’s often the first piece of information that comes back. How many flows are they seeing per second? Couple this with something like a netflow calculator if you’re looking at netflow, and you start to get somewhere.
  • Services enabled.
  • Types of events that are being logged. (Side note: Are you logging everything you should to help you in case of a security breach? Check out some excellent guidance from the NSA, and Malware Archaeology’s great talk at the 2015 Splunk .conf user conference.)
  • How many things get logged. Say it’s Tripwire data. Change notifications depend on how many Tripwire agents there are in the environment and what rules are set to fire against them.
  • What’s IN a thing that gets logged. Say it’s Tripwire again. If you’re doing a lot with Windows GPO changes or AD changes, your change notifications get bigger.
  • Custom logs. You can add your own pieces to existing logs sometimes, such as with a BlueCoat proxy.
  • How long you’re measuring for. Volumes can ebb and flow over the course of a week.
  • Where the data is coming from. Some data sources are just more talkative than others. Carbon Black, for example.

…And so on. You get the picture. There are more examples on this Splunk Answer. (For completeness, I linked back to this post from there, so if you see a circular reference, you’re not imagining things). The point is – any sample you’re basing your estimate on is just that – a sample. You can attempt to average them over time, which helps. Just remember that your own environment can still be different. If this is the best you can do, run with it. Decisions are made with less than perfect information every day.

Estimating based on the usage of similar-sized organizations

In contrast to the above, this takes a top-down, wholesale look at your data needs as an organization. The hypothesis is that you have a similar setup to others your size, and similar data sizes. This is a faster approach than any of the above, since it abstracts away all detail in favor of the assumption that you can’t be *that* different from the others. The challenge, of course, is in defining who those others are. This is something you will probably need consultative help from Splunk on. A certain level of sanity checking would be needed here to see if you really can use similar assumptions, even if you happen to be outwardly similar. You might operate quite differently.


Appendix: Searches for measuring data ingested

****************************************************************************
Using Raw Data Sizing and Custom Search Base
These searches use the len Splunk Search command to get the size of the raw
event using a custom base search for specific type of data.
****************************************************************************

NOTE: Just replace "EventCode" and "sourcetype" with values corresponding to the type of data that you are looking to measure.

=====================================================
Simple Searches:
=====================================================

Indexed Raw Data Size by host By Day:
-------------------------------------
sourcetype=WinEventLog:*
| fields _raw, _time, host
| eval evt_bytes = len(_raw)
| timechart span=1d sum(eval(evt_bytes/1024/1024)) AS TotalMB by host


Indexed Raw Data Size by sourcetype By Day:
-------------------------------------------
sourcetype=WinEventLog:*
| fields _raw, _time, sourcetype
| eval evt_bytes = len(_raw)
| timechart span=1d sum(eval(evt_bytes/1024/1024)) AS TotalMB by sourcetype


Indexed Raw Data Size by Windows EventCode By Day:
--------------------------------------------------
sourcetype=WinEventLog:*
| fields _raw, _time, EventCode
| eval evt_bytes = len(_raw)
| timechart span=1d limit=10 sum(eval(evt_bytes/1024/1024)) AS TotalMB by EventCode useother=false


Avg Event count/day, Avg bytes/day and Avg event size by sourcetype:
--------------------------------------------------------------------
index=_internal  kb group="per_sourcetype_thruput"
| eval B = round((kb*1024),2)
| stats sum(ev) as count, sum(B) as B by series, date_mday
| eval aes = (B/count)
| stats avg(count) as AC, avg(B) as AB, avg(aes) as AES by series
| eval AB = round(AB,0)
| eval AC = round(AC,0)
| eval AES = round(AES,2)
| rename AB as "Avg bytes/day", AC as "Avg events/day", AES as "Avg event size"


Avg Event count/day, Avg bytes/day and Avg event size by source:
----------------------------------------------------------------
index=_internal  kb group="per_source_thruput"
| eval B = round((kb*1024),2)
| stats sum(ev) as count, sum(B) as B by series, date_mday
| eval aes = (B/count)
| stats avg(count) as AC, avg(B) as AB, avg(aes) as AES by series
| eval AB = round(AB,0)
| eval AC = round(AC,0)
| eval AES = round(AES,2)
| rename AB as "Avg bytes/day", AC as "Avg events/day", AES as "Avg event size"






=====================================================
Combined Hosts and Sourcetypes:
=====================================================
 
Top 10 hosts and Top 5 sourcetypes for each host by Day:
--------------------------------------------------------

sourcetype=WinEventLog:*
| fields _raw, _time, host, sourcetype
| eval evt_bytes = len(_raw)
| eval day_period=strftime(_time, "%m/%d/%Y")
| stats sum(evt_bytes) AS TotalMB, count AS Total_Events by day_period,host,sourcetype
| sort day_period
| eval TotalMB=round(TotalMB/1024/1024,4)
| eval Total_Events_st=tostring(Total_Events,"commas")
| eval comb="| - (".round(TotalMB,2)." MB) for ".sourcetype." data"
| sort -TotalMB
| stats list(comb) AS subcomb, sum(TotalMB) AS TotalMB by host, day_period
| eval subcomb=mvindex(subcomb,0,4)
| mvcombine subcomb
| sort -TotalMB
| eval endcomb="|".host." (Total - ".round(TotalMB,2)."MB):".subcomb
| stats sum(TotalMB) AS Daily_Size_Total, list(endcomb) AS Details by day_period
| eval Daily_Size_Total=round(Daily_Size_Total,2)
| eval Details=mvindex(Details,0,9)
| makemv delim="|" Details
| sort -day_period


Top 10 Hosts and Top 5 Windows Event IDs by Day:
--------------------------------------------------------

sourcetype=WinEventLog:*
| fields _raw, _time, host, EventCode
| eval evt_bytes = len(_raw)
| eval day_period=strftime(_time, "%m/%d/%Y")
| stats sum(evt_bytes) AS TotalMB, count AS Total_Events by day_period,host,EventCode
| sort day_period
| eval TotalMB=round(TotalMB/1024/1024,4)
| eval Total_Events_st=tostring(Total_Events,"commas")
| eval comb="| - (".round(TotalMB,2)." MB) for EventID- ".EventCode." data"
| sort -TotalMB
| stats list(comb) AS subcomb, sum(TotalMB) AS TotalMB by host, day_period
| eval subcomb=mvindex(subcomb,0,4)
| mvcombine subcomb
| sort -TotalMB
| eval endcomb="|".host." (Total - ".round(TotalMB,2)."MB):".subcomb
| stats sum(TotalMB) AS Daily_Size_Total, list(endcomb) AS Details by day_period
| eval Daily_Size_Total=round(Daily_Size_Total,2)
| eval Details=mvindex(Details,0,9)
| makemv delim="|" Details
| sort -day_period




**************************************************************************
Licensing/Storage Metrics Source
The below searches look against the internally collected licensing/metrics
logs and introspection index. These are in license_usage.log, which is
indexed into the _internal index.
**************************************************************************


============================================
     Splunk Index License Size Analysis
============================================

Percent used by each index:
---------------------------
index=_internal source=*license_usage.log type=Usage
| fields idx, b
| rename idx AS index_name
| stats sum(eval(b/1024/1024)) as Total_MB by index_name
| eventstats sum(Total_MB) as Overall_Total_MB
| sort -Total_MB
| eval Percent_Of_Total=round(Total_MB/Overall_Total_MB*100,2)."%"
| eval Total_MB = tostring(round(Total_MB,2),"commas")
| eval Overall_Total_MB = tostring(round(Overall_Total_MB,2),"commas")
| table index_name, Percent_Of_Total, Total_MB, Overall_Total_MB


Total MB by index, Day – Timechart:
-----------------------------------
index=_internal source=*license_usage.log type=Usage
| fields idx, b
| rename idx as index_name
| timechart span=1d limit=20 sum(eval(round(b/1024/1024,4))) AS Total_MB by index_name


============================================
Splunk Sourcetype License Size Analysis
============================================

Percent used by each sourcetype:
-------------------------------------------
index=_internal source=*license_usage.log type=Usage
| fields st, b
| rename st AS sourcetype_name
| stats sum(eval(b/1024/1024)) as Total_MB by sourcetype_name
| eventstats sum(Total_MB) as Overall_Total_MB
| sort -Total_MB
| eval Percent_Of_Total=round(Total_MB/Overall_Total_MB*100,2)."%"
| eval Total_MB = tostring(round(Total_MB,2),"commas")
| eval Overall_Total_MB = tostring(round(Overall_Total_MB,2),"commas")
| table sourcetype_name, Percent_Of_Total, Total_MB, Overall_Total_MB


Total MB by sourcetype, Day – Timechart:
-------------------------------------------
index=_internal source=*license_usage.log type=Usage
| fields st, b
| rename st as sourcetype_name
| timechart span=1d limit=20 sum(eval(round(b/1024/1024,4))) AS Total_MB by sourcetype_name


============================================
Splunk host License Size Analysis
============================================

Percent used by each index:
-------------------------------------------
index=_internal source=*license_usage.log type=Usage
| fields h, b
| rename h AS host_name
| stats sum(eval(b/1024/1024)) as Total_MB by host_name
| eventstats sum(Total_MB) as Overall_Total_MB
| sort -Total_MB
| eval Percent_Of_Total=round(Total_MB/Overall_Total_MB*100,2)."%"
| eval Total_MB = tostring(round(Total_MB,2),"commas")
| eval Overall_Total_MB = tostring(round(Overall_Total_MB,2),"commas")
| table host_name, Percent_Of_Total, Total_MB, Overall_Total_MB


Total MB by host, Day – Timechart:
-------------------------------------------
index=_internal source=*license_usage.log type=Usage
| fields h, b
| rename h as host_name
| timechart span=1d limit=20 sum(eval(round(b/1024/1024,4))) AS Total_MB by host_name


============================================
Splunk Index Storage Size Analysis
============================================

Storage Size used by each non-internal index:
-------------------------------------------
index=_introspection component=Indexes NOT(data.name="Value*" OR data.name="summary" OR data.name="_*")
| eval data.total_size = 'data.total_size' / 1024
| timechart span=1d limit=10 max("data.total_size") by data.name


Storage Size used by each internal index:
-------------------------------------------
index=_introspection component=Indexes (data.name="Value*" OR data.name="summary" OR data.name="_*")
| eval data.total_size = 'data.total_size' / 1024
| timechart span=1d limit=10 max("data.total_size") by data.name

Humanizing Security Data Visualization


Visualizing and displaying complex data is hard. Understanding complex data is harder. Rapidly making operational decisions based upon complex data is extremely hard.

Historically, operational security analysts have relied on alerts, tables, and charts on dashboards or in email to pull potentially useful information out of the vast sea of data dumping into their analytic systems. This has always been problematic due to the combination of false positives and the difficulty of understanding the context of data filtered through the human brain. Most of the standard methodologies for displaying complex information make it harder, not easier, for humans to understand the information they seek in a timely and operationally useful manner.

Everyone has seen dashboards with a wall of text in tables interspersed with colorful, many-fruited pie charts nesting near clusters of yummy-looking fruit pops of varying lengths or heights in bar and column charts, next to a row of radial meters like tachometers running redline like a series of engines about to blow. These are proudly displayed on giant monitors centered on SOC walls to impress all and sundry. Meanwhile, the useful dashboards are subsets of those giant walls of information overload, and they are still difficult to use: the dizzying array of information and data makes it hard for a human operator to zero in on the most salient and useful bits needing further investigation or action.

In “Death by Information Overload,” an article in the September 2009 issue of Harvard Business Review, email is used as an example of individual information overload causing significant productivity losses for organizations. Like email, the volume of information coming at an analyst every day on SOC dashboards is often confusing, and it can be even more time consuming to interpret and manage than a deluge of email. Speeding up the decision-making process on security incidents leads to labor savings, shorter mean time to discovery and response, and faster determinations on true vs. false positives. As a result, any methodology that reduces the time and effort for an analyst to interpret, understand, and then act upon useful security related data has the potential for notable impacts on the organization in terms of cost savings from increased productivity; lower employee stress levels and burnout rates; and reduced risk due to improved security incident response and higher quality analysis.

To illustrate the differences in efficacy of viewing certain complex data and the impacts on a human being’s ability to discern the information out of the data presented, below is a series of visualization methods looking at the exact same data using the same search in Splunk.

The specific search to obtain the source events is long and complex and distracts from the points herein. Therefore, describing it as a search to present all events with either failed or succeeded authentication events across a variety of data sources will suffice.

NOTE: The specific search used was originally derived from an expansion of a Splunk Enterprise Security data model search resulting in the final pipe section using … | "Authentication.action"=failure OR "Authentication.action"=success followed by a fun series of adventurous hoop jumping to arrive at our final data points.

The first evolution of condensed information display is to distill something down to simple data points.

Enter the single value charts. Behold the power of simple number displays:

Splunk_Single_Value_Example_JT

At once simple and elegant, in this visualization there are no dizzying arrays of information to distract from the simple counts. However, these numbers without context are meaningless. What the analyst can take from this is “Wow, there was an order of magnitude more failures than successes!” There is no way to know, given this display, whether either of these numbers is alarming or simply normal for a week. Single numbers are highly useful in certain circumstances, as are radial dials, thermometer gauges, and other methods of quickly looking at a valuable metric that presents in context as a single value. This use case is not one of those circumstances, clearly.

NOTE: Recent Splunk single value metrics are far more useful in that they can show trend lines and the value increased or decreased since the last reporting period.

This shows that context is of paramount importance in giving these data relative value for an analyst. Skipping several iterations of methodologies to find valuable metrics for brevity (an amusing term for a blog post of over 2,000 words), the result is looking at these data in the context of whether they are statistical anomalies, based on the simple mean and standard deviation (also called sigma) across several weeks’ time.

(Yes, the stats geeks will be adamant that these are worthless until the data is proven of normal distribution or means of means are used to force normal distribution; at which time the security analysts panic at loss of fidelity of original data; and other such perfectly valid arguments shall ensue. For the purposes of simplifying the example of visualization valuable information, this blog assumes the data in question is normally distributed, even though it is most certainly an unlikely scenario in the supplied dataset.)

Next up, we have the ubiquitous data table:

Splunk_Table_Example_JT

In this example, we see the data points clearly laid out (rounded to whole numbers). There are our same actual figures as seen in the Single Value dashboard, combined with last week’s number under goal. The forecast is an arbitrarily chosen value of 2 standard deviations above the mean, merely for illustrative purposes. The low, med, and high values are 3 standard deviations below the mean (with negative values, being meaningless in this context, replaced with 0); the mean; and 3 standard deviations above the mean, respectively, for the sake of having some type of fairly standard benchmark. (For the non-stats folks, this means that 99.7% of all values will land within the low-to-high range. For the purposes of aiding the analyst, this is likely sufficient for visual reference.)

This is, also, difficult to read and interpret. The columns could be rearranged in some fashion, perhaps programmatically to place the actual, goal, and forecast values in between the low, med, and high values relative to their numerical value. This would slightly improve a human’s ability to read this information. However, this is quite cumbersome and slow for an analyst’s decision making process.
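For reference, the band arithmetic behind the low, med, high, and forecast values boils down to a few lines of math on the weekly counts. Here is a minimal Python sketch of that calculation; the weekly counts are made up for illustration, and in the post itself the same math is done in SPL with eventstats (see the searches at the end):

# Illustrative only: the band arithmetic behind the table and charts,
# assuming (as this post does) roughly normally distributed weekly counts.
from statistics import mean, stdev

weekly_failures = [61000, 48000, 72000, 55000, 20000]  # hypothetical weekly counts

mu = mean(weekly_failures)
sigma = stdev(weekly_failures)

range_low = max(mu - 3 * sigma, 0)   # negative values are meaningless here, so clamp to 0
range_med = mu
range_high = mu + 3 * sigma          # roughly 99.7% of values should fall below this
forecast = mu + 2 * sigma            # the arbitrary "2 sigma above the mean" forecast

print(round(range_low), round(range_med), round(range_high), round(forecast))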

Next, there is the ever-popular column chart:

Splunk_Column_Chart_Example_JT

This is far easier to interpret quickly compared to the table. The low, med, and high marks are placed as dotted overlays (due to being single value parameters as opposed to multiple values). This has failures above and successes below, with the goal, actual, and forecast as columns. The relative values compared to the low, med, and high ranges are instantly obvious, though slightly hard to see due to being single points. The current week is clearly higher than last week on both charts, and the forecast is shown to be optimistic and likely inaccurate at best. For a more accurate forecast value, use Splunk’s predict command or other more complex and accurate methods. For this purpose, we’ll treat this data point as an assumed accurate benchmark.

There are, however, still problems with this chart. It is difficult to ascertain the general numerical values due to the squashing of the chart for it to appear in a reasonable screen space. Anything smaller would render it useless, and a taller visualization becomes difficult to view together without scrolling.

To conquer these problems, the bar chart comes into play:

Splunk_Bar_Chart_Example_JT

This visualization makes it far easier to discern relative values, and roughly specific values, without further investigation. One can easily see that the actual successes are not far from last week on a relative level, and last week’s failures were far out of line with this week’s numbers. However, this chart carries over the other problems from the column chart.

In addition to the other issues discussed with the column and bar charts, both visualizations use multiple colors. Not only is using multiple colors potentially difficult for people with color vision deficiencies, but humans process different colors as information more slowly than they do gradients of a single color. Noah Iliinsky is one of many experts talking and writing about how humans perceive information, which has been studied rather extensively. He has a great chart illustrating his research on his site at http://complexdiagrams.com/properties, which is explained in detail in his 2013 IBM White Paper Choosing visual properties for successful visualizations. In short, there are faster and slower visual stimuli for humans to interpret, depending on the nature of the information. Generally speaking, the type of numerically ranked data used in the examples herein is ordered with many data points, and is therefore better served by displays using length, size, or area, combined with differences in saturation or brightness, rather than different colors.

The previous charts may use different lengths to moderate effect, but they fall short on several other points, and the table is simply arcane and difficult to interpret at all, let alone quickly.

Stephen Few has studied, written, and spoken extensively on the subject of visualization in a larger context. He has developed a highly effective method for quickly viewing and interpreting the type of data used to create the visualizations above: the bullet graph.

Splunk_Bullet_Graph_Example_JT

Not only is this display highly condensed on the screen, it allows for extremely rapid interpretation of these complex data relationships. Each chart is clearly labeled, with numerical values close to the visual bands for easy estimations. The gray shaded background areas indicate the low, med, and high ranges, with Auth Failures showing no low band because that value was set to 0. The vertical bar is the goal, or last week’s values in this context. The light blue line is the forecast value, and the dark blue line is the actual value. The blue shades are different enough from the gray shades that this could be rendered in grayscale or viewed by someone with color vision deficits with no change in presentation quality or value.

With bullet graph visualization of data, an analyst can quickly discern whether something seems anomalous or within normal bounds and make more rapid operational decisions, driven by higher quality data and better understanding.

This specific graph quickly and clearly shows that the Auth Failures count is generally between zero and 75,000 in a week, with the lower bound statistically below zero so there is no darker shade at the beginning of the graph. The last week, shown by the vertical bar, was notably smaller, being down in the 20,000 range, but the current week is closing in on 150,000 as shown by the dark blue bar. The projected value is near 200,000, which again was chosen arbitrarily for this example, but it could be based on any valid calculation for the given data in another use case.

The Auth Successes show three ranges based on standard deviation calculations, with the lower range crossing to mid-range around 1,800 and the high range starting around 6,200 based on the three grey shaded areas. The number of successes from last week was around 4,500 as shown by the vertical bar, and this week’s number is nearly 8,000 based on the darker blue line. The projection is over 9,000 based on the chosen metric.

The upshot is that an analyst can quite quickly see that the full range of possible values for Auth Failures is fairly wide, with last week being fairly anomalous by comparison and this week being rather high but generally within the range of normal trends. Therefore, there is a likelihood that something strange happened last week to produce so few Auth Failures compared to this week. However, if this chart used Splunk’s predict command, it is possible there would be a clearer understanding of projected values for the final count of the week, which would provide a stronger relative sense of normal vs. anomaly.

The Auth Successes portion of the chart indicates there is a fairly high number of related events, but the overall trend indicates this is well within normal at the time the chart was drawn.

Background Notes:

Few provides the specifications for bullet graphs on his site at https://www.perceptualedge.com/articles/misc/Bullet_Graph_Design_Spec.pdf, including many grayscale and color examples.

The Splunk Bullet Graph Custom Visualization app is available on Splunkbase at https://splunkbase.splunk.com/app/3144/ with extensive documentation at http://docs.splunk.com/Documentation/CustomViz/1.0.0/BulletGraph/BulletGraphIntro.

Implementing the bullet graph app, and other amazing visualization apps, require the use of Splunk Enterprise 6.4’s new Custom Visualization framework, documented at http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/CustomVizDevOverview and one of the best innovations in Splunk Enterprise to combat information overload in this data-driven age.

The examples above relied upon the late and dearly missed David Carasso’s command timewrap, found at https://splunkbase.splunk.com/app/1645/.

The data dashboards used this base search (skipping all the scary auth stuff to pull the specific auth related events):

… | search "Authentication.action"=failure OR "Authentication.action"=success | timechart span=1w count by Authentication.action | eventstats mean(failure) AS FailureMeanPast, stdev(failure) AS FailureStdevPast, mean(success) AS SuccessMeanPast, stdev(success) AS SuccessStdevPast | timewrap w

The subsequent searches for the two charts in each used (except the single value dashboard, which merely counted one or the other):

eval title="Auth Successes" | eval actual=success_latest_week | eval goal=success_1week_before | eval range_low=SuccessMeanPast_latest_week-(SuccessStdevPast_latest_week*3) | eval range_low=if(range_low<0,0,range_low) | eval range_med=SuccessMeanPast_latest_week | eval range_high=SuccessMeanPast_latest_week+(SuccessStdevPast_latest_week*3) | eval forecast=SuccessMeanPast_latest_week+(SuccessStdevPast_latest_week*2) | table title goal range_low range_med range_high actual forecast

The table used a variation of the above with the values rounded.

Splunking Continuous REST Data


One of the ways vendors expose machine data is via REST. There are a couple of ways to get REST data into Splunk today:

  1. Use Damien Dallimore’s REST API Modular Input – you can provide a custom response handler for this input to persist state.
  2. Use the new Splunk Add-on Builder – this method will do a "one shot" of the REST endpoint – meaning every time the input runs, it will pull all of the data again.

In this post, I will show you how to implement a cursor mechanism (i.e. pick up where you left off last time) for REST endpoints that continually have new data over time using the checkpoint mechanism built into modular inputs.

The Data Source

For this example, we will ingest JSON data from a tumblr blog – http://ponidoodles.tumblr.com.  I chose this one because Tumblr’s v1 REST endpoint is open and easy to use for an example (no authentication required).  Plus, this one is about ponies.

The API documentation and parameters can be found here https://www.tumblr.com/docs/en/api/v1

We will use 2 of the available parameters:

  • start – this is the post offset to start pulling posts
  • num – this specifies the number of posts to pull.

Getting the Data in

Splunk REST Data

Following is the pseudo-code we will use to get the data:

  1. Get the starting position from a checkpoint
  2. If there is no checkpoint, set the starting position to 0
  3. Pull up to 5 posts from the endpoint starting at the starting position
  4. Count the number of posts read
  5. Stream each post to Splunk
  6. Add the number of posts read to the starting position
  7. Save the new starting position (in the first case, the new starting position will be 5)
  8. Repeat

To keep the code concise, we will use the Splunk Python SDK to create a modular input.

In the Splunk Python SDK, all the magic happens in the stream_events method.
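If you have not built a modular input with the SDK before, the overall shape is roughly the following. This is only a hypothetical skeleton (the class and scheme details are illustrative); the real implementation is in the GitHub repo linked below.

# Hypothetical skeleton of a Splunk Python SDK modular input.
# The checkpoint logic shown in the snippet below lives inside stream_events().
import sys
from splunklib.modularinput import Script, Scheme, Event

class RestExample(Script):
    def get_scheme(self):
        scheme = Scheme("splunk_rest_example")
        scheme.description = "Pull posts from a REST endpoint with a checkpoint"
        return scheme

    def stream_events(self, inputs, ew):
        for input_name, input_item in inputs.inputs.items():
            # 1. Read the checkpoint, 2. call the REST endpoint,
            # 3. stream each post, 4. save the new checkpoint.
            event = Event(data='{"example": "post"}', sourcetype="tumblr:post")
            ew.write_event(event)

if __name__ == "__main__":
    sys.exit(RestExample().run(sys.argv))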

In order to implement the checkpoint mechanism based on the pseudo-code above, I borrowed (okay, stole) some code from the Splunk Add-on Builder to abstract the checkpointing mechanics.

Here is an actual code snippet:

state_store = FileStateStore(inputs.metadata, self.input_name)
last_position = state_store.get_state("last_position") or 0
self.url = "%s?start=%s&num=%s" % (rest_url, str(last_position), str(num))
http_cli = httplib2.Http(timeout=10, disable_ssl_certificate_validation=True)
resp, content = http_cli.request(self.url, method=self.rest_method, body=urllib.urlencode(self.data), headers=self.header)
jsVariable = content.decode('utf-8', errors='ignore')
# The response from this particular REST endpoint delivers content in a JavaScript variable.
#   - Example:  tumblr_api_read = {"key":value, "key":value, "posts":[array of posts]}
#
# The below line strips out the unnecessary text to get just the JSON
jsonValue = json.loads('{%s}' % (jsVariable.split('{', 1)[1].rsplit('}', 1)[0],))
num_posts_streamed = 0
for post in jsonValue["posts"]:
    num_posts_streamed += 1
    # Stream the event here
    # Store the position to pick up on next time
last_position = int(last_position) + num_posts_streamed
state_store.update_state("last_position", str(last_position))

The complete code can be found on GitHub.
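The FileStateStore used above is just a thin wrapper around a file in the modular input’s checkpoint directory. A minimal sketch of what such a store might look like (illustrative only; it assumes the checkpoint_dir supplied in the modular input metadata, and the actual implementation in the repo may differ):

import json
import os

class FileStateStore(object):
    """Minimal file-backed checkpoint store (illustrative sketch only)."""

    def __init__(self, metadata, input_name):
        # Modular inputs are handed a checkpoint_dir in their metadata
        self.checkpoint_dir = metadata["checkpoint_dir"]
        self.input_name = input_name

    def _path(self, key):
        return os.path.join(self.checkpoint_dir, key)

    def get_state(self, key):
        try:
            with open(self._path(key)) as f:
                return json.load(f)
        except (IOError, ValueError):
            return None

    def update_state(self, key, value):
        with open(self._path(key), "w") as f:
            json.dump(value, f)

Whatever the exact implementation, the important property is that get_state returns nothing on the first run and update_state persists the position between runs.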

Note: The method we used here for saving a checkpoint is very basic (i.e. counting the number of posts) and may not apply to your situation.  Sometimes, the REST data may give you a continuation token and something like the following may be necessary:

if "nextLink" in jsonValue:
    state_store.update_state("nextLink", jsonValue["nextLink"])

Microsoft Azure Audit does this for instance.

Testing the Input

A nice way to test your input prior to using it in your Splunk environment is to use the Splunk CLI.  First, copy the contents from the GitHub repo above to your $SPLUNK_HOME/etc/apps folder.  Next, execute the following (Splunk is installed in /opt/splunk in this case):

/opt/splunk/bin/splunk cmd splunkd print-modinput-config splunk_rest_example splunk_rest_example://RESTTest | /opt/splunk/bin/splunk cmd python /opt/splunk/etc/apps/TA_rest-example/bin/splunk_rest_example.py

The checkpoint location is

$SPLUNK_HOME/var/lib/splunk/modinputs/splunk_rest_example

There will be a file in there called last_position that gets updated on each run.  Open it up with a text editor to see for yourself.

Clearing Input Data

If you want to reset the checkpoint file, run the following command:

/opt/splunk/bin/splunk clean inputdata splunk_rest_example

Note: you can also clean eventdata to remove indexed data.  For testing purposes, I usually write events to a staging index (this is done via inputs.conf) and clean that index as needed.

Distributed Deployments

All the code and examples above were run on a single Splunk instance.  If you plan on using these techniques in a distributed deployment, the recommended architecture is to run the input on a heavy forwarder.  For more information about where to install add-ons in a distributed deployment, check the Splunk documentation.

Box Plots: Making Custom Visualizations


This is the first of a two part series on implementing Box Plots in Splunk for security use cases.

Analyzing complex data is difficult, which is why people use Splunk. Sometimes patterns in data are not obvious, so it takes various ways of looking at aggregate reports and multiple charts to ascertain the important information buried in the data. A common tool in a data analyst’s arsenal is a box plot. A box plot, also called a box and whisker plot, is a visual method to quickly ascertain the variability and skew of data, as well as the median. For more about using and reading box plots, read the excellent and succinct post by Nathan Yau of the Flowing Data blog “How to Read and Use a Box-and-Whisker Plot.”

With Splunk Enterprise 6.4, there is a new framework for custom visualizations. For anyone interested in building their own, there are extensive and well written docs on building custom visualizations, and they are excellent tutorials and reference materials for anyone building new visualization apps.

The most difficult part of building visualizations is not creating the Splunk app, especially with the excellent documentation and great community support from Answers, IRC, and Slack (signup online). The basic steps are (largely distilled from http://docs.splunk.com/Documentation/Splunk/6.4.0/AdvancedDev/CustomVizTutorial):

  1. Create a working visualization outside the Splunk framework. This is usually as simple as an HTML file to call the JavaScript, the JavaScript code itself, and some example input.
  2. Download and install a fresh, unused Splunk Enterprise 6.4 (or newer) instance on a test machine or a workstation. Do not install any other apps or add-ons. This provides a clean and uncluttered environment for testing the app without potential conflicts or problems from a production environment. This, also, allows for restarting Splunk, reloading configs, or removing and reinstalling the app or Splunk itself at any time during the development process.
  3. Download the example app from the tutorial.
  4. Install the Viz_tutorial_app in the Splunk test instance.
  5. Rename the app directory to match the new app being developed.
  6. Edit the config files as directed in the tutorial for the new app name and other settings.
  7. Perform JavaScript magic to create the working visualization within Splunk. This post will help with this process.
  8. Optionally (and preferably) add user definable options.
  9. Test and package the app.

The difficult part of building a visualization app is the JavaScript code drawing the chart mentioned in steps one and seven above. Most people start with pre-written libraries to save the arduous work of writing the code from a blank screen, which sometimes makes step one easier. However, even when these libraries work in their native form perfectly well, most of them require some massaging before they work correctly within the Splunk Custom Visualization framework.

The most common visualizations are built on top of the Data Driven Documents (D3) library D3.js. The bulk of existing D3 applications are designed to work with some raw, unprocessed data, often supplied by a CSV file or other static source. Because of this, the data input usually must be altered within the JavaScript when building a Splunk custom visualization. In addition, without an analytics engine supplying the data, most D3 applications are written to perform all the mathematical calculations on the static data sources. With Splunk’s superior ability to perform a wide range of calculations on vast amounts of data, it behooves a visualization developer to alter the JavaScript to accept pre-processed data.

Following this paradigm, the D3 Box Plot application started with Jens Grubert’s D3.js Boxplot with Axes and Labels code. Grubert’s code runs well using the static format for a CSV as data input, but the JavaScript is hardcoded for a specific number of columnar inputs from the file. Altering the code is required to change the number of columns and, therefore, the number of box plots displayed in a single chart. Also, the source app performs all the calculations within the JavaScript using raw number inputs from the CSV data file.

Splunk supplies the pre-calculated data for an arbitrary number of box plots and uses the D3 Box Plot app only for display. Therefore, the original code required significant changes to remove the calculations; to alter the inputs to accept only the pre-calculated data needed to draw the visual elements; and to work within the Splunk Custom Visualization framework.

Kevin Kuchta provided significant assistance in reworking the JavaScript from Grubert’s original code into something meeting the data input requirements, with the mathematical functions removed, so it could operate as a standalone app. This was needed to ensure the application could perform all the needed functions before it was converted to work within Splunk. Some of the original code is commented out in case it becomes useful in future editions of the published app, and some has been removed entirely.

Grubert’s code uses two scripts. One is embedded in an HTML file that is used to call the code with a browser, and the other is a standalone file called box.js. During the development phases of altering the code to run outside of Splunk, the embedded script in the HTML was moved to an outside file called boxparse.js and sourced within the HTML.

Setting up the Development Environment

The original code in the HTML file that calls the visualization library (d3.v3.min.js) and the box.js file looks like:

<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="d3.v3.min.js"></script>
<script src="box.js"></script>

After pulling the code from between the <script></script> tags immediately following those three lines and putting it into the boxparse.js file, it was sourced by adding:

<script src="boxparse.js"></script>

To test this locally without cross-domain errors in Chrome (the browser of choice for debugging JavaScript today), an in-place web server was run using port 9000 (to not interfere with Splunk running locally on 8000) on the local machine from the directory holding the box plot code using:

python -m SimpleHTTPServer 9000

This allows for rapid testing using Chrome pointed at http://localhost:9000/.

Changing Inputs and Removing Data Calculations

The next step was to remove the calculation code and alter the inputs, both to be dynamic in the number of data sets (for a variable number of box plots to display) and to accept pre-calculated values for the final data required to create a box plot.

The required values to create a box plot are listed below, with a quick sketch of deriving them just after the list:

  • median
  • min
  • max
  • lower quartile (used for lower bound of the box)
  • upper quartile (used for upper bound of the box)
  • interquartile range (the difference between the upper and lower quartiles, called iqr in the app)
  • list of outlier values (not used in the initial version of the Box Plot Viz app)
  • category name or label
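The values listed above can be derived from a raw list of numbers with a little arithmetic. Here is a minimal Python sketch, independent of the D3 code and purely for illustration; it uses Python 3.8’s statistics.quantiles and follows the same median +/- 1.5*IQR whisker convention as the example search at the end of this post:

# Illustrative sketch: computing the box plot inputs from a list of numbers.
import statistics

def boxplot_values(values, label):
    values = sorted(values)
    lower_q, median, upper_q = statistics.quantiles(values, n=4)  # Python 3.8+
    iqr = upper_q - lower_q
    return {
        "label": label,
        "min": values[0],
        "max": values[-1],
        "lowerquartile": lower_q,
        "median": median,
        "upperquartile": upper_q,
        "iqr": iqr,
        "lowerwhisker": median - 1.5 * iqr,
        "upperwhisker": median + 1.5 * iqr,
        "outliers": [],  # not used in the initial version of the app
    }

print(boxplot_values([3, 5, 7, 8, 12, 13, 14, 18, 21], "Service A"))

In the finished app, Splunk computes these values in SPL and the JavaScript only draws them.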

The data parsing is in the boxparse.js code taken from the HTML file. This made it simple to remove the lines starting with:

// parse in the data
d3.csv("data.csv", function(error, csv) {

and ending with:

if (rowMax > max) max = rowMax;
if (rowMin < min) min = rowMin;
});

This section of the original code both reads the input CSV file and performs calculations on the data to find min and max values for each set. All of this code was removed and min and max are now set using:

var yMinValues = [];
for (minind = 0; minind < data.length; minind++) {
    yMinValues.push(data[minind][2][0]);
}
var yMin = Math.min(...yMinValues);

var yMaxValues = [];
for (maxind = 0; maxind < data.length; maxind++) {
    yMaxValues.push(data[maxind][2][1]);
}
var yMax = Math.max(...yMaxValues);

This sets yMin and yMax as new variables for clarity in naming, rather than using the original code’s min and max variable names. This required changing the y-axis from using:

.domain([min, max])

to using:

.domain([yMin, yMax])

The iqr() function to calculate the interquartile range was removed entirely, and references to such were replaced with the iqr variable supplied by the external data (to prepare for conversion to Splunk Custom Visualization).

Another notable change was to pass the yMin and yMax variables to the d3.box() function thusly:

var chart = d3.box({"yMax":yMax,"yMin":yMin})

This sends the data as part of the config object sent to d3.box() in box.js. To use these in box.js, the following was added to the bottom of the d3.box() function:

yMin = config.yMin,
yMax = config.yMax;

During testing, an array was created in boxparse.js to include data for testing. This was better than using an external file because it simulates how the data will come from Splunk in the variable named data. Arbitrarily, the decision was made to use an ordered, indexed array like:

var newFakeData = [
// Column  Quartiles        Whiskers  Outliers        Min  Max
["somedata", [ 10, 20, 30 ],   [5, 45],  [1, 2],         0, 200],
["otherdata", [ 15, 25, 30 ],   [5, 65],  [],             2, 150],
];

Although outlier support was not included in the initial version, because the required Splunk searches would be too complex for the average user, the ability to read and draw outliers is still in the code. They are merely set to null on input.

The last part, converting box.js to use the new data inputs rather than the internally calculated values, was a fairly lengthy but not difficult process. It required careful review of the code to see where all the values were submitted to the various drawing functions or where variables were set from calculations. In the places where there were calculations, a simple variable assignment replaced the original code.

For example, the original box.js set min and max with:

var g = d3.select(this),
n = d.length,
min = d[0],
max = d[n - 1];

However, the new box.js simply does:

var g = d3.select(this),
min = data[4],
max = data[5];

In the cases where values were calculated in separate functions, those functions were completely replaced with variable assignments.

For example, the original box.js set whiskerData using:

// Compute whiskers. Must return exactly 2 elements, or null.
var whiskerIndices = whiskers && whiskers.call(this, d, i),
whiskerData = whiskerIndices && whiskerIndices.map(function(i) { return d[i]; });

Yet the new box.js simply uses the supplied array of whisker values from the input data:

whiskerData = data[2];

This method was used on the rest of the required variables needed to build a box plot.

After these changes, the box plot loaded via the HTML file and the local test HTTP server.

Conversion to a Splunk App

The next step was to convert the standalone application to work in Splunk as a Custom Visualization. Kyle Smith, a developer for the Splunk partner Aplura and author of Splunk Developer’s Guide, provided excellent and thorough guidance in this process. His personal advice and assistance, combined with his book, were instrumental in the success of this conversion. There were numerous display issues once the app was built in Splunk, which required many iterations of tweaking the code, running the build command, running the search, and more code tweaking.

This process took a fair amount of experimentation with fits and starts down the wrong paths, much as any development process. The final changes are roughly outlined below.

The first thing to do was pull the CSS formatting code from the HTML file and place it into the file at:

$SPLUNK_HOME/etc/apps/viz_boxplot_app/appserver/static/visualizations/boxplot/visualization.css

Next, based on suggestions in the tutorial and Smith, the boxparse.js file was pasted directly into the updateView() function in the supplied source file found at:

$SPLUNK_HOME/etc/apps/viz_boxplot_app/appserver/static/visualizations/boxplot/src/visualization_source.js

An immediate problem to tackle is the conversion of the data supplied by the Splunk search into the format coded as discussed above. In future versions this will entail recoding all the data references to pull from the Splunk-supplied data structure, but for now there is code to convert the results and wedge them into the format shown above. There are three formats in which Splunk can send the data, documented in the Custom visualization API reference. The default is Row-major, where a JavaScript object is returned with an array containing field names and the row values from the raw JSON results object. There is a Column-major option which does the same for column values. The third option is Raw, which returns the full JSON object. The approach used here is Raw. This is set in the visualization_source.js file in getInitialDataParams by changing outputMode from:

outputMode: SplunkVisualizationBase.ROW_MAJOR_OUTPUT_MODE,

to:

outputMode: SplunkVisualizationBase.RAW_OUTPUT_MODE,

This makes it possible to pull all the field value pairs into the app for quick conversion into the index array format in the formatData function using:

var newData = _.map(data.results, function(d) { return [ d[data.fields[0].name], [ Number(d.lowerquartile), Number(d.median), Number(d.upperquartile) ], [ Number(d.lowerwhisker), Number(d.upperwhisker) ], [], Number(d.min), Number(d.max) ]; });

This, also, forces evaluation of the values to numbers for all but the first category field, which should be a string holding the values of the split by field.

The most difficult part to track down at that point was tweaking the display to correctly draw a dynamic Y axis with box plots positioned and sized relative to that scale. After much experimentation, some padding was added at the top of the graph via a programmable label offset:

var labeloffset = 75;
var height = 400 + labeloffset – margin.top – margin.bottom;

in visualization_source.js within the updateView code to provide some room at the top of the graph. In addition, the height for the box plots scale was changed from:

.range([yMax, min]);

to:

.range([height, yMin]);

This allowed for the Y axis draw and the box plot range to use the same values, which then allows for the positioning and sizing of the two to be relative such that 50 on the axis lines up with 50 for each of the box plots drawn.

At this point, it was simply a matter of completing the steps needed to meet Splunkbase standards, packaging the app directory into a tar.gz file, and renaming it to .spl.

Final Results

The ultimate result is a Box Plot app with results such as the image below (taken from the Box Plot App example screenshot):

BoxPlotViz-example

The app is available on Splunkbase at https://splunkbase.splunk.com/app/3157/.

The field used must be numeric and be split by another field. The search is longer than those for many of the normal visualizations, but the density of information displayed and the requirement of pre-computing all values necessitate the specifics.

The search used in the graph example (which comes with the app) uses a lookup with a series of values for the categories shown to provide numeric data and split by categories.

The specific search is:

| inputlookup boxplotexample.csv | stats median(Cost) AS median, min(Cost) AS min, max(Cost) AS max, p25(Cost) AS lowerquartile, p75(Cost) AS upperquartile by Service | fields - count | where isnotnull(median) | eval iqr=upperquartile-lowerquartile | eval lowerwhisker=median-(1.5*iqr) | eval upperwhisker=median+(1.5*iqr)

If the search starting at | stats … is copied and the Cost and Service field names are changed, the box plot should draw. If the number ranges for one of the split-by field values are in a totally different order of magnitude, the display will likely not be useful for comparisons. In those situations it may be useful to separate the split-by field values into different searches, each with its own box plot, based on the general range (min and max) of the numeric field. This can be quickly determined by doing a | stats min(field) AS min, max(field) AS max and then sorting as needed to find common groupings.

As an aside, for those new to Splunk app development: to speed up the process of reloading app content and configuration file changes, the Splunk docs section Customization options and caching: Clear client and server assets caches after customization suggests using http://<host:mport>/debug/refresh and/or http://<host:mport>/<locale_string>/_bump (e.g. http://localhost:8000/en-US/_bump). Use the debug/refresh endpoint to reload XML, configuration, and HTML files, and _bump to reload changes to ../appserver/static directories. These were both used many times during the development of the Box Plot app.

The next installment will show the Box Plot App leveraged for a security use case.

Smart AnSwerS #63


Hey there community and welcome to the 63rd installment of Smart AnSwerS.

With Splunk HQ officially more than two times larger, and Splunkers now spread out across more square footage, things have gotten eerily quiet around here as everyone is adjusting to their surroundings, getting to know new neighbors, and figuring out where all the new conference rooms are. Slowly, but surely, we’re getting comfortable in our new home, and once we’re completely settled in, we’ll find ourselves back into the groove of things with a nice balance of work and play :)

Check out this week’s featured Splunk Answers posts:

How to call a Python script from an HTML view?

dsollen had an HTML dashboard and wanted to call a Python script, but even though a success message was returned, the actual script wasn’t called. SplunkTrust member alacercogitatus explains why dsollen’s approach won’t function as expected. However, with his usual Splunk sorcery, he provides a well-crafted tutorial to make this work, breaking down how to create and enable a custom Splunk endpoint that can be called from an HTML view.
https://answers.splunk.com/answers/352862/how-to-call-a-python-script-from-an-html-view.html

How can I find duplicate scheduled searches running in a search head clustering environment?

sat94541 teams up with her identical twin sister rbal to share some helpful troubleshooting tips if you suspect duplicate scheduled searches running in your search head clustering environment. rbal shows a search to use from the distributed management console to find any scheduled searches that were run multiple times. You can further narrow down your investigation on any duplicate saved search sids produced in these results to find any culprits.
https://answers.splunk.com/answers/395008/how-can-i-find-duplicate-scheduled-searches-runnin.html

Why am I getting different count results using “chart count by field” versus “chart count(field) by field”?

Splunk SPL and how it functions can be hard to grasp, especially when we expect seemingly similar searches to produce the same results. sistemistiposta didn’t understand why he was getting different count results for two searches using the chart command with a slight variation in syntax. SplunkTrustee and search ninja sideview decided to take on this challenge, explaining what “count” and “count(foo)” are actually counting in your data, making this another useful lesson for the community as this can make all the difference in the accuracy of your reports.
https://answers.splunk.com/answers/371996/why-am-i-getting-different-count-results-using-cha.html

Thanks for reading!

Missed out on the first sixty-two Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Smart AnSwerS #64


Hey there community and welcome to the 64th installment of Smart AnSwerS.

One of the Splunk Cloud support engineers left on vacation last week, so in true Splunk fashion, his desk is getting a complete makeover by the time he returns! yannK has been putting on his creative hat this week to transform the desk into a Star Wars TIE Fighter which has been coming together incredibly well. If it were my desk, I’d leave it as a permanent installation because it looks that cool and is still completely functional as a work station…not that I’m jealous or anything ;P

Check out this week’s featured Splunk Answers posts:

How would one correctly configure DATETIME_CONFIG for an app that could be installed in either an indexer cluster or standalone Splunk?

SplunkTrust member acharlieh needed to know how to configure DATETIME_CONFIG in an app relative manner. Users were developing and testing apps on local standalone Splunk instances, but he wanted to make sure these apps could also be deployed across production indexer clusters from a cluster master with the same settings. lguinn provides a clear example of where to store the custom datetime.xml and how to configure props.conf in the same app to deploy consistent settings in both types of environments without making manual changes on each indexer.
https://answers.splunk.com/answers/270337/how-would-one-correctly-configure-datetime-config.html

How to search for fields that cross correlate with a specified field?

zeophlite graphed a field and knew how to add additional fields to manually compare and find similarities in patterns, but wanted to know a way to have Splunk search and return fields that cross correlate based on results. jeffland gives an excellent answer showing various options like using the kmeans command, computing correlations by hand through an example search, and some Splunk out-of-the-box solutions such as the R Project or Machine Learning Toolkit and Showcase apps.
https://answers.splunk.com/answers/374184/how-to-search-for-fields-that-cross-correlate-with.html

Can a search macro have a default value for a parameter?

dsollen was curious to know if it was possible to create a search macro where some of the fields are predefined with a default value that would be used based on the number of arguments provided. SplunkTrustee sideview strikes again with a solution he uses for cases like this: defining two macros. He uses the examples from dsollen’s question to show how the logic between the two definitions would work to use a default value if a user only provides one argument.
https://answers.splunk.com/answers/373040/can-a-search-macro-have-a-default-value-for-a-para.html

Thanks for reading!

Missed out on the first sixty-three Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Vote using Splunk


Someone recently challenged me to use Splunk for voting. Splunk is a versatile platform, so why not make a voting app? Sigi and Stephen put the app together one afternoon, and then I tested it out on a live audience during SplunkLive! San Francisco.

 

Picture1 copy

 

It worked like a charm and we gained insight from the audience. That’s when I realized that, although it’s not a typical use case of Splunk, this app could be useful for others. From polling an audience during a presentation to getting consensus from coworkers on a question during a meeting, there were enough potential uses that I decided to put the app on Splunkbase.

 

 

I finally got around to publishing it. It consists of a few components: a webpage (thanks Sigi!) where users click a letter to cast their vote, the Splunk vote app (thanks Stephen!) which displays the results, and the Lookup File Editor App for Splunk, which makes it easy to create and edit the questions people will vote on. The webpage is a simple page that allows participants to select one or more answers and move on to the next question by touching the right arrow at the bottom.

 

Screen Shot 2016-05-18 at 2.13.25 PM

 

When a user clicks on an answer, an event like the one below is sent to your Splunk instance using the HTTP Event Collector.

 

Screen Shot 2016-05-18 at 2.15.06 PM

 

The subject “SplunkLive! San Francisco” is set by appending it to the URL, in this case “my.webserver.com/vote/?subject=SplunkLive!%20San%20Francisco”. Simply substitute whatever event or topic you want people to vote on. I recommend using a URL shortener so you can give participants a more friendly URL to type in.

To set up the questions, simply click the “Edit Questions” link in the nav bar. Some example questions are in the app already. Change them to whatever you would like, and make sure to set the subject to match the subject in the URL.

 

Screen Shot 2016-05-18 at 2.23.22 PM

 

The event will appear in the “Select Event” dropdown. Selecting an event and a question in the dropdowns will populate the URL and question panels in the dashboard by pulling from the lookup table.

 

Screen Shot 2016-05-18 at 2.24.03 PM

 

That’s it! Now you can have people vote on topics anytime anywhere using Splunk!

 

 

P.S. If you want to shake things up check this out next. https://github.com/splunk/parallel-piper
At .conf2015, attendees shook their phones hard enough that it triggered a custom alert action in Splunk and launched Buttercup from a cannon! I’d love to see what others have come up with. Be sure to send me a message if you’ve done something with the shake or vote app.


Configuring Nginx Load Balancer For The HTTP Event Collector


The HTTP Event Collector (HEC) is the perfect way to send data to Splunk, at scale, without a forwarder. If you’re a developer looking to push logs into Splunk over HTTP, or you have an IoT use case, then the HEC is for you. We cover multiple deployment scenarios in our docs. I want to focus on a single piece of the following distributed deployment for high availability, throughput, and scale: the load balancer.

You can use any load balancer in front of the HEC but this article focuses on using Nginx to distribute the load. I’m also going to focus on using HTTPS as I’m assuming you care about security of your data in-flight.

You’re going to need to build or install a version of Nginx that enables HTTPS support for an HTTP server.

./configure --with-http_ssl_module

If you install from source and don’t change the prefix then you’ll have everything installed in /usr/local/nginx. The rest of the article will assume this is the install path for Nginx.

Once you've got Nginx installed, you're going to need to configure a few key items. First is the SSL certificate. If you're using the default certificate that ships with Splunk, then you'll need to copy $SPLUNK_HOME/etc/auth/server.pem and place that on your load balancer. I'd highly encourage you to generate your own SSL certificate and use it in place of the default certificate. Here are the docs for configuring Splunk to use your own SSL certificate.

The following configuration assumes you’ve copied server.pem to /usr/local/nginx/conf.

    server {
        # Enable SSL for the default HEC port 8088
        listen 8088 ssl;

        # Configure the default Splunk certificate.
        # The private key is included in server.pem, so use it in both settings.
        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            # HEC supports HTTP keepalive, so let's use it.
            # The default is HTTP/1.0; keepalive is only enabled in HTTP/1.1.
            proxy_http_version 1.1;

            # Remove the Connection header if the client sends it;
            # it could be "close", which would close a keepalive connection.
            proxy_set_header Connection "";

            # Proxy requests to HEC
            proxy_pass https://hec/services/collector;
        }
    }

Next we’ll configure the upstream servers. This is the group of servers that are running the HTTP Event Collector and auto load balancing data to your indexers. Please note that you must use a heavy forwarder as the HEC does not run on a Universal Forwarder.

    
    upstream hec {
        # Upstream group of HEC endpoints (heavy forwarders in this example).
        # Keep a pool of idle keepalive connections open to the backends.
        keepalive 32;

        server splunk1:8088;
        server splunk2:8088;
    }

Now let’s put it all together in a working nginx.conf

# Tune this depending on your resources
# See the Nginx docs
worker_processes  auto;

events {
    # Tune this depending on your resources
    # See the Nginx docs
    worker_connections  1024;
}


http {
    upstream hec {
        # Upstream group of HEC endpoints (heavy forwarders in this example).
        # Keep a pool of idle keepalive connections open to the backends.
        keepalive 32;

        server splunk1:8088;
        server splunk2:8088;
    }

    server {
        # Enable SSL for the default HEC port 8088
        listen 8088 ssl;

        # Configure the default Splunk certificate.
        # The private key is included in server.pem, so use it in both settings.
        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            # HEC supports HTTP keepalive, so let's use it.
            # The default is HTTP/1.0; keepalive is only enabled in HTTP/1.1.
            proxy_http_version 1.1;

            # Remove the Connection header if the client sends it;
            # it could be "close", which would close a keepalive connection.
            proxy_set_header Connection "";

            # Proxy requests to HEC
            proxy_pass https://hec/services/collector;
        }
    }
}

When you start Nginx you will be prompted to enter the PEM passphrase for the SSL certificate. The password for the default Splunk SSL certificate is password.
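Once Nginx is up, you can sanity-check the whole chain by sending a test event through the load balancer with curl. This is only a sketch: nginx-lb.example.com and the token are placeholders for your own hostname and a HEC token you've created in Splunk, and -k is used because the example relies on the default self-signed certificate. Also note that with the location / block above, Nginx maps the root path to /services/collector on the back end, so the test posts to the root path:

    # Placeholder hostname and token; substitute your own values
    curl -k https://nginx-lb.example.com:8088/ \
        -H "Authorization: Splunk 00000000-0000-0000-0000-000000000000" \
        -d '{"event": "hello from behind nginx", "sourcetype": "manual"}'

If the event lands in Splunk, the certificate, keepalive, and upstream settings are all doing their job.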

There are a bunch of settings you may want to tweak including HTTPS Server Optimization, load balancing method, session persistence, weighted load balancing and health checks.

I’ll leave those settings for you to research and implement as I’m not an expert on them all and everyone’s deployment will differ in complexity and underlying resources.

Hopefully this gives you the foundation for a reliable load balancer for your distributed HTTP Event Collector deployment.

Configuring Nginx With Splunk, REST API & SDK Compatibility


Last year I posted an article on how to configure HAProxy with Splunk, REST API & SDK compatibility. Yesterday, I posted an article on how to configure Nginx as a load balancer in front of a tier of HTTP Event Collectors. Today, I want to iterate on the work I did yesterday and show a basic config for Nginx that’s compatible with Splunk, the REST API and SDK’s.

You’re going to need to build or install a version of Nginx that enables HTTPS support for an HTTP server.

./configure --with-http_ssl_module

If you install from source and don’t change the prefix then you’ll have everything installed in /usr/local/nginx. The rest of the article will assume this is the install path for Nginx.

Once you've got Nginx installed, you're going to need to configure a few key items. First is the SSL certificate. If you're using the default certificate that ships with Splunk, then you'll need to copy $SPLUNK_HOME/etc/auth/server.pem and place that on your load balancer. I'd highly encourage you to generate your own SSL certificate and use it in place of the default certificate. Here are the docs for configuring Splunk to use your own SSL certificate.

The following configuration assumes you’ve copied server.pem to /usr/local/nginx/conf.

    server {
        listen 8089 ssl;
        listen 8000;

        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            proxy_pass http://splunkweb;
        }

        location /services {
            proxy_pass https://splunkrest;

        }
    }

Next we’ll configure the upstream servers. If you’re using the open source version of Nginx you’ll need to use the IP Hash method for session persistence. If you’re using the commercial version Nginx Plus, you have more options for session persistence methods. Add as many servers as you have to each of the upstream blocks. I used two to illustrate that you can add N servers.

    upstream splunkweb {
        ip_hash;
        server splunk-server-1:8000;
        server splunk-server-2:8000;
    }

    upstream splunkrest {
        ip_hash;
        server splunk-server-1:8089;
        server splunk-server-2:8089;
    }

Now let’s put it all together in a working nginx.conf

worker_processes  auto;

events {
    worker_connections  1024;
}


http {
    upstream splunkweb {
        ip_hash;
        server splunk-server-1:8000;
        server splunk-server-2:8000;
    }

    upstream splunkrest {
        ip_hash;
        server splunk-server-1:8089;
        server splunk-server-2:8089;
    }

    server {
        listen 8089 ssl;
        listen 8000;

        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            proxy_pass http://splunkweb;
        }

        location /services {
            proxy_pass https://splunkrest;

        }
    }
}

When you start Nginx you will be prompted to enter the PEM passphrase for the SSL certificate. The password for the default Splunk SSL certificate is password.
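After Nginx starts, a quick way to confirm the proxy is passing both Splunk Web and REST traffic correctly is to hit each listener through the load balancer. A sketch with a placeholder hostname and credentials (-k because of the self-signed certificate):

    # REST API through the proxy (port 8089, SSL)
    curl -k -u admin:changeme https://nginx-lb.example.com:8089/services/server/info

    # Splunk Web through the proxy (port 8000, plain HTTP in this config)
    curl -I http://nginx-lb.example.com:8000/

The same host and ports are what you would hand to an SDK in place of a single Splunk server.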

There are a bunch of settings you may want to tweak including HTTPS Server Optimization, load balancing method, weighted load balancing and health checks.

I’ll leave those settings for you to research and implement as I’m not an expert on them all and everyone’s deployment will differ in complexity and underlying resources.

Hopefully this gives you the foundation for a reliable load balancer to use with Splunk, the REST API and SDK’s.

Smart AnSwerS #65


Hey there community and welcome to the 65th installment of Smart AnSwerS.

We have a couple back-to-back community events happening right after the upcoming long Memorial Day weekend! The next SplunkTrust Virtual .conf Session is scheduled for Tuesday, May 31st at 12:00PM PDT. SplunkTrust member rich7177 will be teaching nOObs the basics of navigating Splunk Web and, time permitting, how to build reports, visualizations, and dashboards. For those of you in the San Francisco Bay Area next week, the SFBA User Group will be on Wednesday, June 1st @ 6:00PM PDT at Splunk HQ in our brand new building next door! Come join us in the shiny new space as Sr. Engineering Manager mszebenyi, original author of the Splunk App for Minecraft, will discuss Splunking game data, and Staff Engineer rsennett will be talking about various experiences doing cool things with Splunk.

Check out this week’s featured Splunk Answers posts:

Is there a way to dynamically assign chart labels using a search?

mszebenyi had a search to pull values from the data to use as labels, but needed a solution to dynamically assign these to charts on a dashboard. somesoni2 provides a run anywhere sample of Simple XML code for a dashboard, demonstrating how to set tokens in the search element to dynamically rename column names.
https://answers.splunk.com/answers/390468/is-there-a-way-to-dynamically-assign-chart-labels.html

Why does Splunk continuously attempt to find a user in LDAP after the user has been removed from Active Directory?

Before a user was removed from Active Directory, RJ_Grayson changed all of the user's public objects in all Splunk apps' local.meta files and disabled all privately owned searches and objects. However, Splunk kept attempting to find the user in LDAP and was reporting "Could not find user…" errors. Jeremiah shared a clear and concise process he uses to clear up these LDAP errors in his environment: replace the username for ownership on all shared knowledge objects with the new owner's username in the metadata files, back up the user's home directory by moving it out of the $SPLUNK_HOME/etc/users directory, and restart Splunk.
https://answers.splunk.com/answers/389664/why-does-splunk-continuously-attempt-to-find-a-use.html
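As a rough illustration of those steps (the username and backup path below are placeholders, and you should back up anything under $SPLUNK_HOME before editing it):

    # Find metadata files that still reference the removed user
    grep -rl 'owner = olduser' $SPLUNK_HOME/etc/apps/*/metadata/

    # After reassigning ownership in those files, move the old user directory aside
    mv $SPLUNK_HOME/etc/users/olduser /path/to/backup/

    # Restart Splunk so the changes take effect
    $SPLUNK_HOME/bin/splunk restart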

Tour Creation App for Splunk: How to search which users already completed the tours?

fabiocaldas needed to search which users completed tours created using the Tour Creation App for Splunk. MuS explains how a ui-tour.conf file will be created for a user with the option viewed=1 once a tour is finished. He then shows how to search this .conf file using the rest command in Splunk Web to get a table of the app name, tour name, and the users that have completed it.
https://answers.splunk.com/answers/312081/tour-creation-app-for-splunk-how-to-search-which-u.html

Thanks for reading!

Missed out on the first sixty-four Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Astronomy Part 3: Splunk, Megastructures, and ITSI


It's been over half a decade since I wrote a blog entry here at Splunk about an astronomy-related topic, and that's because I was waiting for something interesting to talk about. In the past, I have spoken about star brightness, but recent news has taken this subject to another level.

For those who may not be following: last fall, astronomers found an inexplicable property of a star almost 1500 light years away. Using Kepler spacecraft data, the star was shown to dim by up to 22 percent of its original brightness at various times. To put this in perspective, even the largest planets passing in front of a star dim it by less than one percent. So it was obviously not a transiting planet causing the dimming. The real questions then become: is this a repeatable measurement, at what interval does it happen, and most importantly, what causes the dimming?

Measuring Brightness

Measuring the brightness of one star at various intervals is one thing, but it would behoove the scientific community to do the same for millions of stars in the galaxy to figure out repeatable patterns, or whether this is a unique occurrence. Since the Milky Way has more than 100 billion stars (OK, we can rule out the flaring M-class stars comprising 70 percent of all stars and the short-lived mega stars, but you are still left with 25 billion plus), taking historical measurements and analyzing them for patterns makes this a big data problem. I wrote a long time ago that all this data can be stored in Splunk, and for any given star we can conceptually create a timechart of its brightness, using whatever units you like, such as in this example.

[Image: timechart of star brightness]

 

However, that is so 2010. There may be quite a few software packages out there to create this kind of chart (although you would still need a big data platform to store, retrieve, and analyze such a large, growing data set). One could conceivably start collecting data for each star and use thresholds to create alerts when the brightness dims below a certain percentage. The percentage could be hard coded into rules (or Splunk searches) and adjusted as necessary. On the other hand, what if each star has its own peculiarities for the thresholds? Some stars may dim by 1 to 2 percent, which may make them rather uninteresting, but a dimming of 3 percent for the same star may mean a critical alert is needed. Similarly, the star in question for this blog, Tabby's star, as it is now known after its discoverer, has dimming beyond anything ever seen. Automatic threshold adjustment is needed for what could be considered normal, high, and critical. Enter Splunk ITSI (IT Service Intelligence).

Splunk ITSI

Splunk IT Service Intelligence is a monitoring solution that offers innovative, real-time insights into service health against defined key performance indicators (KPIs) to drive operational and business decisions. Splunk ITSI can organize and correlate relevant metrics/events into ‘swim lanes’ to speed up investigations and diagnosis.

If we were to think of a star as a service, its key performance indicators (or health scores) could be its brightness, temperature, UV light output, and IR light output, among other things. Each KPI can have its own thresholds, computed dynamically over time, to indicate what is normal, high, or critical. Now, instead of creating a plain timechart and using hard coded rules for thresholds, Splunk ITSI can use machine learning to auto-adjust the thresholds over time. The KPI for brightness (or dimming percentage, depending on how you look at it) could end up looking something like this in Splunk ITSI.

[Image: brightness KPI in Splunk ITSI]

In this hypothetical example, the star has dimmed over 20%, which is critical enough for someone to immediately investigate.

Moreover, since machine learning is used in Splunk ITSI, anomalies for the KPI could easily be detected and alerted upon.

Other KPI

One quick explanation that astronomers suggested for Tabby's star dimming was a planetary collision at the time of measurement. I find this hard to believe because even the largest planet could only dim a star of this size by 1%, and it would be incredibly strange for the fragments of a collision to line up in a blanket array that dims the star by such a large amount. Scientifically, one could measure the IR light output increasing if a collision took place and correlate this with the dimming. This change in IR light measurement could also be a KPI in Splunk ITSI. In the case of Tabby's star, no change in IR light was found.

Another explanation for the dimming that some scientists suggested was that a swarm of exo-comets just happened to be in the right place to dim the star. I find this equally hard to believe, as the number of comets needed (not to mention the enormous size of each comet) and the cooperative alignment required to blanket the star make it implausible. Scientifically, one could measure the UV light output around the star increasing as the comets swarm it, and this could be another KPI for Splunk ITSI.

Another KPI could be the star's temperature; perhaps the star's temperature changes have a direct relationship to its dimming percentage. Putting all these KPIs together within swim lanes could create a visual correlation for the cause of the dimming. Fortunately, Splunk ITSI has the capability to stack KPIs from any services together to visually correlate the results. Here's a hypothetical example using the KPIs mentioned here.

[Image: KPI swim lanes in Splunk ITSI]

The astute reader will point out that at least half the stars found are binary or multiple star systems, meaning they have one or more companions. For this, Splunk ITSI has a concept called entities: each star could be an entity, and the stars comprising the system could make up the service.

What we have shown here is that by using a platform such as Splunk ITSI, we can treat each notable star in the galaxy as a service, keep track of its KPIs within a big data platform, use dynamic thresholds to monitor the service, find anomalies in the KPIs, and finally group KPIs from any services within swim lanes to visually correlate possible root causes.

Possible Causes of the Dimming

If you've gotten this far into this blog entry, you may want to ask what is causing this ordinary F-class star to dim so much at various intervals. I will offer some suggestions which, I must admit, I am unqualified to back up, but they make for interesting reading.

  • Instrument error is one explanation. Many stars appeared to have been "dimming" over the last 100 years, and it turned out that the way instruments measured brightness changed for the better over the years, which means no such dimming actually happened. However, for Tabby's star the dimming measurements are recent, taken over the last several years, and one cannot simply say new telescopes caused measurement discrepancies. I still find instrument error the most probable cause, as throughout history that has been the case for a number of measurement peculiarities. Some may recall how scientists thought they had measured particles traveling faster than the speed of light a few years ago, and it turned out to be measurement error.
  • Some have suggested that it may be a brown dwarf star passing Tabby's star at just the right time the dimming was measured. This would have to be a lucky coincidence. Since the dimming was measured more than once, it would be really difficult to explain how this invisible brown dwarf seems to pass undetected on a "frequent" basis.
  • Some intrinsic property of the star causes dimming. This means that the star itself is changing its brightness on non-predictable intervals for some unspecified reason. This is why it would be important to measure the dimming of millions (or even billions in the future) of stars to see if other stars could be caught in the same act. Even if more samples are found on a frequent basis, someone will have to come up with the physics for why a star can suddenly go dim.
  • What if none of the above is true and what if no changes can be found in IR and UV light output? That leaves the Science Fiction explanation of a Megastructure, where some alien civilization figured out a way to build some enormous apparatus around the star that may capture its energy causing it to dim to the outside observer. This may sound totally far out, but I find this explanation easier to digest than huge swarms of exo-comets perfectly arranged to dim a star. It is this explanation that has gotten most people excited about Tabby’s star. I will not speculate more than needed here. My real problem with a megastructure, even if true, is why would the star not be permanently dimmed, if it was blanketed by such a structure?

[Image: public domain illustration of a megastructure]

What is happening now is a Kickstarter effort to take more measurements of Tabby's star at frequent intervals. Sadly, history has shown that we sometimes find interesting things in the cosmos that cannot be found again. The Wow! signal and the Mars onsite life detection experiments from the 1970s come to mind. A past world without universal internet and social media left those events as part of history, with discussion limited to a few. It would be sad if no more extreme dimming could ever be measured for Tabby's star. This is why I advocate that the "health" of many stars be measured and treated as a service with multiple KPIs. Our current limitation is the ability to gather the measurements at mass scale, but I suspect instrument automation is something that will develop broadly over the years.

Conclusion

I have made the case that measuring the variable properties of stars is a big data problem. The use of Splunk ITSI to treat each star as a service with KPIs makes the journey to discovery easier than past static techniques.

In my past Splunk blog entries covering astronomy, I mentioned the people who inspired me to create the entry. Throughout history, we know the familiar names of those who have discovered and enhanced our understanding of the cosmos. But for each well known scientific star, there may be hundreds that go unnoticed. I would like to take this opportunity to mention my uncle, retired optical engineer Dr. Pravin Mehta, as one of the thousands who contributed to the field; he had small parts in the Skylab, Hubble, Chandra, and Einstein X-ray telescopes. It is the endeavors of people such as him that help illuminate our understanding of the universe.

 

One source, many use cases: How to deliver value right away by addressing different IT challenges with Splunk


At a recent #Splunk4Rookies event in Paris, we invited people to think about what kind of information they could get from a single piece of raw data to address different needs.

Here at Splunk we work hard to ensure you get the maximum value from your data.

We used the Prism example from (my blog hero) Matt Davies.  You can see his take on this issue here: http://blogs.splunk.com/2015/06/22/bigdatasuccess/

[Image: one source, many use cases prism]

I would like to share with you some ideas on how to promote Splunk internally by getting lots of value from your machine data. First things first: you need to be aligned with the company strategy. So let me introduce a scenario for this first blog post, so we can identify quick wins that address issues that could prevent a company from achieving its goals.

Let’s take a real example from one of our top customers which offers bank transaction services to companies. Their mission statement is to deliver secure and innovative payment terminals and payment as a service. The business strategy for the next few years is to expand globally and offer innovative payment services.

To reach this goal, they built a strategic partnership with a global cloud provider so they can accelerate their growth and expansion into new countries. They also put a lot of effort into developing new payment solutions through different channels (mobile, accessories, etc.).

Our customer identified some IT challenges they'll need to address to be successful in those initiatives:

  1. Secure cloud applications and infrastructures
  2. Monitor payments through all the channels
  3. Lots of time pressure to deliver new features and expand
  4. Consolidate data from all the countries to have a global view of the business in real time

It seems that Splunk can help them to address many of these IT challenges which directly affect the core business and the strategy led by the company stakeholders.

Below is an anonymized piece of raw machine data coming from a bank transaction application log. Each transaction represents a customer transaction from a point of sale:

[Screenshot: raw bank transaction log data]

Here, we have a custom log where a bank transaction is composed of several events that we have to gather by transaction ID. Fortunately, because we are working with Splunk, we don't even have to worry about defining a schema for this exotic log structure before we send the logs into Splunk. Thank you, “schema on the fly”!

So let’s send our logs to Splunk and see what’s happening …

[Screenshot: the raw events in Splunk search]

As predicted, each line is an event and we have to group them by transaction ID to identify each bank transaction. Let’s do that.

But first, let’s configure the transaction ID extraction since Splunk does not yet understand this custom data …

[Screenshots: extracting the transaction ID field]

A few clicks later, my transaction ID is extracted, and I can group events and think about building some cool dashboards. Wow!

[Screenshot: events with the extracted transaction ID field]
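With the field extracted, grouping the individual lines into one event per bank transaction is a one-liner. A minimal sketch using the CLI, assuming the extracted field is named transactionID and the data was indexed with a hypothetical sourcetype of bank:transactions (your names will differ):

    # Group events that share the same transaction ID and show a few examples
    $SPLUNK_HOME/bin/splunk search 'sourcetype="bank:transactions" | transaction transactionID | head 5' -auth admin:changeme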

Next week, based on this kind of machine data, I will show you how you can quickly build some dashboards that address a subset of those critical challenges:

  • Transaction performance management: which KPIs would the transaction application manager like to see to measure performance?
  • Business performance: what's my current revenue, and how does it compare to yesterday's?
  • Security: what could be relevant for a security analyst?

[Image: example dashboard views]

If you have ideas, questions or feedback, tweet me @1rom1

 
