
Buttercup Games – Level 3: The One-Millionth Flap


1mil_low

On the final day of .conf2016 some of us were having dinner and I noticed the number of total flaps was approaching 1 million. That means people tapped their screen nearly 1 million total times to make Buttercup fly! So of course I needed to open a real-time search and watch it click over.

This made me wonder: who actually touched their screen for the one-millionth time? The answer is always just a search away in Splunk.

Screen Shot 2016-10-03 at 10.39.08 AM

Congratulations to Mike Ruszkowski! I hope bells rang and confetti rained. I know my co-worker Matt Oliver (at the top of the table above) was gunning for that one-millionth flap.

Beyond the millionth flap, there are some other impressive statistics. I'm amazed at the users who have tens of thousands of flaps. Most impressive is jmiller1202, who has over 1,000 games and 84,000 flaps.

Screen Shot 2016-10-07 at 9.47.22 AM

If you haven't checked out Splunk 6.5, you might be wondering how the colors and scale were applied. Now you can simply edit a table and choose how you want the scale to work.

Screen Shot 2016-10-07 at 1.37.05 PM

There is a lot more you can do with tables in the GUI. For example, you can add summaries, or format numbers and add units (while still being able to sort numerically!). To see how, and to explore other new features, check out the 6.5 Overview app on Splunkbase.


Building add-ons just got 2.0 times easier


Are you trying to build ES Adaptive Response actions or alert actions and need some help? Are you trying to validate your add-on to see if it is ready to submit for certification? Are you grappling with your add-on setup page and building credential encryptions? If you are, check out Splunk Add-on Builder 2.0.

Below is a brief overview of what’s new in Add-on Builder 2.0:

  • You can now leverage the easy-to-use, step-by-step workflow in Add-on Builder to create alert actions and ES adaptive response actions. No need to deal with .conf files and Python; let the tool do the work for you.

ModAlert1

modalert2

  • The validation process has been enhanced to include App Certification readiness. This validation process can also be performed on apps and add-ons that were created outside of Add-on Builder.

Screen Shot 2016-10-11 at 5.17.50 PM

  • New enhanced user experience and step-by-step flow for building data collections. Let the tool automatically generate the Python code for you.

modinput

 

  • Enhanced out-of-box experience for building the setup page for add-ons with proxy support and multi-account support, as well as credentials encryption using the storage password endpoint.

Screen Shot 2016-10-11 at 10.58.47 PM

  • New helper function libraries to make your life easier when building data collections and alert actions.

Click here for a walkthrough example of how to build an ES adaptive response action. Please give Add-on Builder 2.0 a try and let us know your feedback. Happy Splunking and happy data on-boarding!

Dashboard Digest Series – Episode 2: Part Deux


geoheatmap_hurricane

Before moving on to Episode 3, I decided to do a part two of Episode 2 – Waves!  The reason is two-fold: 1) Splunk Enterprise 6.5 was recently released, and 2) Hurricane Matthew had quite an effect on some of these buoys/stations.  See the original blog post here: Dashboard Digest Series – Episode 2

Purpose: Display meaningful statistics on NDBC buoy information in historical and real-time.  Easily drilldown, aggregate and visualize data from 1000s of buoys transmitting information.
Splunk Version: Splunk 6.5 and above for table coloring
Data Sources: Polling NDBC RSS feed that produces JSON payload
Apps: Add-on for NDBC, Custom Cluster Map Visualization, Clustered Single Value Map Visualization, Geo Heatmap Visualization

Before we get started my thoughts and prayers go out to those affected by Hurricane Matthew.

Let's take a look at the data. Recall our original dashboard for looking at some of our NDBC buoy data.

noaa_wave_bouys

Wouldn’t it be nice to add some coloring to our table (and some more stats)?  In Splunk 6.5 this is super simple to do with the new GUI table coloring and conditional formatting options.  Check out the Splunk 6.5 Overview App for more details on table coloring.  Now it’s a little easier to see patterns and anomalies within our tables.  In this case we have wave height, atmospheric pressure, water temperatures, wind speeds and gusts.   Yikes! 135mph!

noaa_newer1

Let's add another custom visualization to the mix: Geo Heatmap.  What's interesting is that I would've expected to see high winds down by the Bahamas and Florida as well.  Upon doing some research I found that many of the buoys started going offline (and are still offline) once wind gusts were above 80-90mph.  The chart below shows what many of these buoys looked like.  An exponential increase in wind gust and then – silence.

noaa_newer2

Hope you enjoyed this quick part deux of Episode 2.  Stay tuned for Episode 3 coming very soon and Happy Splunking!

– Stephen

 

Smart AnSwerS #79


Hey there community and welcome to the 79th installment of Smart AnSwerS.

It was great meeting a good handful of folks at .conf2016 just two weeks ago, and finally getting to put more faces to names among our awesome Splunk community. The enthusiasm, excitement, and overall energy throughout the conference is always revitalizing, reminding us Splunkers how important it is to maintain an open environment and culture moving forward. It's thanks to the feedback of the many users in every type of role and at every level of experience, who continue to make Splunk what it is today. I'm looking forward to more good times of learning and engaging with you all in the coming year.

Also, big congrats to our newest cohort of SplunkTrust Community MVPs for 2016-2017! Thank you for your amazing contributions that have helped educate and inspire users worldwide in all things Splunk. Your highly prized fezzes are very well deserved :)

Check out this week’s featured Splunk Answers posts:

After upgrading to 6.5.0, KV Store will not start

jcrabb from Splunk support shared this Q&A to help the community troubleshoot an issue some users were experiencing after upgrading Splunk to 6.5.0: indexers failing to start the KV Store process. A common cause for this is an expired cert used by splunkd to talk to mongod. He shows what commands to run to verify the expiration date, how to create a new cert and test that it is valid, and includes three options to confirm KV Store is up and running.
https://answers.splunk.com/answers/457893/after-upgrading-to-650-kv-store-will-not-start.html
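
The linked answer has jcrabb's exact openssl commands. Purely as an illustration of the same check, here is a small Python sketch that reads the splunkd certificate and prints its expiry date; the default cert path and the use of the cryptography package are my assumptions, not part of the answer.

# A sketch of checking the splunkd certificate expiry with Python instead of openssl.
# The path below is the typical default; adjust SPLUNK_HOME for your environment.
import os
from datetime import datetime

from cryptography import x509
from cryptography.hazmat.backends import default_backend

SPLUNK_HOME = os.environ.get("SPLUNK_HOME", "/opt/splunk")
cert_path = os.path.join(SPLUNK_HOME, "etc", "auth", "server.pem")  # assumed default cert location

with open(cert_path, "rb") as f:
    pem = f.read()

# server.pem usually bundles the private key and the certificate; parse the first cert block.
start = pem.index(b"-----BEGIN CERTIFICATE-----")
end = pem.index(b"-----END CERTIFICATE-----") + len(b"-----END CERTIFICATE-----")
cert = x509.load_pem_x509_certificate(pem[start:end], default_backend())

print("splunkd certificate expires:", cert.not_valid_after)
if cert.not_valid_after < datetime.utcnow():
    print("Certificate has expired - this matches the KV Store failure described above.")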

Can you query external systems with the curl command in JKats Toolkit?

a212830 was using the JKats Toolkit add-on from Splunkbase, and wanted clarity on how the built-in custom curl command worked and what it could be used for. jkat54, the author of the add-on, responded with the purpose and syntax for the command. The discussion continues in the comment thread with a212830 to work through troubleshooting issues and more examples. Given how responsive he is to inquiries about a tool he developed and continues to maintain for the greater Splunk community, it's no surprise that jkat54 became a new member of the SplunkTrust this past .conf2016 :)
https://answers.splunk.com/answers/443575/can-you-query-external-systems-with-the-curl-comma-1.html

How to set a form input default value in 6.5.0?

frobinson from the Splunk documentation team posted this Q&A to publicize a useful tip and workaround for setting the “All” choice value as the default for a form input in Splunk 6.5.0. She shares example Simple XML code with the proper syntax to implement in your dashboards.
https://answers.splunk.com/answers/455641/how-to-set-a-form-input-default-value-in-650.html

Thanks for reading!

Missed out on the first seventy-eight Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Splunk takes a flexible approach to license enforcement with Splunk Enterprise 6.5


I can’t believe that Splunk .conf2016 is already behind us. If you joined us in-person in Orlando or watched the keynote on Splunk.com, you know an important theme for Doug Merritt, Splunk President and CEO, is making it easier to do business with Splunk. In his keynote, Doug announced an important change to Splunk Enterprise – the removal of metered license enforcement.

We know that Splunk plays a mission-critical role for your business. With metered enforcement, unanticipated data growth or bursts of new data during an incident investigation could cause disruption in your Splunk operations. So starting with version 6.5, Splunk Enterprise no longer disables searches when you exceed your licensed data ingestion quota.

table summary view

This will be standard for any new license purchased as of September 27, 2016. If you're an existing customer, you will need to upgrade to Splunk Enterprise 6.5 and request a "no-enforcement" license key from your Splunk Sales Rep or Splunk Authorized Partner. For all the details, refer to the Metered License Enforcement FAQ.

If your organization isn’t ready to upgrade to Splunk Enterprise 6.5, you can simply upgrade your existing Splunk License Master to benefit from this change. But then you’ll be missing out on other cool features of the latest release including table datasets, machine learning, search and dashboarding improvements and so much more. Check out the Splunk Enterprise 6.5 product video:

We’ve heard your feedback and we are dedicated to making it easier for you to do business with Splunk. With this enforcement change you’ll still be notified when you exceed your license limit. If you outgrow your license you should work with your Splunk Sales Rep or Splunk Authorized Partner to assess your data ingestion volume needs and purchase additional license capacity to stay in compliance.

In the next blog we’ll take a closer look at another exciting announcement from .conf2016, free personalized dev/test licenses.

Happy Splunking!

Kevin Faulkner

Can you SPL?


splbee_score

A couple of weeks ago at .conf2016 we conducted our 2nd annual SPL'ing Bee and it was just as exciting as the year before.  We had over 30 contestants, close to 100 spectators and a whole new set of challenging questions.

Here is a little background on how the SPL’ing Bee works.

During the SPL'ing Bee, contestants compete by using SPL to answer questions about a specific data set.  To do this, contestants download and install the "Add-on for SPLBee App" on Splunkbase.  This app allows each contestant to write an SPL query on a specific data set and submit their results to a master judging instance using a macro and a Splunk custom command called sendjobmeta, created by our very own Steve Zhang.  The master judging instance uses the SPLBee App to update/manage questions and track scoring in real-time.  Contestants get points for both the correct answer as well as the submission time of their query.  Faster submissions get more points!

As contestants continue to submit their answers the master instance shows a real-time display of scoring and statistics in Splunk.  Link switchers and form inputs are used to change the question as well as show/hide hints. Results of each round are stored in a lookup table and then tallied up in a summary dashboard at the end of the competition.

This year also brings some exciting news! Both the Add-on for SPLBee and SPLBee App are now available to download on Splunkbase, so you can run your own SPL'ing Bees!

Below are some action shots from the competition:

splbee1

splbee2

This year’s winners were:

Day #1
1st Place: BrianSerocki
2nd Place: supersleepwalker
3rd Place: Lowell K

splbee_winners2

Day #2
1st Place: Mason
2nd Place: ehudb
3rd Place: d_flo_yo

splbee_winners1

Congratulations!

I hope you can take these two Apps and build/conduct your own SPL’ing Bees!

Have fun and Happy “SPL’ing”!

 

– Stephen

Smart AnSwerS #80


Hey there community and welcome to the 80th installment of Smart AnSwerS.

The Splunk Pledge was announced last month, which is our commitment to research, education, and community service. Through Splunk4Good, a minimum of $100 million will be donated over the course of 10 years in software licenses, training, support, and education to nonprofit organizations and educational institutions. If there are any nonprofits or academic institutions engaging in positive social change that you feel could benefit from a free 10GB Splunk Enterprise license,  standard support, and Splunk eLearning access, please do encourage them to apply!

Check out this week’s featured Splunk Answers posts:

Is there documentation comparing the features of Splunk User Behavior Analytics (Splunk UBA) and Splunk Enterprise Security?

tomasmoser couldn't find any resource that clearly compared Splunk Enterprise Security and Splunk User Behavior Analytics. vnakra answered with a concise overview of the major differences and use cases between these two applications. ChrisG added to the conversation with contact information to get more specific questions answered.
https://answers.splunk.com/answers/443840/is-there-documentation-comparing-the-features-of-s.html

How to use the concurrency command to timechart the top 10 concurrencies by field sourceip?

jgcsco was using the concurrency command to try and find the concurrency of an event by sourceip in a time chart, but was getting unexpected results. Luckily, Splunk search guru and SplunkTrust member sideview explains that the concurrency command isn’t the best approach for splitting by a field to visualize in a time chart. Instead, he shares a search string he’s crafted throughout the years to calculate concurrency by a split by field, and explains how the various SPL commands operate to get the required chart.
https://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html
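
sideview's actual SPL is in the linked answer and is well worth reading in full. Just to illustrate the idea behind it, the calculation amounts to a running sum of +1 at each event's start and -1 at its end, split by sourceip. Here is that logic as a small Python sketch with made-up sample events; the field names come from the question, everything else is illustrative.

# Conceptual sketch of concurrency split by a field: +1 when a session starts,
# -1 when it ends, and a running sum per sourceip gives the concurrency curve.
from collections import defaultdict

events = [  # (sourceip, start_epoch, duration_seconds) - made-up sample data
    ("10.0.0.1", 100, 30),
    ("10.0.0.1", 110, 50),
    ("10.0.0.2", 105, 20),
]

changes = defaultdict(list)
for ip, start, duration in events:
    changes[ip].append((start, +1))              # session opens
    changes[ip].append((start + duration, -1))   # session closes

for ip, deltas in changes.items():
    running, peak = 0, 0
    for _, step in sorted(deltas):
        running += step
        peak = max(peak, running)
    print("{}: peak concurrency {}".format(ip, peak))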

Splunk sub-processes start/stop every minute (splunk-admon, splunk-powershell, etc). How do we prevent this?

hortonew needed to configure a Windows universal forwarder to prevent Splunk processes from constantly starting and stopping every sixty seconds. jtacy had faced this same issue before, and showed how he configured modular input processes to run only once using the interval setting in inputs.conf.
https://answers.splunk.com/answers/444108/splunk-sub-processes-startstop-every-minute-splunk.html

Thanks for reading!

Missed out on the first seventy-nine Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Creating McAfee ePO Alert and ARF Actions with Add-On Builder


One of the best things about Splunk is the passionate user community. As a group, the community writes amazing Splunk searches, crafts beautiful dashboards, answers thousands of questions, and shares apps and add-ons with the world.

Building high quality add-ons is perhaps one of the more daunting ways to contribute. Since the recently-updated Splunk Add-On Builder 2.0 was released, however, it’s never been easier to build, test, validate and package add-ons for sharing on SplunkBase.

Technical Add-Ons, aka TAs, are specialized Splunk apps that make it easy for Splunk to ingest data, extract and calculate field values, and normalize field names against the Common Information Model (CIM). Since the release of version 6.3, Splunk Enterprise also supports TAs for modular alert actions. This allows users to take actions on Splunk alert search results by integrating with nearly any type of open system.

While I am no developer, I have tinkered with scripted alert actions in the past. Scripted alert actions existed before modular alert actions, but were more difficult to share and implement. When I saw that a new version of the Splunk Add-On Builder had been released, and that it not only supported modular alert actions but also Enterprise Security Adaptive Response Framework (ARF) actions, I had to give it a try. In particular, I wanted to see if I could turn my scripted alert action that tags systems in McAfee ePolicy Orchestrator (ePO) into a modular alert action and ARF action.

I downloaded and installed the Splunk Add-On Builder 2.0 to my home Splunk Enterprise 6.5 server. I went into the app and clicked “Create an add-on.” I then clicked the button to create a modular alert action. Most of the other great features of this tool around data ingestion, extraction and normalization weren’t relevant. I was quickly dropped into a very handy wizard that walks you through the entire process needed to make modular alert actions.

The wizard takes you through all the steps you need to create and describe the add-on, collect initial setup data from the user, and collect data needed for each individual alert. Perhaps the biggest hurdle to creating modular alerts in the past was the effort required to generate the initial setup screens and securely store the passwords. The Add-On Builder takes care of all of that for you! All I had to do was drag a few boxes onto a couple of screens and describe the data I was collecting – the Add-On Builder took care of everything else, including enabling secure password collection/storage, as well as providing sample code to access all the collected data in the alert action script.

setup_param

Collect Setup Info and Passwords Securely

alert_param

Specify Required Alert Inputs

Adding optional functionality to support Enterprise Security 4.5’s great new Adaptive Response Framework was incredibly simple. I had to ensure that I had the latest Common Information Model installed on my system, and just had to fill out 3 drop-down lists and 3 text fields to categorize the action. Enabling Splunk users to automate security responses has never been easier!

setup_wiz1

Simple Enterprise Security ARF Integration

The next step was to actually code the alert action in the tool using a little Python. The Add-On Builder provides a syntax-highlighting GUI for creating/editing the script, sample code so even a coding dunce like me will understand how to work with alert variables and search results, and a robust testing tool with logging. It’s all documented right here and here.

All I had to do was a little cut and paste, a bit of research on how to interface with the McAfee ePO web API, and the usual code troubleshooting that needs to be done when you have a guy with only a history degree writing Python scripts. The helper functions in the sample code made most of it trivially easy. It was even a simple matter to enable robust logging for end users so they can troubleshoot their own deployment of my add-on.
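
The full script lives in my add-on, but the heart of it is a single call to the ePO web API. The sketch below shows roughly what that looks like with the requests library; the system.applyTag remote command and its parameters reflect my reading of the ePO API documentation, and the server, account and tag names are placeholders, so treat this as an outline rather than the add-on's actual code.

# Rough outline of tagging a system through the McAfee ePO web API.
# The remote command name and parameters (system.applyTag, names, tagName) are
# based on my reading of the ePO API docs - verify them against McAfee's documentation.
import requests

EPO_URL = "https://epo.example.com:8443"   # placeholder ePO server
EPO_USER = "splunk_svc"                    # placeholder service account
EPO_PASS = "changeme"

def tag_system(hostname, tag):
    """Ask ePO to apply `tag` to `hostname` and return the raw API response."""
    resp = requests.get(
        EPO_URL + "/remote/system.applyTag",
        params={"names": hostname, "tagName": tag, ":output": "json"},
        auth=(EPO_USER, EPO_PASS),
        verify=False,   # many ePO servers use self-signed certs; use a CA bundle if you can
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

print(tag_system("workstation-042", "compromised"))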

Code_test

Code and Test in the Add-On Builder

The only steps that remained were to validate that my app passed all the recommended best practices, and package it up so I could upload it to SplunkBase. Well, guess what? The Add-On Builder automates that process entirely! There’s a 1-click validation test, along with a button to package the add-on as an SPL file suitable for upload to SplunkBase.

validate

Validate and Package

If you're a Splunk user who uses McAfee ePO in your environment today, I recommend you check out my add-on. It will enable you to search for anything in Splunk that indicates an issue with an ePO-managed server or endpoint, and automatically tag that system so ePO can apply different policies and tasks as needed to address the issue. In addition, if you use Splunk Enterprise Security, you'll be able to use this feature automatically when a correlation search fires and/or as an ad-hoc action when investigating notable events.

For example, if a Splunk query detects a server or endpoint is communicating with a known malicious host (e.g. through proxy logs with threat intel), this add-on can be used to tag that system as “compromised” or “infected” in ePO. ePO can then automatically run tag-specific tasks such as aggressive virus scans, and/or apply policies like blocking outbound communications via the endpoint firewall or HIPS on the compromised host. This enables true end-to-end automation between any data in Splunk and McAfee endpoint security tools.

config_alert

Modular Alert Action in Use

And to take this further: if you have an idea for your own modular alert action to create a new Splunk integration, I strongly recommend you start by downloading the Splunk Add-On Builder from SplunkBase. It will greatly simplify the process and enable you to give back to the Splunk community. If you do so, please be sure to post a comment here – I'd love to see how others have made use of this incredible tool.


Preparing for a successful Enterprise Security PS engagement


splunktrust_square_logo

(Hi all–welcome to the latest installment in the series of technical blog posts from members of the SplunkTrust, our Community MVP program. We're very proud to have such a fantastic group of community MVPs, and are excited to share what we learn from them.

–rachel perkins, Sr. Director, Splunk Community)
————————————————————————
Hi, I’m Doug Brown, Information Security Analyst at Red Hat, and member of the SplunkTrust.

Over the last few years I’ve spoken with a number of Enterprise Security customers from different regions, and I’ve received mixed feedback about their deployments. The good news is that there are some easily-avoidable common pitfalls, and by being aware of these before engaging Splunk Professional Services, hopefully you’ll be able to derive the greatest value possible from Enterprise Security.

The most common issue I hear about is performance. There are a number of compounding reasons why this can be the case. Although the reference hardware for indexers suggests a minimum disk performance of 800 IOPS, the requirements for a production Enterprise Security deployment well exceed this. The performance of storage being used for hot/warm buckets should be at least 20 times greater. Splunk makes hacking at big, dodgy datasets look trivial, but there's no avoiding the significant load required to perform these tasks. As such, ensure you have the necessary spec before Splunk Professional Services comes on site, because Splunk Enterprise, let alone Enterprise Security, will never fly without sufficient dedicated IOPS. The de facto way we stress test IOPS is using bonnie++ (http://www.coker.com.au/bonnie++/). Just be sure to let your storage admins know before running the test.

Disk isn't everything though–the operating system does make a difference. Your organisation might consider itself to be a "Windows shop", but using an enterprise-grade Linux distribution is essential to good performance. Ensure that Transparent Huge Pages is turned off, and the default ulimits have been increased. Also, virtual doesn't have to equal poor performance when it comes to Splunk. It's true that certain virtualised configurations can be sub-optimal, but a well configured enterprise-grade hypervisor on good hardware, with LUNs presented from decent storage, can be just as good as bare metal, and provide greater flexibility.

Before Splunk Professional Services arrive, be sure to upgrade your existing Splunk Enterprise servers across your infrastructure. It will have to be done before Enterprise Security is installed anyway, so you may as well do it yourself. In this way, your time with Professional Services can be better spent on the deployment itself.

Well ahead of time, it's important to identify your organisation's obscure, custom, but important data sources that Enterprise Security needs to "see" (have mapped to the CIM). These are sources of information for which there isn't an existing certified TA app on Splunkbase. For example, if your organisation has a custom SSO technology, speak with the subject matter experts to find out the key security-relevant events, and document them so they're ready for PS. Better yet, if you're familiar with writing CIM-compliant TAs, go ahead and write them yourself! (Then be awesome and release them under an Open Source license on Splunkbase.)

Enterprise Security is all about enrichment, and underpinning that are the identity and asset lookups. Without these, the value you can derive from ES is significantly constrained. Keep in mind that PS won't be able to produce these for you, as they don't know your organisation. Find someone internally who can help you programmatically produce these lookups according to the format specified in the documentation (docs.splunk.com/Documentation/ES/latest/User/AssetandIdentityLookupReference). This will likely require a significant investment of time and energy to munge and hack data from various sources, but it's well worth the effort. Don't worry too much about populating fields you don't care about or don't have a source of truth for ('priority' being the exception). In fact, less can be more, as those lookups have to be propagated to the search peers (indexers) at search time. If they're too big, it can cause performance issues, so try to keep them a reasonable size (ideally below 10MB each).
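
To make the "programmatically produce" part concrete, here is a minimal sketch of generating an asset lookup CSV from an internal inventory export. The CMDB rows are invented and the columns shown are only a subset of those in the Asset and Identity Lookup Reference linked above, so check the docs for the full, current field list before relying on anything like this.

# Minimal sketch: build assets.csv for Enterprise Security from an internal inventory export.
# The input rows are invented and the columns are a subset of the documented lookup format.
import csv

cmdb_export = [  # hypothetical rows pulled from your CMDB / IPAM / AD
    {"ip": "10.1.2.3", "dns": "web01.example.com", "owner": "webops",
     "bunit": "ecommerce", "category": "web", "priority": "high"},
    {"ip": "10.1.9.7", "dns": "dc01.example.com", "owner": "itops",
     "bunit": "corp", "category": "domain_controller", "priority": "critical"},
]

fieldnames = ["ip", "mac", "nt_host", "dns", "owner", "priority",
              "bunit", "category", "pci_domain", "is_expected"]

with open("assets.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for row in cmdb_export:
        writer.writerow({field: row.get(field, "") for field in fieldnames})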

If Enterprise Security is to be installed in a search head cluster, be sure to have ready a fresh Linux virtual machine that PS can use for staging. There are no spec requirements for the staging machine; anything will do. Finally, if using a proxy, ask Professional Services for the list of Enterprise Security's threat feeds so you can configure your proxy settings beforehand.

Good luck with your preparation and be sure to check back here soon for the next installment in this series where we’ll discuss what to do when Splunk Professional Services arrive.
————————–
Thanks to Rachel and Alison Perkins, Simon Duff, Russ Uman, Joshua Rodman, and the many SplunkTrust members whose feedback helped shape this post. May the advice here help improve the security of organisations great and small.

 

How to: Splunk Analytics for Hadoop on Amazon EMR.


**Please note: The following is an example approach outlining a functional Splunk Analytics for Hadoop environment running on AWS EMR. Please talk to your local Splunk team to determine the best architecture for you.

Using Amazon EMR and Splunk Analytics for Hadoop to explore, analyze and visualize machine data

Machine data can take many forms and comes from a variety of sources: system logs, application logs, service and system metrics, sensor data, etc. In this step-by-step guide, you will learn how to build a big data solution for fast, interactive analysis of data stored in Amazon S3 or Hadoop. This hands-on guide is useful for solution architects, data analysts and developers.

You will need:

  1. An Amazon EMR Cluster
  2. A Splunk Analytics for Hadoop Instance
  3. Amazon S3 bucket with your data
    • Data can also be in Hadoop Distributed File System (HDFS)

Picture1

 

To get started, go into Amazon EMR from the AWS management console page:

Picture2

 

From here, you can manage your existing clusters, or create a new cluster. Click on ‘Create Cluster’:

Picture3

 

This will take you to the configuration page. Set a meaningful cluster name, enable logging (if required) to an existing Amazon S3 bucket, and set the launch mode to cluster:

Picture4

 

Under software configuration, choose Amazon EMR 5.x as per the following:

Picture5

 

Several of the applications included are not required to run Splunk Analytics for Hadoop; however, they may make management of your environment easier.

Choose the appropriate instance types, and number of instances according to your requirements:

Picture6

** please note that Splunk recommends Hadoop nodes have 8 cores / 16 vCPUs. The m3.xlarge instances were used here for demonstration only.

For security and access settings, choose those appropriate to your deployment scenario. Using the defaults here can be an appropriate option:

Picture7

 

Click ‘Create Cluster’.

This process may take some time. Keep an eye on the Cluster list for status changes:

Picture8

When the cluster is deployed and ready:

Picture9

 

Clicking on the cluster name will provide the details of the set up:

Picture10

 

At this point, browse around the platform, and get familiar with the operation of the EMR cluster. Hue is a good option for managing the filesystem, and the data that will be analyzed through Splunk Analytics for Hadoop.

Configure Splunk Analytics for Hadoop on AWS AMI instance to connect to EMR Cluster

Installing Splunk Analytics for Hadoop on a separate Amazon EC2 instance, removed from your Amazon EMR cluster, is the Splunk-recommended architectural approach. In order to configure this setup, we run up a Splunk 6.5 AMI from the AWS Marketplace, and then add the necessary Hadoop, Amazon S3 and Java libraries. This last step is further outlined in the Splunk docs at http://docs.splunk.com/Documentation/HadoopConnect/1.2.3/DeployHadoopConnect/HadoopCLI

To kick off, launch a new Amazon EC2 instance from the AWS Management Console:

Picture11

 

Search the AWS Marketplace for Splunk and select the Splunk Enterprise 6.5 AMI:

Picture12

 

Choose an instance size to suit your environment and requirements:

Picture13

 

**please note that Splunk recommends minimum hardware specs for a production deployment. More details at http://docs.splunk.com/Documentation/Splunk/6.5.0/Installation/Systemrequirements

From here you can choose to further customize the instance (should you want more storage, or to add custom tags), or just review and launch:

Picture14

 

Now, you'll need to add the Hadoop, Amazon S3 and Java client libraries to the newly deployed Splunk AMI. To do this, first grab the versions from the Amazon EMR master node for each, to ensure that you are matching the libraries on your Splunk server. Once you have them, install them on the Splunk AMI:

Picture15

 

Move this to /usr/bin and unpack it.

In order to search the Amazon S3 data, we need to ensure we have access to the S3 toolset. Add the following line to the file /usr/bin/hadoop/etc/hadoop/hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Finally, we need to setup the necessary authentication to access Amazon S3 via our new virtual index connection. You’ll need a secret key ID and access key from your AWS Identity and Access Management (IAM) setup. In this instance, we have setup these credentials for an individual AWS user:

Picture16

 

Ensure that when you create the access key, you record the details. You then need to include these in the file located at /usr/bin/hadoop/etc/hadoop/hdfs-site.xml. Include the following within the <configuration> tag:

<property>
   <name>fs.s3.awsAccessKeyId</name>
   <value>xxxx</value>
</property>
<property>
   <name>fs.s3.awsSecretAccessKey</name>
   <value>xxxx</value>
</property>
<property>
   <name>fs.s3n.awsAccessKeyId</name>
   <value>xxxx</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>xxxx</value>
</property>

You need to include the s3n keys, as that is the mechanism we will use to connect to the Amazon S3 dataset.

Create data to analyze with Splunk Analytics for Hadoop

We have multiple options for connecting to data for investigation within Splunk Analytics for Hadoop. In this guide, we will explore adding files to HDFS via Hue, and connecting to an existing Amazon S3 bucket to explore data.

Option 1 – S3

From the AWS Management Console, go into Amazon S3, and create a new bucket:

Picture17

 

Give the bucket a meaningful name, and specify the region in which you would like it to exist:

Picture18

 

Click create, and add some files to this new bucket as appropriate. You can choose to add the files to the top level, or create a directory structure:

Picture20

 

The files or folders that you create within the Amazon S3 bucket need to have appropriate permissions to allow the Splunk Analytics for Hadoop user to connect and view them. Set these to allow ‘everyone’ read access, and reduce this scope to appropriate users or roles after testing.

 

Option 2 – HDFS

**this option is only relevant if you DO NOT want to leverage Amazon S3 for data storage. You’ll need to ensure that you have assigned appropriate disk space on the Hadoop nodes to leverage this method.

Let's create or upload some data in HDFS. First, we will need a user in HDFS. We will use root; however, this may not be the appropriate user in your environment. From the master node:

hadoop fs -mkdir hdfs://masternodeaddress:8020/user/root

hadoop fs -chown root:root hdfs://masternodeaddress:8020/user/root

Now, use Hue to upload data to this new directory. Log in to Hue:

http://masternodeaddress:8888

Login, or create a new user if appropriate.

Select the file browser, navigate to the /user/root directory and create a ‘data’ directory. Navigate into this directory, and then upload some files for use.

This should result in data being available in the Hadoop FS:

Picture21

 

 

Set up Splunk Analytics for Hadoop for data analysis

To proceed, first you’ll need to grab some parameters from the Hadoop nodes:

Collect Hadoop and Yarn variables:

  1. Java Home = type ‘which java’ = /usr/bin/java
  2. Hadoop home = type ‘which hadoop’ = /usr/bin/hadoop
  3. Hadoop version = type ‘hadoop version’ = hadoop 2.7.2-amzn-3
  4. Name node port = In a browser go to http://masternodeaddress:50070 (or click on HDFS name node in the EMR management console screen)
  5. Yarn resource manager scheduler address= In a browser go to http://masternodeaddress:8088/conf (or click on ‘resource manager’ in the EMR management console screen) = look for ‘yarn.resourcemanager.scheduler.address’ = x.x.x:8030
  6. Yarn resource manager address= In a browser go to http://masternodeaddress:8088/conf (or click on ‘resource manager’ in the EMR management console screen) = look for ‘yarn.resourcemanager.address’ = x.x.x:8050

Now, we need to verify that the name node is correct. You can do this by executing this command:

hadoop fs -ls hdfs://masternodeaddress:8020/user/root/data

Now we can configure our Virtual Provider in Splunk. To do this, go to Settings, and then Virtual Indexes:

Picture22

 

Then choose to create a new provider:

Picture23

 

Using the parameters that we gathered earlier, fill this section out:

Picture24

Picture25

 

Save this setup, and go to set up a new Virtual Index:

Picture26

 

Here you can specify the path in HDFS that was set up in an earlier step, or choose to point to the S3 bucket that was created:

Option 1 – S3:

Picture27

 

Ensure that you use the s3n prefix here.

Option 2 – HDFS:

Picture28

 

Save this set up, and you should now be able to search the data within Amazon S3 (or HDFS) using Splunk Analytics for Hadoop!

Click search on the virtual index config:

Picture29

 

Which will take you to the Splunk search interface. You should see something like the following:

Picture30

Splunking Kafka At Scale


At Splunk, we love data and we’re not picky about how you get it to us. We’re all about being open, flexible and scaling to meet your needs. We realize that not everybody has the need or desire to install the Universal Forwarder to send data to Splunk. That’s why we created the HTTP Event Collector. This has opened the door to getting a cornucopia of new data sources into Splunk, reliably and at scale.

We’re seeing more customers in Major Accounts looking to integrate their Pub/Sub message brokers with Splunk. Kafka is the most popular message broker that we’re seeing out there but Google Cloud Pub/Sub is starting to make some noise. I’ve been asked multiple times for guidance on the best way to consume data from Kafka.

In the past I've just directed people to our officially supported technology add-on for Kafka on Splunkbase. It works well for simple Kafka instances, but if you have a large Kafka cluster comprised of high throughput topics with tens to hundreds of partitions, it has its limitations. The first is that management is cumbersome. It has multiple configuration topologies and requires multiple collection nodes to facilitate data collection for the given topics. The second is that each data collection node is a simple consumer (single process) with no ability to auto-balance across the other ingest nodes. If you point it to a topic it will take ownership of all partitions on the topic and consume via round-robin across the partitions. If your busy topic has many partitions, this won't scale well and you'll lag reading the data. You can scale by creating a dedicated input for each partition in the topic and manually assigning ownership of a partition number to each input, but that's not ideal and creates a burden in configuration overhead. The other issue is that if any worker process dies, the data won't get read for its assigned partition until it starts back up. Lastly, it requires a full Splunk instance or Splunk Heavy Forwarder to collect the data and forward it to your indexers.

Due to the limitations stated above, a handful of customers have created their own integrations. Unfortunately, nobody has shared what they’ve built or what drivers they’re using. I’ve created an integration in Python using PyKafka, Requests and the Splunk HTTP Event Collector. I wanted to share the code so anybody can use it as a starting point for their Kafka integrations with Splunk. Use it as is or fork it and modify it to suit your needs.

Why should you consider using this integration over the Splunk TA? The first is scalability and availability. The code uses a PyKafka balanced consumer. The balanced consumer coordinates state for several consumers who share a single topic by talking to the Kafka broker and directly to Zookeeper. It registers a consumer group id that is associated with several consumer processes to balance consumption across the topic. If any consumer dies, a rebalance across the remaining available consumers will take place which guarantees you will always consume 100% of your pipeline given available consumers. This allows you to scale, giving you parallelism and high availability in consumption. The code also takes advantage of multiple CPU cores using Python multiprocessing. You can spawn as many consumers as available cores to distribute the workload efficiently. If a single collection node doesn’t keep up with your topic, you can scale horizontally by adding more collection nodes and assigning them to the same consumer group id.
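
The actual project is linked at the end of this post; to give a feel for the approach, here is a stripped-down sketch of a PyKafka balanced consumer. The broker, ZooKeeper, topic and consumer group names are placeholders, and the real code adds the YAML config, multiprocessing workers and batching described below.

# Stripped-down illustration of the balanced-consumer approach described above.
# Hosts, topic and group names are placeholders; the real project layers a YAML
# config, multiprocessing workers and batching on top of this.
from pykafka import KafkaClient

client = KafkaClient(hosts="kafka01:9092,kafka02:9092")
topic = client.topics[b"my_busy_topic"]

consumer = topic.get_balanced_consumer(
    consumer_group=b"splunk_hec_consumers",    # share this id across all collection nodes
    zookeeper_connect="zk01:2181,zk02:2181",   # coordination for partition rebalancing
    auto_commit_enable=True,
)

for message in consumer:
    if message is not None:
        # In the real integration the message would be buffered and forwarded to HEC;
        # see the sketch after the HEC RAW endpoint paragraph below.
        print(message.value.decode("utf-8", errors="replace"))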

The second reason you should consider using it is the simplified configuration. The code uses a YAML config file that is very well documented and easy to understand. Once you have a base config for your topic, you can lay it over all the collection nodes using your favorite configuration management tool (Chef, Puppet, Ansible, et al.) and modify the number of workers according to the number of cores you want to allocate to data collection (or set to auto to use all available cores).

The other piece you’ll need is a highly available HTTP Event Collector tier to receive the data and forward it on to your Splunk indexers. I’d recommend scenario 3 outlined in the distributed deployment guide for the HEC. It’s comprised of a load balancer and a tier of N HTTP Event Collector instances which are managed by the deployment server.

scenario3

The code utilizes the new HEC RAW endpoint so anything that passes through will go through the Splunk event pipeline (props and transforms). This will require Splunk version >= 6.4.0.
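
For what it's worth, sending a batch of raw Kafka messages to that endpoint can be as simple as the following sketch. The load balancer hostname, HEC token, channel GUID and sourcetype are all placeholders, and the real code handles batching, retries and error checking.

# Forwarding a batch of raw messages to the HEC /raw endpoint (Splunk 6.4 or later).
# Hostname, token, channel and sourcetype below are placeholders for your HEC tier.
import requests

HEC_URL = "https://hec-lb.example.com:8088/services/collector/raw"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"
HEC_CHANNEL = "11111111-1111-1111-1111-111111111111"  # arbitrary GUID identifying this client

def send_batch(messages, sourcetype="kafka:my_busy_topic"):
    """POST newline-delimited raw events; props and transforms apply at index time."""
    resp = requests.post(
        HEC_URL,
        params={"sourcetype": sourcetype, "channel": HEC_CHANNEL},
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        data="\n".join(messages).encode("utf-8"),
        verify=False,   # use a proper CA bundle in production
        timeout=30,
    )
    resp.raise_for_status()

send_batch(['{"example": "event one"}', '{"example": "event two"}'])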

Once you’ve got your HEC tier configured, inputs created and your Kafka pipeline flowing with data you’re all set. Just fire up as many instances as necessary for the topics you want to Splunk and you’re off to the races! Feel free to contribute to the code or raise issues and make feature requests on the Github page.

Get the code

Smart AnSwerS #81


Hey there community and welcome to the 81st installment of Smart AnSwerS.

The San Francisco Bay Area user group will be meeting tomorrow, Wednesday, November 2nd @ 6:30PM PDT at Yahoo! HQ. Gregg Daly from the Children's Discovery Museum of San Jose will be speaking on how the nonprofit has been using the free Splunk Enterprise license donated by Splunk4Good to monitor IT and security operations. Jason Szeto, principal software engineer at Splunk, will be giving a talk and live demo on a new Splunk feature currently under development. If you happen to be in the area, you're welcome to join us! Please visit the SFBA user group event page for more details and to RSVP.

Check out this week’s featured Splunk Answers posts:

 Why is my cluster master reporting “Cannot fix search count as the bucket hasn’t rolled yet”, preventing me from meeting my Search Factor?

LiquidTension's cluster master was reporting 18 pending fixup tasks that were preventing both search and replication factors from being met, and this was an issue affecting several other users as well. Luckily, rbal from Splunk support answers the question, explaining why these messages occur in an indexer clustering environment, where to investigate in Splunk Web, and how to resolve the issue right away.
https://answers.splunk.com/answers/217020/why-is-cluster-master-reporting-cannot-fix-search.html

How to monitor changes made to the inputs.conf file?

With inputs.conf getting updated periodically, agoyal needed a way to keep track of any changes made to the file. lukejadamec provides the steps for monitoring changes on an inputs.conf file, noting that there may be several Splunk instances that should be taken into account for complete coverage of all changes in a deployment.
https://answers.splunk.com/answers/448625/how-to-monitor-changes-made-to-the-inputsconf-file.html

How to write a search to only keep a certain type of value for a multivalue field?

dmacgillivray had a table with a multivalue field, and was looking for an SPL solution to filter out any values that did not match a certain format, but still maintain the same number of rows. New SplunkTrust member sundareshr provides two search solutions using eval and regex to get the same expected result.
https://answers.splunk.com/answers/447730/how-to-write-a-search-to-only-keep-a-certain-type.html

Thanks for reading!

Missed out on the first eighty Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Dashboard Digest Series – Episode 3


energy_small

Welcome to Episode 3 of the Dashboard Digest series! At Splunk we love to eat our own dogfood, so in this episode we will see a dashboard showing energy and water usage at Splunk headquarters in San Francisco! Additionally, you'll see a few new custom visualizations that became available in Splunk 6.4, as well as use of the Machine Learning Toolkit.

Purpose: Display and analyze building energy and water usage. Use machine learning to forecast energy usage, detect outliers and look for anomalies.
Splunk Version: Splunk 6.4 and above
Data Sources: Sensor data in JSON format coming from Aquicore devices.
Apps: Machine Learning Toolkit, Water Gauge Visualization, Calendar Heatmap Visualization

Summary of tips/tricks used:

1. Status Indicator Visualization
2. Calendar Heatmap Visualization
3. Machine Learning Toolkit – Forecast Time Series Chart
4. Machine Learning Toolkit – Outliers Chart
5. Water Gauge Visualization

sf_hq_energy_usage_annotated

Tips n’ Tricks:

Let’s take a look! This first dashboard is an overall view of energy usage at Splunk. Using some new custom visualizations from Splunkbase I was able to show current/max/avg energy usage as well as a dynamic status icon.  The Status Indicator Visualization allows you to pick icons and colors of your choice from fontawesome.io and have them change depending on the values of your data (in this case, an orange lightning bolt for high power usage and a green leaf for average).  The best part about these custom visualizations is that they are plug n’ play! No code required.

Next I used the Calendar Heatmap visualization to show overall historic daily power usage. This is nice to look for daily/weekly/monthly trends.

After that I used the Machine Learning Toolkit to forecast future energy usage as well as look for outliers and anomalies.  The SPL command "anomalydetection" is a simple way to detect anomalies, whereas the ML Toolkit's outlier detection is a great way to detect and visualize outliers with custom thresholds.

sf_hq_energy_outliers_annotated

Lastly I created a simple dashboard to display current and historical water usage.  The Water Gauge Viz simply shows the current Water Usage in gallons.

sf_hq_water_usage_annotated

That’s all for this round!  Hope you enjoyed and see you next time with another episode of Dashboard Digest Series! Happy Splunking!

– Stephen

 

 

Drive your Business in Real-Time with Splunk – Part 1


Why would you use Splunk to drive your Business in Real-Time?

The answer is because Splunk brings you flexibility and reactivity.

Companies constantly look to build agile and flexible IT to support their evolving businesses; this is why they built micro-services and Service-Oriented Architecture (SOA). Splunk aligns with this flexibility at the data level to measure and drive your business performance.

We use the term flexibility not only because it is easy to capture all the required data, but first and foremost because iteration is key when you build Key Performance Indicators (KPIs). KPIs start out as the things business analysts think would be nice for measuring your performance, but they always end up changing several times. You need to iterate quickly, and you don't want to have to say no to the business because a change is too complex.

Reactivity is key for all of us. The more reactive you are, the less time you allow for your business to slow down or for your customers to get disappointed. You need to get the information rapidly so you can still take actions before it's too late. Gatwick runs the airport really smoothly because they can anticipate passenger traffic by analyzing the data. Domino's Pizza launches targeted promotional campaigns to boost sales based on real-time data on what's selling well. Dunkin Donuts sends coupons to customers when they think they are likely to buy extra donuts. They all build their decisions on data and they all use Splunk to do this. If they don't react as quickly as possible, they lose customers and they lose revenue.

So, how can I simply drive my business in real-time?

This is a Splunk use case and, like every Splunk use case, the first question is: where is the data?! Business is run on applications, but the best (and easiest) way to collect the data is centrally instead of directly from each application. This means from the Enterprise Service Bus (ESB). The ESB is middleware that concentrates all the application exchanges and handles tasks like message routing, message validation, message transformation, etc.

"Wow, sounds good, only one single place to monitor your business! How is that possible?" The magic is actually not here at all; the magic is that you can do all this without changing anything in your ESB and without being intrusive at all. You just use Stream.

"Stream?!" Stream lets you collect all the information directly from the HTTP(S) traffic: the technical metadata, but most importantly the payload. The payload represents the business data, and since we capture the full payload and keep it within Splunk, you'll be able to iterate and easily answer any business need, both current and future.

Unlike traditional Business Activity Monitoring solutions, where you need to set up the collection, the database that will receive the data, normalize your data and then build your dashboards, Splunk leverages raw data from the network. There's no need to normalize it, and you can immediately build your KPIs.

The data will appear within Splunk in JSON format:

raw_data

As you can see, we capture src_content and dest_content which are the service request and response. This is the business transaction.

Splunk extracts all the JSON fields but also the XML tags within the source and destination so you can immediately pivot on your data and build your KPIs: revenue evolution, revenue distribution, etc.

fields

In the next blog post, I will show you what kind of KPIs we can setup and illustrate this with a real use case: driving the business of a hotel booking website. I’ll show you how they use Splunk to monitor the business, take actions and measure the efficiency of those actions in real-time.

kpi1

Romain.

101 things the mainstream media doesn’t want you to know about PowerShell logging*


powershell_recipe

At .conf2016 Steve Brant and I presented on how to detect PowerShell maliciousness using Splunk [2]. The only problem is, if you didn’t attend the conference and only read the PowerPoint slides you might say something like “Your presentation is just big photos and SPL”. Which is true. Frankly, we like big fonts and we cannot lie. You other presenters may deny. That when a deck goes up with a big sans-serif font and a bright image in your eyes you get… distracted by where I am going with this paragraph. As such, we are going to create blog postings of our presentation for those of you who didn’t attend our talk in person. In this missive I shall divulge the best bang-for-your-$LOCAL_CURRENCY way to log PowerShell commands. The next blogpost will show what to do with those logs. Please note that this document assumes you already have your local Windows-TA working, you are successfully collecting Windows security event logs, and a recent-ish version of PowerShell/WindowsManagementFramework.

dragons

Before we start, please note that we stand on the shoulders of giants. Michael Gough has already done the hard work of educating the world on what to enable for Windows logs. So if you want more info after you finish this article, go off and review his contributions to the subject (http://www.HackerHurricane.com [3]). What you will quickly learn is that Windows logging guidance is similar to ancient maps with "Thar be sea monsters" scribbled in the margins. So use this blog post as a rutter to navigate yourself through the maze of GPO logging options. And before you begin muttering about how you have PowerShell logging already enabled… I would go double-check. You most likely do not. PowerShell logging is a convoluted path fraught with missteps and GPO settings of worthlessness. *ahem* but I digress once more *ahem*.

dune

So, what is the easiest way to find malicious PowerShell activity on your network? Configure logging for Command Line Processes. Why? Because logging CommandLineProcess is like eating melange on the Microsoft planet ArrakisXPSP3. It allows you to see exactly what PowerShell commands the adversaries are executing on your network. I should note that there are several methods of logging PowerShell, but these other methods can be bypassed by adversaries [4]. For more info on recording and ingesting PowerShell logs go review the other awesome PowerShell presentation given at .conf2016 by Ryan Chapman and Lisa Tawfall [5].

adult

So, how do we enable CommandLineProcess logging? The best method is via a GPO. For those of you playing at home, I recommend having an adult (Windows Sysadmin) do this for you. They tend to be very upset when other people modify their group policy settings. Open/create a GPO and navigate to:

Computer Configuration\Windows Settings\Security Settings\Advanced Audit Policy Configuration\System Audit Policies\Detailed Tracking\Audit Process Creation

Once that setting is open, make sure all three boxes are clicked.

audit_process_creation

Now navigate to and click enable:

Computer Configuration\Administrative Templates\System\Audit Process Creation\Include\command line in process creation events

 Sweet! This enables logging PowerShell commands under EventID 4688.
includecommandline

With these settings enabled, any command run on the command line of the machine (including PowerShell) is logged. This method (unlike the aforementioned transcript collection) cannot be bypassed. Every process created via the command line (for better and worse) is collected in Windows Event Viewer:

eventid

Image from https://adsecurity.org/?p=1275

However, there is a catch. This means ANYTHING that someone executes in the command line will be recorded. If a user or an Admin chooses to run a script/command that includes a clear text password… this will collect it [6]. Personally… I am ok with that. I'd rather know (and promptly stamp out) any use of clear text passwords in my organization, but YMMV with that opinion.

In my next blog post I’ll give some examples of SPL searches you can run to find naughty PowerShell activity and disrupt the adversary. Happy (preparation of) Hunting :-).

~~~~~~

[* 1] (that title isn't true… it's another lie) Clickbait Title Generator
http://www.contentrow.com/tools/link-bait-title-generator

[2] “Hunting the Known Unknowns: The PowerShell Edition”
https://conf.splunk.com/sessions/2016-sessions.html#search=brant&

[3] “PowerShell CheatSheet” https://static1.squarespace.com/static/552092d5e4b0661088167e5c/t/5760096ecf80a129e0b17634/1465911664070/Windows+PowerShell+Logging+Cheat+Sheet+ver+June+2016+v2.pdf

[4] Investigating PowerShell Attacks
https://www.blackhat.com/docs/us-14/materials/us-14-Kazanciyan-Investigating-Powershell-Attacks-WP.pdf

[5] “PowerShell Power Hell: Hunting for Malicious Use of PowerShell with Splunk” https://conf.splunk.com/sessions/2016-sessions.html#search=Chapman&

[6] Command Line process auditing
https://technet.microsoft.com/windows-server-docs/identity/ad-ds/manage/component-updates/command-line-process-auditing


Drive your Business in Real-Time with Splunk – Part 2


Hello All !

You remember last week? We were speaking about the ESB and how you can leverage this central component to drive your business in real-time! Simply by using Splunk Stream to capture ESB traffic on the fly, without any modification…

splunk hotel

Today, I will focus on a potential use-case for this at the "Splunk Hotel"! SplunkHotel is a company that owns a few hotels but also references other hotels on its booking website. Those independent hotels pay to be on the website. In exchange, we guarantee additional revenue.

Splunk already collects the data via Stream and, as we saw last week, we capture the business payload, so we can get inputs regarding the business, including revenues and trends.

Let’s play with the data! One of my revenue channels is the service offered to independent hotels. So I have to make sure that I generate some revenues for them. Below, you will find an interesting KPI that provides a kind of room performance score and the ratio between availability and booking requests. We want every room to be in the top right, like Splunk on the last Gartner Magic Quadrant. That would mean there is high demand and lots of bookings!!

Room Score

Basically, the x-axis represents the number of times a room appeared in the search results and the y-axis represents the number of times the rooms were booked.

I can see that there are a few rooms with high demand but low bookings. Typically those rooms are probably too expensive or maybe too far down in the search results. On the other hand, I can see rooms that do not appear as much in the results, and I have to move them further up the ranking so they start appearing on the booking website.

I created a table report listing all the rooms, a few details on each room and an action column. You can take this directly from Splunk and then add two actions: apply a discount and / or “rank up” a room so that the room goes up in the search results. You know what, applying the discount is a Web Service call!

Room Action

As Splunk is capturing all the ESB exchanges, I will have this information (on the previous actions) in Splunk! I can see analytics on this, like how the bookings for this specific room are performing since the discounts were implemented. Wow – that's awesome, isn't it?

I built a few KPIs that allowed me to deep dive into each discounted / ranked up room to measure the efficiency of the action. Let’s click on room 22 to see the impact.

Actions

New Revenue

Since the last discount, I can see I have two new bookings, resulting in additional revenue of 1560 euros. This discount costs me the difference between the real price and the discounted price across the number of nights booked, 520 euros.

Measuring the efficiency in real-time helps you iterate and adapt your strategy immediately. It could be highly effective to do this on rooms that are still available for tomorrow, to avoid missed revenue and unoccupied rooms…

As a quick summary:

We collected information from the ESB, analyzed the data, and took a series of actions.
We then instantly measured the impact of those actions and, as a result, could adapt them to find the best fit.

esb_quote

One of my customers does exactly this on top of his Camel ESB. He works for a global insurer whose customers include railway companies offering ticket insurance on their booking websites. When someone buys a train ticket and asks for insurance, a web service from the insurance company is called to provide a quote. If the insurance is bought, another web service is called. Each call is Splunked and then measured in real time, with insight into revenue, quote volumes, and more.

 

Hope this was clear and helpful!

Romain.

Dashboard Digest Series – Episode 4 – NFL Predictions


In Episode 4 we will take a look at the four downs of football. We used the Machine Learning Toolkit and more than a decade of NFL data to build models to make predictions during NFL games.

In order to make it quick and easy to plug in a scenario and visualize the most likely outcomes, we made a simple dashboard so editors at Sports Illustrated could try it out during a game. You may have seen the dashboard if you were watching CNN before the Super Bowl earlier this year:

Purpose: Predict the next play
Splunk Version: Splunk 6.4
Data Sources: Every NFL play and player since 1999
Apps: Machine Learning Toolkit, Shapester

The data contains a lot of fields (part of the list is shown below) that are a lot of fun to analyze and ask questions of. Since our goal was to predict plays, we only used a subset of the fields to build the models.

Screen Shot 2016-11-14 at 11.22.10 AM

When the dashboard loads, it shows some simple form inputs and a basic table. Once a user selects the "scenario," which includes the team, quarter, down, and yards to go for a first down, they can select which model they want to apply to make a prediction. The dashboard will then update and simply show whether the next play is predicted to be a pass or a rush and how that compares to the historical average given the same scenario.

Screen Shot 2016-11-14 at 10.59.46 AM
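
Behind the dashboard, the models themselves were built with the Machine Learning Toolkit. Purely to illustrate the shape of the problem, here is a hedged scikit-learn sketch that predicts pass vs. rush from a few of the scenario fields (team is omitted to keep it short); the toy training rows are invented and are not the NFL dataset used in the post.

# A minimal sketch, not the model behind the dashboard.
from sklearn.ensemble import RandomForestClassifier

# feature columns: quarter, down, yards to go for a first down
X = [
    [1, 1, 10],
    [2, 3, 8],
    [4, 2, 2],
    [3, 1, 10],
]
y = ["rush", "pass", "rush", "pass"]  # next_play labels

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

scenario = [[4, 3, 7]]  # 4th quarter, 3rd down, 7 yards to go
print(model.predict(scenario)[0])        # predicted next_play
print(model.predict_proba(scenario)[0])  # class probabilities, to compare with the historical average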

It’s nice to know if it’s going to be a pass or a rush, but it would be much nicer to know where the pass or rush is likely to be. By clicking on “pass” or “rush,” a new panel will be displayed below that shows this information.

Screen Shot 2016-11-14 at 11.02.40 AM

What’s happening behind the scenes? By clicking on the next_play result, the information is passed into another search which applies another model to predict the direction and then unhides the panel. There are some great examples of how to pass tokens, show/hide panels, and more in the Dashboard Examples app.

If I were a coach I’d want to visually see where the next play is likely to take place on the field. Especially if I’m trying to predict what the opposing team is going to do next. To accomplish this we used the Shapester app to create polygons for various sections of the field. Now a coach can select the yard line the ball is on and see visually where on the field the ball is likely to go.

Screen Shot 2016-11-14 at 11.01.08 AM

We could keep going from here. For example, we could click on a polygon and show another panel with more information, such as the probability of a successful pass to that location. What I like about this dashboard is that it starts out simple for the observer: there is just one table to view when they select a scenario. It isn't until they ask another question that we load an additional panel with visualizations. Keeping dashboards simple on load makes them easier to digest. Speaking of digest, I hope you enjoyed this dashboard digest.

Happy Splunking!


Head in the Cloud? Maximize your Operational Intelligence with Even Deeper Integration Between Splunk and AWS


Even more exciting news from re:invent!

In case you weren’t watching the live-stream of the event, you may have missed the keynote announcement this morning about the new service called AWS Personal Health.

Splunk's integration with AWS Personal Health allows AWS customers to proactively monitor over 70 services and quickly act on personalized service notifications, informing their users of things like reserved instance retirement, network issues, and even instance failures. Previously, if there was a network issue, your only way of knowing was based on regional or availability-zone-level messaging. This integration brings an even more personalized experience to using Splunk for monitoring and managing your mission-critical workloads in AWS.

The AWS Health API delivers critical data on AWS service quality and issues, and the Splunk App for AWS instantly transforms this data into actionable insights.  Using the Splunk App for AWS, customers can quickly drill down to identify impacted applications, roll out remediation and proactively message impacted users.

SL Personal Health

As the exclusive launch partner for AWS Health, Splunk is excited to integrate the AWS Health API into the Splunk App for AWS to deliver end-to-end visibility across more than 14 AWS services. Splunk visibility for AWS is available starting at just $3 per day, less than a cup of coffee, and enables customers to gain real-time security, operational, and cost management insights across their entire AWS environment. The Splunk App for AWS allows you to aggregate all your AWS data into a single pane of glass. Users can display multiple accounts, search by region, and even sort data across accounts by tags.

With the release of the Splunk App for AWS v5, we've deepened the product integrations between Splunk and your AWS services, giving you additional visibility into the business-critical workloads you're running on AWS. Additionally, AWS customers can integrate data from AWS Health alongside other data sources like CloudWatch and CloudTrail in Splunk's out-of-the-box interactive topology visualization for faster troubleshooting and easier account management.

Topology - telemetry

Give it a try for yourself! Your first 6 months are free on AWS Marketplace. Available at an entry price point of $3 per day.

Also, go ahead and check out the new integrations in the Splunk App for AWS!

Happy Splunking!

Randy Young and Keegan Dubbs

Smart AnSwerS #82


Hey there community and welcome to the 82nd installment of Smart AnSwerS.

Have you ever wondered what makes the Splunk community so special, and why many people from various backgrounds are so engaged in all things Splunk? Well, look no further! alacercogitatus, aka Kyle Smith of the SplunkTrust, posted this awesome heartfelt blog post from his experiences engaging with users in the community on and offline, emphasizing how the culture plays an essential role in the success of users stepping into the world of Splunk. You’re not simply learning how to use the products – you’re entering a community of users that are incredibly supportive, passionate, and willing to share their knowledge to help you meet the goals you hope to achieve adopting Splunk. Enjoy the read, and we hope to see more of you around. We’re an open and friendly bunch :)

Check out this week’s featured Splunk Answers posts:

Are there any Splunk training materials for new users?

SplunkTrust member skoelpin needed to create training sessions for new Splunk users in his organization, and rather than reinventing the wheel entirely, he reached out to the rest of the Splunk community on Answers for any existing resources available. The community answered strong with a collection of videos, blogs, and a variety of other helpful avenues for learning. Big thanks for contributions to the question from woodcock, somesoni2, cbreshears, and Melstrathdee. If you have other ideas you don’t see listed yet, feel free to add to the thread!
https://answers.splunk.com/answers/462710/are-there-any-splunk-training-materials-for-new-us.html

How to use LINE_BREAKER from one source with multiple sourcetypes?

sassens1 had logs coming from a single source for FireEye and Palo Alto sourcetypes, and wanted to use LINE_BREAKER and SHOULD_LINEMERGE in props.conf to properly parse both data formats. lquinn commented that LINE_BREAKER would apply before sourcetype transforms, and SplunkTrustee acharlieh seconded this note in his answer with a link to understanding how the indexing pipeline works. He suggested two options to get the expected outcome: having both logs sent to different TCP ports to assign different sourcetypes at input time, or following best practices with syslog to have a universal forwarder monitor and assign sourcetypes based on host.
https://answers.splunk.com/answers/453239/how-to-use-line-breaker-from-one-source-with-multi-1.html

How can I see the search peer that a forwarder is connected to when using indexer discovery?

Lucas K wanted to know how to find out which indexer in a cluster a forwarder is currently sending data to when using indexer discovery. After Lucas K provided additional _internal logs for debugging and garethatiag left a helpful comment pointing out a connection issue with the forwarder, mmodestino shared follow-up troubleshooting steps that set the thread in the right direction. Lucas K realized the forwarder was not communicating with the cluster master because of an incorrectly set password. After getting that fixed, he was able to list forwarders and view "Connected to idx=..." messages in splunkd.log.
https://answers.splunk.com/answers/462088/how-can-i-see-the-search-peer-that-a-forwarder-is.html

Thanks for reading!

Missed out on the first eighty-one Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Easily Create Mod Inputs Using Splunk Add-on Builder 2.0 – Part IV


Add-on Builder 2.0 provides capabilities to build modular inputs without writing any code. In this post however, we focus on using an advanced feature of Splunk’s Add-on Builder 2.0 to write custom python while taking advantage of its powerful helper functions.

NB: Future versions of Add-on Builder will obviate the need for some of the techniques mentioned below, most notably techniques in step #6 & step #8.


There is a veritable cornucopia of useful resources for building modular inputs at docs.splunk.com, dev.splunk.com, blogs.splunk.com, and more. This post certainly isn’t meant to replace those. No no, this post will simply walk you through leveraging Splunk Add-on Builder 2.0 to create custom code to query an API.

In this post we will create a modular input using some custom code to have more control while also leveraging Splunk Add-on Builder’s powerful helper functions. We’ll additionally explore some caveats between test mode and final-cut behavior.

If you’re looking for Part I of this blog, it doesn’t exist. Neither does Part II or Part III.
Spoiler Alert, neither did Leonard Part I, Leonard Part II, Leonard Part III, Leonard Part IV, or Leonard Part V. Some consider it regrettable that Leonard Part VI exists but I leave that to you to decide.

For a backstory, Part I would have used the Add-on Builder feature to Add a data input using a REST API and Part II would have used the feature to add a data input using shell commands. Part III would have therefore described adding a data input by writing your own code. They’re well described in those linked docs though, so we start where those stories lead us by expanding on Part III in this Part IV installment. Kind of A New Hope for our custom code.

You may have seen the post announcing Splunk’s Add-on Builder 2.0. If not, that would be a good pre-read too.


 

In This Post

Step 1 – Install Add-on Builder v. 2.0
Step 2 – Read through your API documentation
Step 3 – Create Your Add-On
Step 4 – Create Input
Step 5 – Initialize Parameters
Step 6 – Custom Code Primer: Single Instance Mode
Step 7 – Custom Code Auto Generated
Step 8 – Customizing The Auto Generated Code
Step 9 – Entering test values
Step 10 – Run Test
Step 11 – Save Work
Step 12 – Finish
Step 13 – Restart
Step 14 – Cooking With Gas


 

Step 1 – Install Add-on Builder v. 2.0

Download & install Add-on Builder version 2.0, which we'll henceforth refer to as AoB.

Return to Table of Contents


 

Step 2 – Read through your API documentation

You know what? This should actually be first, so you can decide how to implement this in AoB. The quicker "add a data input using a REST API" option may be in play. Yet here we are. For this example, we'll use HackerNews, because I haven't previously implemented it and it doesn't require OAuth (I hope to release a Part V to review OAuth before 2017). Here is some documentation about the HackerNews API: https://github.com/HackerNews/API

Reading through it the first time, I note we don't need an API token or access key. I also notice we can query the current max item number each time, which comes back as plain text, while the items themselves are returned in JSON format. We'll want to use checkpointing to record how far we read in previous queries so we know whether we need to read more, etc. Fortunately, the AoB custom python option provides helper functions for these things, as we'll see later in this example.
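
Before wiring anything into AoB, it can help to poke at those two endpoints by hand. Here is a quick standalone sketch using the requests library; the base URI below comes from the HackerNews docs and is hard-coded only for this sanity check, since in the add-on it will be supplied by the setup page parameters.

# Standalone sanity check of the HackerNews API, outside of AoB.
import requests

BASE = "https://hacker-news.firebaseio.com/v0"  # assumption: per the HackerNews API docs

# The current max item id is returned as plain text
max_id = int(requests.get(BASE + "/maxitem.json?print=pretty").text)
print("max item id:", max_id)

# Individual items are returned as JSON
item = requests.get("{}/item/{}.json?print=pretty".format(BASE, max_id)).json()
if item:  # deleted or not-yet-available items come back as null
    print(item.get("type"), item.get("by"), item.get("time"))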

Return to Table of Contents


 

Step 3 – Create Your Add-On

Click “Create an add-on”

Create Add-On

Fill in Add-on details and click Create

 

Update Add-on Details

Return to Table of Contents


Step 4 – Create Input

Now we’ll define the input. Follow the workflow to create a custom Python input (Configure Data Collection -> Add Data -> Modular Input Code)

Step 4.1

Configure Data Input

Step 4.2

Add Data

Step 4.3

My Python Code

Return to Table of Contents


 

Step 5 – Initialize Parameters

Now we need to specify the data input properties. NB: the default collection interval is 30 seconds; I adjusted it to 300.

Data Input Properties

Next, data input variables are defined.
These are the per-instance variables configured in settings -> data inputs. There will be one of these for each user-configured input.
Generally this would be where user-specific information lives (e.g. API tokens, etc).
As this is a simple example, we will simply use “number of records to query each time” as a way to demo.
Data Input Variables

Finally, we’ll define the Add-on Setup Parameters. These are the global input parameters defined in the Add-on’s setup page. These will be the universal settings available to all inputs configured using this modular input.
In this example, we’ll specify an API base URI and the API version.

Add-on Setup Parameters

Return to Table of Contents


 

Step 6 – Custom Code Primer: Single Instance Mode

Modular inputs are quite flexible. One example of that flexibility is that they can execute in single-instance or multiple-instance mode.

The AoB 2.0 'my custom python' feature always uses single instance mode. This is statically defined in supporting code that is automatically generated for you. Modifying that code is not recommended, as it is re-generated each time you save your custom code in step 8.

It is mentioned here so that you understand that single instance mode only runs your custom code once for ALL its defined inputs. This means if you have three inputs, say foo, bar, and baz (each having their own stanzas in inputs.conf), your custom code will need to embed its logic within a loop that iterates over each stanza.

Don’t worry, we’ll solve for that in step 8 in an explicit example.

To further understand this topic, you may read Splunk’s mod input documentation that reviews single & multiple instance modes of a script.

NB: There are plans to make this easier in a future AoB version; this post is specifically written with AoB 2.0 in mind.

Return to Table of Contents


 

Step 7 – Custom Code Auto Generated

This is the code that is generated automatically. Notice all the guidance provided that is commented out, just ready for you to un-comment and use.

Review it here or in your browser, and skip to step 8.

 

# encoding = utf-8

import os
import sys
import time
import datetime
'''
    IMPORTANT
    Edit only the validate_input and collect_events functions.
    Do not edit any other part in this file.
    This file is generated only once when creating
    the modular input.
'''
def validate_input(helper, definition):
    """Implement your own validation logic to validate the input stanza configurations"""
    # This example accesses the modular input variable
    # query_max = definition.parameters.get('query_max', None)
    pass
def collect_events(helper, inputs, ew):
    """Implement your data collection logic here"""
    # The following example accesses the configurations and arguments
    # Get the arguments of this input
    # opt_query_max = helper.get_arg('query_max')
    # Get options from setup page configuration
    # Get the loglevel from the setup page
    # loglevel = helper.get_log_level()
    # Proxy setting configuration
    # proxy_settings = helper.get_proxy()
    # User credentials
    # account = helper.get_user_credential("username")
    # Global variable configuration
    # global_api_uri_base = helper.get_global_setting("api_uri_base")
    # global_api_version = helper.get_global_setting("api_version")
    # Write to the log for this modular input
    # helper.log_error("log message")
    # helper.log_info("log message")
    # helper.log_debug("log message")
    # Set the log level for this modular input
    # helper.set_log_level('debug')
    # helper.set_log_level('info')
    # helper.set_log_level('warning')
    # helper.set_log_level('error')
    # helper function to send http request
    # response = helper.send_http_request(url, method, parameters=None, payload=None,
    #                          headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
    # get the response headers
    # r_headers = response.headers
    # get the response body as text
    # r_text = response.text
    # get response body as json. If the body text is not a json string, raise a ValueError
    # r_json = response.json()
    # get response cookies
    # r_cookies = response.cookies
    # get redirect history
    # historical_responses = response.history
    # get response status code
    # r_status = response.status_code
    # check the response status, if the status is not successful, raise requests.HTTPError
    # response.raise_for_status()
    #
    # checkpoint related helper functions
    # save checkpoint
    # helper.save_check_point(key, state)
    # delete checkpoint
    # helper.delete_check_point(key)
    # get checkpoint
    # state = helper.get_check_point(key)
    #
    '''
    # The following example writes a random number as an event
    import random
    data = str(random.randint(0,100))
    event = helper.new_event(source=helper.get_input_name(), index=helper.get_output_index(), sourcetype=helper.get_sourcetype(), data=data)
    try:
        ew.write_event(event)
    except Exception as e:
        raise e
    '''

Return to Table of Contents


 

Step 8 – Customizing The Auto Generated Code

Here we update the auto-generated code with our own logic.

This is just a quick example, so I skipped many important elements of a production solution, including (but not limited to) input validation, logging verbosity flexibility, error handling opportunities, etc.

# encoding = utf-8

import os
import sys
import time
import datetime

'''
    IMPORTANT
    Edit only the validate_input and collect_events functions.
    Do not edit any other part in this file.
    This file is generated only once when creating
    the modular input.
'''
def validate_input(helper, definition):
    """Implement your own validation logic to validate the input stanza configurations"""
    # This example accesses the modular input variable
    # query_max = definition.parameters.get('query_max', None)
    pass

def collect_events(helper, inputs, ew):
  # We import json library for use in massaging data before writing the event
  import json
  
  # Return all the stanzas (per step #6)
  stanzas = helper.input_stanzas
  
  # Iterate through each defined Stanza (per step #6)
  # NB: I only indent this with two spaces so I don't have to re-indent everything else
  for stanza in stanzas:
      
    # Another two-space indentation keeps all the "give-me" code from step #7 in-play 
    # without more indenting exercises
    helper.log_info('current stanza is: {}'.format(stanza))
    
    """Implement your data collection logic here"""
    # The following example accesses the args per defined input
    opt_query_max = helper.get_arg('query_max')
    # Test mode will yield single instance value, but once deployed, 
    # args are returned in dictionary so we take either one
    if type(opt_query_max) == dict:
        opt_query_max = int(opt_query_max[stanza])
    else:
        opt_query_max = int(opt_query_max)

    # Fetch global variable configuration (add-on setup page vars)
    # same as above regarding dictionary check
    global_api_uri_base = helper.get_global_setting("api_uri_base")
    if type(global_api_uri_base) == dict:
        global_api_uri_base = global_api_uri_base[stanza]
    global_api_version = helper.get_global_setting("api_version")
    if type(global_api_version) == dict:
        global_api_version = global_api_version[stanza]
        
    # now we construct the actual URI from those global vars
    api_uri = '/'.join([global_api_uri_base, 'v' + global_api_version])
    helper.log_info('api uri: {}'.format(api_uri))

    # set method & define url for initial API query
    method = 'GET'
    url = '/'.join([api_uri, 'maxitem.json?print=pretty'])
    # submit query
    response = helper.send_http_request(url, method, parameters=None, payload=None,
                              headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
    # store total number of entries available from API
    num_entries = int(response.text)
    helper.log_info('number of entries available: {}'.format(num_entries))

    # get checkpoint or make one up if it doesn't exist
    state = helper.get_check_point(stanza + '_max_id')  # checkpoint key is per input stanza
    if not state:
        # get some backlog if it doesn't exist by multiplying number of queries by 10
        # and subtracting from total number of entries available
        state = num_entries - (10 * opt_query_max)
        if state < 0:
            state = 0
    helper.log_info('fetched checkpoint value for {}_max_id: {}'.format(stanza, state))
    
    # Start a loop to grab up to number of queries per invocation without
    # exceeding number of entries available
    count = 0
    while (count < opt_query_max) and (state + count < num_entries):
        helper.log_info('while loop using count: {}, opt_query_max: {}, state: {}, and num_entries: {}'.format(count, opt_query_max, state, num_entries))
        count += 1
        # update url to examine actual record instead of getting number of entries
        url = '/'.join([api_uri, 'item', str(state + count) + '.json?print=pretty'])
        response = helper.send_http_request(url, method, parameters=None, payload=None,
                              headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
        # store result as python dictionary
        r_json = response.json()  
        # massage epoch to a human readable datetime and stash it in key named the same
        if r_json and r_json.get('time'):  # guard against deleted/missing items (API returns null)
            r_json['datetime'] = datetime.datetime.fromtimestamp(r_json['time']).strftime('%Y-%m-%d %H:%M:%S')   
        helper.log_info('item {} is: {}'.format(state + count, r_json))   
        # format python dict to json proper
        data = json.dumps(r_json)
        # similar to getting args for input instance, find sourcetype & index
        # regardless of if we're in test mode (single value) or running as input (dict of values)
        st = helper.get_sourcetype()
        if type(st) == dict:
            st = st[stanza]
        idx = helper.get_output_index()
        if type(idx) == dict:
            idx = idx[stanza]      
        # write event to index if all goes well
        # NB: source is modified to reflect input instance in addition to input type
        event = helper.new_event(source=helper.get_input_name() + ':' + stanza, index=idx, sourcetype=st, data=data)
        try:
            ew.write_event(event)
            # assuming everything went well, increment checkpoint value by 1
            state += 1
        except Exception as e:
            raise e 
    # write new checkpoint value
    helper.log_info('saving check point for stanza {} @ {}'.format(stanza + '_max_id', state))
    helper.save_check_point(stanza + '_max_id', state)
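
As noted above, input validation was skipped to keep the example short. A minimal sketch of what validate_input could look like for this input, assuming we only want query_max to be a positive integer, might be:

def validate_input(helper, definition):
    """Reject the input stanza if query_max is not a positive integer (illustrative only)"""
    query_max = definition.parameters.get('query_max', None)
    try:
        if int(query_max) <= 0:
            raise ValueError
    except (TypeError, ValueError):
        raise ValueError('query_max must be a positive integer, got: {}'.format(query_max))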

Return to Table of Contents


Step 9 – Entering test values

Now that we’ve copied pasta and modified for our own purposes, we can test!

Be sure to update the forms on the tabs Data input Definition & Add-on Setup Parameters to the left of the Code Editor to make sure test mode has parameters with which to work.

Code Editor

Return to Table of Contents


 

Step 10 – Run Test

I suppose this could have been included in step #9. Once you've entered the parameters on both tabs per step #9, run the test by clicking the Test button.

In AoB 2.0, events successfully written to Splunk will be displayed in the Output window on the right-hand side.

Logging

A log for your mod input instance will be in $SPLUNK_HOME/var/log/splunk/<ta_name>_<input_name>.log
In my example, it lives at
/opt/splunk/var/log/splunk/ta_hackernews_hackernews.log

Return to Table of Contents


 

Step 11 – Save Work

Click the Save button

 

Return to Table of Contents


 

Step 12 – Finish

Click the Finish button

Done

Return to Table of Contents


 

Step 13 – Restart

Restart Splunk if you haven't already been prompted to do so.

Return to Table of Contents


 

Step 14 – Cooking With Gas

Your custom code is set up. It is considered good practice to create a sandbox index in which to test your new add-on. You can keep tweaking the add-on via AoB until everything is the way you want it, cleaning out the sandbox index as needed, before validating & packaging (using AoB features, of course).

If you get stuck or have challenges, check the AoB docs and explore Splunk Answers for Add-on Builder. If you don't find what you need there, post a new question (be sure to tag it with "Splunk Add-on Builder").

Return to Table of Contents


 
