
Universal or Heavy, that is the question?


Introduction

As a Professional Services Consultant, a discussion that I often encounter when on site with customers is whether to use a Universal Forwarder or a Heavy Forwarder.

Splunk provides two different binaries: the full version of Splunk and the Universal Forwarder. A full Splunk instance can be configured as a Heavy Forwarder. The Universal Forwarder is a cut-down version of Splunk, with limited features and a much smaller footprint.

In this blog I am going to show why Splunk Professional Services recommends using Universal Forwarders in preference to Heavy Forwarders whenever possible, to ensure a faster, more efficient Splunk platform.

When should the Universal Forwarder be used and why?

The Universal Forwarder is ideal for collecting files from disk (e.g. a syslog server) or for use as an intermediate forwarder (aggregation layer) where network or security requirements demand one. Limit the use of intermediate forwarding unless absolutely necessary. Some data collection add-ons require a full instance of Splunk, and therefore a Heavy Forwarder (e.g. DB Connect, OPSEC LEA).

Previously, Heavy Forwarders were used rather than Universal Forwarders to filter data before indexing. Although this was thought to be the most efficient use of resources, it not only increased the complexity of the environment, it also increased the amount of network IO that the indexers had to handle. In some circumstances it also increased CPU and memory usage, negating the intended efficiency gain. The increase in network traffic is due to the Heavy Forwarder sending parsed (cooked) data over the network with all the index-time fields, the raw event and additional metadata, rather than just the raw event.

Do all parsing and filtering on the indexers when possible to keep network IO down. This also keeps the configuration simpler to manage, because the endpoints run Universal Forwarders, making the Splunk administrator's job easier.

The following table shows the results of sample tests sending a dataset from a Heavy Forwarder to an indexer, with and without indexer acknowledgement enabled, and then the same tests repeated using a Universal Forwarder as the data source. The test file contained 367,463,625 events.

Forwarder | Indexer Acknowledgement | Network GB Transferred | Network Avg (KBps) | Indexing Avg (KBps) | Duration (Secs)
Heavy     | Yes | 39.1 | 1941 | 5092  | 21151
Heavy     | No  | 38.4 | 1922 | 5139  | 20998
Universal | Yes | 6.5  | 863  | 14344 | 7923
Universal | No  | 6.4  | 1015 | 17466 | 6662

The key takeaways are:

  • The amount of data sent over the network was approximately 6 times lower with the Universal Forwarder.
  • The amount of data indexed per second was approximately 3 times higher when collected by a Universal Forwarder.
  • The total data set was indexed approximately 3 times faster when collected by the Universal Forwarder.

The use of intermediate forwarding/aggregation layer

The use of aggregation layers sitting between collection and indexing tiers should be the exception rather than the rule, as this can have unintended consequences when it comes to your data.

The use of an intermediate forwarding tier creates an artificial bottleneck, increasing the time from event generation to availability for searching, and can also cause data imbalance on the indexing tier that reduces search performance.

An intermediate tier funnels data to a smaller subset of indexers at any one time, creating hot spots of data for a given time period. When it comes to searching, this could mean that only one or two of your indexers contain the results for your search, so the search only leverages the power of a few rather than the power of many, as shown in the diagrams below.

Artificial bottlenecks caused by intermediate forwarders

Ideal event distribution across indexers

The distribution of data across your indexing tier will be less even when an intermediate tier of forwarders is used, ultimately having a detrimental impact on search performance and user experience.

Common questions

Can I send from a Heavy Forwarder -> Universal Forwarder -> Indexer?

Yes. For example, you might be collecting data from a database at a remote site and have a requirement that all data passes through an aggregation layer before it leaves the site, or upon arrival at the central site.

We may need to filter data, so we should use a Heavy Forwarder, right?

A Universal Forwarder can filter Windows events at source by Event ID.

A Universal Forwarder cannot filter based on regular expressions. Do this on the indexers, unless the majority of the data is being dropped at source. This is the most performant approach and the easiest to manage at large scale.
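As a rough illustration of both points (the stanza names, Event IDs, sourcetype and regex below are hypothetical examples, not recommendations), Event ID filtering lives in inputs.conf on the Universal Forwarder, while regex-based filtering lives in props.conf and transforms.conf on the indexers:

# inputs.conf on the Universal Forwarder – discard selected Event IDs at source
[WinEventLog://Security]
disabled = 0
blacklist = 4662,5156

# props.conf on the indexers – route a sourcetype through a filtering transform
[my_sourcetype]
TRANSFORMS-null = drop_debug_events

# transforms.conf on the indexers – send matching events to the null queue
[drop_debug_events]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue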

We need to route the data to multiple locations

Simple routing and cloning of data can be performed with the Universal Forwarder; only when you need to route different events to different destinations does a Heavy Forwarder become necessary. As with filtering, do this at the indexers if at all possible.
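As a minimal sketch of simple cloning (host names below are placeholders), listing more than one target group in the Universal Forwarder's outputs.conf sends a copy of the data to every group:

# outputs.conf on the Universal Forwarder
[tcpout]
defaultGroup = primary_indexers, secondary_site

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997

[tcpout:secondary_site]
server = collector.example.org:9997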

Conclusions

The Universal Forwarder is great!  It should always be chosen over and above the Heavy Forwarder unless you require functionality the Universal Forwarder cannot deliver.

Perform your data filtering on the indexers: data will be indexed more quickly, and network admins will be happier that Splunk isn’t using massive amounts of bandwidth.

By only parsing your data on the indexers, your configuration will be simpler.

People who know me know that I love the acronym K.I.S.S. (https://en.wikipedia.org/wiki/KISS_principle). I always have it rattling around in my head when I work on anything, and the Universal Forwarder is an easy way to achieve it.

Recommendations

Only use the Heavy Forwarder when:

  • Dropping a significant proportion of the data at source.
  • Complex UI or add-on requirements, e.g. DB Connect, Check Point, Cisco IPS.
  • Complex (per-event) routing of the data to separate indexers or indexer clusters.

Thanks for reading,

Darren Dance

Senior Professional Services Consultant

 


Smart AnSwerS #83


Hey there community and welcome to the 83rd installment of Smart AnSwerS.

After a dry spell, Splunk HQ is finally experiencing a good amount of rain in the San Francisco Bay Area. As per usual, people have forgotten how to navigate around the city, both on the roads and sidewalks. On the plus side, we can finally see rain water get collected above the courtyard and flow into a huge basin that distributes the water to surrounding plants. Splunkers have been taking breaks to check out the recycled water system in action as a serene escape, making rainy days at the office something to look forward to.

Check out this week’s featured Splunk Answers posts:

Why is the host name I set in a monitor stanza on a universal forwarder not showing as expected for indexed events?

ejwade had an rsyslog server collecting syslog from various devices, and even though he was assigning host names in inputs.conf for each monitor stanza, not all expected host values were found in indexed events. He found a solution by using specific sourcetypes for firewall logs, but didn’t understand why this worked. lguinn helped fill in the gaps by explaining the default behavior for syslog sourcetypes, and how to override the host value if needed.
https://answers.splunk.com/answers/451421/why-is-the-host-name-i-set-in-a-monitor-stanza-on.html

How to enable and disable scheduled searches using Splunk REST API in PowerShell?

vivekriyer needed to disable and enable scheduled searches, but was limited to using PowerShell for the task. SplunkTrust member acharlieh may not be a PowerShell user, but he’s great at doing research and found a page from another resource that had an example POST request using Splunk’s REST API with PowerShell (how convenient!). With some changes to the arguments, this was just what vivekriyer needed to get the job done.
https://answers.splunk.com/answers/453294/how-to-enable-and-disable-scheduled-searches-using.html
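For reference, the underlying REST call simply toggles the saved search's disabled attribute; a curl equivalent (the search name, app context and credentials here are placeholders) looks roughly like this:

# disable a scheduled search; POST disabled=0 to the same endpoint to re-enable it
curl -k -u admin:changeme \
  https://localhost:8089/servicesNS/nobody/search/saved/searches/My%20Scheduled%20Search \
  -d disabled=1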

SplunkJS/HTML Dashboards + map command + $foo$ substitution

SplunkTrust member alacercogitatus shared this question and answer for the latest 6.5.x release as an update to the same Q&A posted by fellow SplunkTrustee martin_mueller almost 3 years ago. He shows how you can set tokens to a string on dashboard initialization that can then be replaced when a search is executed in a panel.
https://answers.splunk.com/answers/464453/splunkjshtml-dashboards-map-command-foo-substituti.html

Thanks for reading!

Missed out on the first eighty-two Smart AnSwerS blog posts? Check ‘em out here!
http://blogs.splunk.com/author/ppablo

Splunk and AWS: Monitoring & Metrics in a Serverless World


Bill Bartlett (fellow Splunker) and I have recently had the distinct pleasure of moving some workloads from AWS EC2 over to a combo of AWS Lambda and AWS API Gateway. Between the dramatic cost savings and the wonderful experience of not managing a server, making this move was a no-brainer (facilitated as well by great frameworks like Zappa). Both services are pretty robust, and while perhaps not perfect, to us they are a beautiful thing.

While we were using Splunk to monitor several EC2 servers with various bits of custom code via the Splunk App and Add-On for AWS, we realized (ex post facto) that while Lambda was supported out of the box by the Add-On, API Gateway was not. What is an SE to do?! Much like AWS services, Splunk Add-Ons are also a thing of beauty that make the process of gathering and taming your data very simple. The Splunk Add-On for AWS is no exception.  In fact, it’s very feature-rich, seamlessly supporting data collection from new AWS services. The rest of the blog will detail this short journey, and hopefully help you integrate Splunk into your AWS “serverless” infrastructure.

First, we will need a quick primer on how to set this up on the AWS side so the Add-On has something to gather. In the AWS console, go to:

"Services" > "API Gateway" > select the API you want to turn on metrics for > go to the "stage" sub-menu of that API > select the stage you want to add metrics to > check the box next to "Enable Detailed CloudWatch Metrics".

AWS_API_GATEWAY_METRICS

Repeat this step for any API and associated “stages” you see fit.

Next we will want to review the various naming conventions that CloudWatch uses so we can properly configure things in the Splunk add-on. In your AWS console go to:

Services > CloudWatch > Metrics > you should be on the "All Metrics" tab > scroll to the "AWS Namespaces" section > select "ApiGateway" > select "By Api Name".

CloudWatchMetrics_Console

The “ApiName” field is going to be our “Dimension Names” value. Note that the syntax is similar to an array of JSON objects – with a regular expression sprinkled in (give me metrics for all my APIs by “name” of the API):

[{"ApiName": [".*"]}]

The “Metric Name” will be those metric names we saw in the AWS console, formatted as an array:

["5XXError","4XXError","Count","Latency","IntegrationLatency"]

For the Metric Namespace, we can just use the namespace as provided by Amazon, specifically:

"AWS/ApiGateway"

Note that this is not part of the out of the box list provided by the add-on, but because the Splunk add-on is awesome, we can add new namespaces with a little configuration, and it will know how to collect metrics on them.

The “Metric Statistics” you can just grab from the picklist (they are fairly universal across the different namespaces). I personally like having:

Average, Sum, Maximum and Minimum

In the Splunk Add-On UI, you should have something similar to the below.

AddOn_Input_Config

The rest of the fields are up to you, but generally speaking, using 300 for “Metric Granularity” and “Minimum Polling Interval” should be sufficient for your first go (and you can adjust as needed). Please reference the Splunk documentation for what all these things mean in detail.

Now that both AWS and the Add-On are configured, we can start looking at what data is flowing in. It will be helpful to understand how the AWS Add-On classifies the various data, which is informed by how we configured the input. Open a search page on your Splunk instance and type in the following:

earliest=-60m sourcetype="aws:cloudwatch"

You should see some data, and if you’ve been using the Add-On previously, then likely we have more than just ApiGateway data from CloudWatch. This is where our “source” key becomes very helpful, as the Add-On does the heavy lifting of assigning an intuitive name for source.

source_example

The convention is typically <aws region>:<metric namespace>, so when exploring the data, you can narrow your searches very quickly by simply using a search such as this:

"source=*:AWS/ApiGateway"

 This should return CloudWatch metric data across all regions for ApiGateway. You should see something similar to the screenshot below after a few minutes of the input running.

API_GatewayResults

Now that we have API Gateway data flowing, let’s create a few simple searches to explore the data. First, it will be helpful to look at the different metric values across our “metric dimensions”, which is the name of our API in the AWS console:

source=*ApiGateway*
| rex field=metric_dimensions "ApiName=\[(?<api_name>.*)\]"
| stats sum(Sum) as sum by metric_name api_name
| xyseries metric_name, api_name, sum

In this case, Splunk is summing the “Sum” value of each metric name by the API name. The “rex” command is simply there for cosmetic reasons to make the API name easier to read. The resulting visualizations should look something like the following, with your API names substituted, and different distributions of values, etc.:

search_example_final

Another search that might be interesting is mashing up our Lambda and API Gateway data to see the different metrics next to each other.

sourcetype=aws:cloudwatch  (source="us-east-1:AWS/Lambda" OR source="us-east-1:AWS/ApiGateway")
| rex field=metric_dimensions "[ApiName|FunctionName]=\[(?<dim_name>.*)\]"
| stats sum(Sum) as sum by metric_dimensions dim_name metric_name

We can search by our intuitive “source” key again, this time by region.  The “rex” command has been modified to grab the API Name in the case of API Gateway metrics, or the “Function Name”, in the case of Lambda metrics. The reason this works is mainly because Zappa automagically sets the API name and the Lambda function name to the same thing, so YMMV with this search.

lambda_api_gateway_table

To take the previous example further, let’s look at grouping our metrics together as a “serverless app” where the Lambda function name and API name are the same. In this example, the “chart” command gives us a nice way to group things together.

sourcetype=aws:cloudwatch  (source="us-east-1:AWS/Lambda" OR source="us-east-1:AWS/ApiGateway")
| rex field=metric_dimensions "[ApiName|FunctionName]=\[(?<serverless_app>.*)\]"
| chart sum(Sum) as Sum  over serverless_app by metric_name

 

ServerlessStacked

If we want to focus on a specific metric, e.g. “Latency”, we can leverage that same grouping and look at “end-to-end” latency from the API Gateway to our Lambda function.  In this example, “Duration” is considered latency.

An important note to consider is that any external services called within the Lambda function contribute to the duration of the function. If your Dynamo table, for example, is having problems then it’s likely to cause a spike in “Latency.”

sourcetype=aws:cloudwatch  (source="us-east-1:AWS/Lambda" OR source="us-east-1:AWS/ApiGateway") metric_name=Latency
| rex field=metric_dimensions "[ApiName|FunctionName]=\[(?<serverless_app>.*)\]"
| timechart sum(Sum) as end_to_end_latency by serverless_app

end_to_end_latency

Troubleshooting

We can take a look at the Splunk Add-On’s internal logs to ensure we are collecting data. As a handy search:

"index=_internal sourcetype="aws:cloudwatch:log" namespace=AWS/apigateway"

You should see results similar to the below:

internal_metrics

 

We’re excited about the wide range of possibilities that ‘serverless’ architectures on AWS present.  In closing, we hope to have shown you equally compelling opportunities to utilize Splunk to monitor and visualize your serverless environments on AWS.

-Kyle & Bill

Dashboard Digest Series – Episode 5: Maps!


splunk_maps

“A map does not just chart, it unlocks and formulates meaning; it forms bridges between here and there, between disparate ideas that we did not know were previously connected.” ― Reif Larsen, The Selected Works of T.S. Spivet

Welcome to Episode 5 of the Dashboard Digest series!

Maps play a critical role in visualizing machine data in almost any industry, across thousands of use cases.  We’ve been continuously adding more mapping functionality to Splunk, and with the recent addition of Custom Visualizations in Splunk 6.4, you (the community) have too!  This is exciting news, as I’ve often noticed that the first panel on a dashboard to draw attention is a map.  The best part is that each of these displays is either native functionality or plug-and-play for Splunk, making it easier than ever to visualize your geographic machine data in real time.

In this post I’ll briefly go over some of the options for visualizing geographic data today and how to use them.  Enjoy!

Purpose: Display the different options for mapping geographic data in Splunk.
Splunk Version: Splunk 6.0 (added native pie chart map), Splunk 6.3 (added choropleths), Splunk 6.4 (added custom visualizations)
Data Sources: N/A.
Apps: Shapester, Geo Heatmap, Custom Cluster Map, Clustered Single Value Map, Location Tracker

In this post I’ll cover the following:

  1. Native Pie Chart
  2. Custom Cluster
  3. Custom Clustered Single Value
  4. Native Choropleths
  5. Custom Choropleths
  6. Geo Heatmap
  7. Location Tracker

1. Native Pie Chart

Although it was the first map type used in Splunk’s native maps, the pie chart can quickly tell a powerful story.  Using the geostats command, you can calculate statistics (just like the stats command) and plot the results using latitude/longitude coordinates.  The larger the statistic, the larger the pie chart.  Even better, you can split by another field for additional context (see the example below).

Example Syntax #1:  … | geostats sum(price) by action

*Note: If your latitude and longitude fields are named something other than latitude and longitude, such as “lat” and “lon”, you will need to add the following arguments to your search.

Example Syntax #2:  … | geostats latfield=your_latitude_field longfield=your_longitude_field count by threat

piechart1

2. Custom Cluster

The Custom Cluster Map is another way of representing quantities or values of a specific field.  This particular custom visualization is a remake of the Google Maps add-on back in Splunk 5.0.  You can change colors, clustering density and other options.  It’s a simple and effective way to determine abnormal values geographically.

Example Syntax #1:  … | geostats count
Example Syntax #2:  … | geostats latfield=your_latitude_field longfield=your_longitude_field avg(speed)

custom_clusters

3. Custom Clustered Single Value

The Custom Clustered Single Value visualization is one of my new favorites and contains a set of extremely powerful configuration options including the ability to add description popups with HTML support, color and style markers, add icons, disable clustering and plot nothing but single values. All of this can update dynamically in real-time!  There are tons of configuration options that you can learn about from the app page.

Example Syntax #1:  … | table latitude, longitude, title, description | eval icon=if(match(title,"SHIP\d+"),"ship","circle") | eval markerColor=if(match(title,"SHIP\d+"),"green","blue")
Example Syntax #2:  index=chicago_crime | eval description = "<b>".your_description_field."</b>" | table latitude, longitude, description

clustered_single_value

4. Native Choropleths

Splunk 6.3 brought us a great addition in Choropleth Maps. These use shading to show relative metrics, such as population or election results, for predefined geographic regions.  Out of the box Splunk supports the 50 states in the USA and countries around the world. Different color modes (Sequential, Divergent and Categorical) can be selected as well as customizable bin ranges to enhance granularity. The geom command is responsible for building the geographic boundaries and applying metrics/shading to them.

Example Syntax #1:  … | stats sum(acceleration) as accel by state | geom geo_us_states featureIdField=state
Example Syntax #2:  | inputlookup geo_attr_countries | fields country, region_un | geom geo_countries featureIdField=country

shake_choropleth

5. Custom Choropleths

Custom Choropleth Maps allow the use of custom polygons created from .kmz files or drawn from the Shapester app!  This capability really expands the use cases for choropleths and I highly recommend you try it out whether you create your own or use a pre-made .kmz file.  Just think – creating real-time alerts using your own custom built geo-fences all without having to write any code.

Example Syntax #1:  … | stats latest(Av_Level) by Zone | geom geo_avalanche_zones featureIdField=Zone

6. Geo Heatmap

The Geo Heatmap represents quantities or values of fields in a, well… heatmap fashion!  The settings are configurable to change colors, transparency and map background.  Additionally, there is an option to play back data over time.

Example Syntax #1: index=noaa | stats max(wind_speed) by latitude longitude
Example Syntax #2 (For Time Playback): index=noaa | timechart span=1h latest(latitude) as latitude latest(longitude) as longitude max(wind_speed) as value by title

heatmap

7. Location Tracker

The Location Tracker is one of my favorites due to its ability to not only show where an object (or multiple objects!) currently is, but also trace out its path over time.  Trying to get the native Splunk Pie Chart Map to do the same thing was never a fun task, whereas now it’s easy as pie.  Pun intended!  You can use stats to aggregate statistics, but really you can just use four fields in a table.  See the example syntax below:

Example Syntax #1: ... | table _time latitude longitude
Example Syntax #2 (multiple objects): ... | table _time latitude longitude vehicle_type

It’s as simple as that!

location_tracker

 

Maps are an incredible way to display information.  I’m hoping this summary of mapping capabilities gives you some ideas on the art of the possible for mapping out geographic machine data in real time!

That’s it for now .. Happy New Year and Happy Splunking!

Stephen

Related reads:

Dashboard Digest – Episode 1
Dashboard Digest – Episode 2 – Waves
Dashboard Digest – Episode 2 Part Deux – Hurricane Matthew
Dashboard Digest – Episode 3 – Splunk HQ Water and Energy
Dashboard Digest – Episode 4 – NFL Predictions

Improving Visibility in Security Operations with Search-Driven Lookups


Looking back on 2016, Splunk Enterprise Security added significant capabilities to its platform for security operations, including Adaptive Response, User & Entity Behavior Analytics (UEBA) integration and Glass Tables.  Another capability that was added, but has received less attention, is a new type of search that Splunk calls Search-Driven Lookups.  Because there has not been as much attention paid to it, I wanted to share a bit about this capability and how it can be used.

Search-Driven Lookups originated from a question that users of legacy SIEM providers often asked: how can Splunk dynamically create watchlists that can then be used to correlate new events against that watchlist?  Enterprise Security has had the ability to correlate against a watchlist for a few years, and the method to create watchlists has existed since Splunk Enterprise 4.x, with additional features added along the way.  Basically, a user can pipe search results through the outputlookup command and place the returned values into a lookup.  This lookup can then be used in subsequent searches using the inputlookup command.
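As a minimal illustration (the index, field and lookup names below are hypothetical), a watchlist could be populated and then consumed like this:

Populate the watchlist from search results:

index=proxy action=blocked | stats count by src_ip | where count > 100 | outputlookup suspicious_sources.csv

Correlate new events against the watchlist:

index=firewall [| inputlookup suspicious_sources.csv | fields src_ip]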

Starting with Enterprise Security 4.2 in Splunk Cloud and continuing with ES 4.5, the search-driven lookup is available via Configure -> Content Management and provides 25+ searches that populate lookups and can be used with correlation searches, dashboard panels, and other knowledge objects.

Let’s look at a few ways that search-driven lookups can be used and how they could be applied to security operations.

In this example, a search-driven lookup is being used to track when a particular set of events were first observed and most recently observed.  The Malware Tracker search-driven lookup populates a list of malware detections first seen and last seen, grouped on host/IP address and signature.  This information is updated by default at 10 minutes after the hour, every hour to the malware_tracker lookup.

3

 

These observations can then be used to populate dashboards and panels within Enterprise Security.  In this case, the Oldest Infection panel within the Malware Operations dashboard reflects this data.  This panel provides an analyst with a list of systems that have been infected by malware, the first and last time that malware was identified on the system, and a calculation of how many days it has been active on the system.

4

 

Another way to use search-driven lookups is to calculate statistical values including standard deviations, minimum and maximum values across populations of events.  These statistical values have applicability across security operations for tracking values like network traffic byte counts or web browser user agent strings.

Analyzing user agent strings and their variances across the enterprise may identify outliers that should be investigated.  In this case, the search-driven lookup, User Agent Length Tracker, calculates statistical values of minimum length, maximum length, standard deviation of the population and the lengths of the user agent string ranges that are associated with their Z scores.

2

 

From here, this data can then be used in the HTTP User Agent Analysis dashboard to populate the User Agent Details panel based on the selection of the Standard Deviation Index dropdown.  The search column in the lookup is passed as a token to the search to bound the relevant user agents of interest.

1

 

A third way, and possibly the most often requested way to use a search-driven lookup, is to leverage specific values like IP addresses or hostnames to generate a watchlist of values that can then be used in correlation searches to apply additional scrutiny to these watchlisted systems.

The ES Notable Events search-driven lookup generates a lookup that contains values including the correlation rule that triggered the notable event, its associated urgency, source and destination addresses, the status of the notable event as it applies to workflow, the owner of the notable event and additional values.  By default, these events are gathered every 10 minutes and kept for 48 hours in this lookup before aging out.

5

 

With this list of known offenders, additional correlation searches could leverage these values to further scrutinize specific sources or destinations while utilizing additional values like urgency or the rule name to ensure these additional correlation rules are bound to the most critical events.

These are a few ways that Enterprise Security uses search-driven lookups. That said, there are a number of other things that can be done with this capability.  One example of that is the Address Tracker dashboard that I created using the Search And Destination Tracker search-driven lookup.  Search And Destination Tracker looks across multiple data models and provides source and destination for web, network traffic and intrusion detection events.

Address Tracker gives the user the ability to search a source address, destination address or both, and returns actions like allowed or blocked, the sourcetype of the events, and when the source and destination pairs were first seen and most recently seen in Splunk.  The text inputs can handle wildcards, and the date range drop-down will bound the search to returning values where the last seen date/time falls between the earliest and latest time.  With this, an analyst could easily check a connection and see what data sets it came from, if it was allowed or blocked, and when it was seen.

6

 

I hope this provides a greater understanding of what search-driven lookups are and how they can be used.  Collecting data sets and associating by date/time in the form of first seen and last seen, generating statistical values and ranges, as well as establishing watchlists are a few ways that Splunk has used this capability and with over 25 of these searches already built into Enterprise Security, they are ready for you to take advantage of.  Like any Splunk search, they can be modified and additional search-driven lookups can be created to fit your specific use case.

 

Thanks,

John Stoner
Federal Security Strategist
Splunk Inc.

Visual link analysis with Splunk and Gephi


As cyber-security risks and attacks have surged in recent years, identity fraud has become all too familiar for the common, unsuspecting user. You might wonder, “why don’t we have the capabilities to eliminate these incidents of fraud completely?” The reality is that fraud is difficult to characterize as it often requires much contextual information about what was occurring before, during, and after the event of concern in order to identify if any fraudulent behavior was even occurring at all. Cyber-security analysts therefore require a host of tools to monitor and investigate fraudulent behavior; tools capable of dealing with large amounts of disparate data sets. It would be great for these security analysts to have a platform to be able to automatically monitor logs of data in real-time, to raise red flags in accordance to certain risky behavior patterns, and then to be able to investigate trends in the data for fraudulent conduct. That’s where Splunk and Gephi come in.

Gephi is an open-source graph visualization software developed in Java. One technique to investigate fraud, which has gained popularity in recent years, is link analysis. Link analysis entails visualizing all of the data of concern and the relationships between elements to identify any significant or concerning patterns – hence Gephi. Here at Splunk, we integrated Gephi 0.9.1 with Splunk by modifying some of the Gephi source code and by creating an intermediary web server to handle all of the passing of data and communication with the Splunk instance via the Splunk API. Some key features that we implemented were:

  • Icon visualization of data types.
  • Expanding and collapsing of nodes into groups by data type.
  • Enhancing the timeline feature to include a Splunk style bar graph.
  • Drilling down into nodes (calling the Splunk API and populating data on the graph).

Gephi can populate a workspace or enrich the data already contained in a workspace by pulling in properly formatted data. We implemented this by setting up two servers: one acts as an intermediary and determines what kinds of data a node can pull in based on its nodetype, and the other contains all the scripts that interact with a Splunk instance to run Splunk searches, pull back the results, and format them in a way Gephi already understands.

To make all this happen, Gephi makes a GET request to the Gephi-Splunk server (GSS) containing the nodetype, which prompts the GSS to return a list of available actions for that nodetype (note: the list is statically defined in Gephi to simplify things for the demos). Each of these actions can be used (along with information about the node) to construct another GET request, which is sent again to the GSS and then forwarded to a script server to execute that action. The action is completed by running a script held on the script server; actions involving Splunk searches are completed using Splunk oneshot searches via the Splunk Python SDK (http://dev.splunk.com/view/python-sdk/SP-CAAAEE5). The script server takes in the results of the search, formats them, and forwards them to the GSS, which responds to the original request from Gephi with formatted output that Gephi can render. The architecture is shown visually below.

splunk-gephi-integration-architecture
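For context, a rough sketch of the kind of oneshot call the script server makes with the Splunk Python SDK might look like the following (the host, credentials and search string are placeholders, not the project's actual code):

import splunklib.client as client
import splunklib.results as results

# connect to the Splunk management port (placeholder host and credentials)
service = client.connect(host="splunk.example.com", port=8089,
                         username="admin", password="changeme")

# run a blocking oneshot search and iterate over the results
stream = service.jobs.oneshot('search index=fraud src_ip="10.0.0.1" | head 100')
for result in results.ResultsReader(stream):
    # each result maps field names to values; the script server reformats
    # these rows into the structure Gephi expects before returning them
    print(result)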

The reason for the separation of servers into a “permissions” server and a script server is to make it easier to expand this project to serve multiple use cases and leverage multiple Splunk instances, while keeping organization simple and limited to a single point. In other words, resources are separated, but management is centralized.

Install by following the instructions here: https://github.com/splunk/gephi-splunk-project/tree/master

viz-1

viz-2

The first screenshot shows a use-case in which an analyst might have six IP addresses to be investigated. The analyst can start out with only the six IP addresses shown on the graph, and then choose to select the “drilldown” menu option to make a call to Splunk for more information. Our Gephi instance will then populate the graph with all of the data received from Splunk, creating nodes with connections if the nodes do not already exist in the visualization, and only adding connections if the nodes do already exist in the visualization. The analyst can also choose to “playback” the data via the timeline to see how events were occurring through time.

Shown in the second screenshot is a use case in which an analyst might have a large dataset but no clues of where to start investigating. Importing the data into Gephi would allow for recognition of clusters of correlated events (shown as large red nodes in the screenshot). The timeline would also assist in seeing how these resources were being accessed through time.

In addition to anti-fraud use cases, the Gephi + Splunk integration can be applied to any datasets that have cause and effect relationships. The example we provide is of IP address, username, session ID, and user agent data. In order to use other datasets, you will have to change some of the code to display the correct icons and to drilldown into the nodes correctly (see “Altering Data Sources” section of the github docs).

Disclaimer: This integration is provided “as is” and should not be expected to be supported. The application has not been extensively tested with large data sets, so use with caution. Depending on the searches being run in Splunk, and the size of the underlying data set, searches may take a while to complete.  The purpose of this application was to provide a proof of concept of using the Splunk API with an open-source graph visualization tool. At the moment, there are no official plans to integrate a graph visualization into the Splunk native web framework. If you intend on adapting this integration for your own uses, please be aware that it will require knowledge and use of Java and Python.

More information about Gephi can be found at their website: https://gephi.org/ and on their github repository: https://github.com/gephi/gephi

If you have any comments, questions, or feedback about this project, please send all inquiries to Joe Goldberg at jgoldberg@splunk.com

Special thanks to the Intern Team (Phillip Tow, Nicolas Stone, and Yue Kang) for making all this possible!


Gleb Esman,
Sr. Product Manager, Anti-Fraud

Enhancing Enterprise Security for Ransomware Detection


Ransomware isn’t going away

Ransomware is a profitable business model for cyber criminals, with 2016 payments closing at the billion-dollar mark. According to a recent survey by IBM, nearly 70% of executives hit by ransomware have paid to get their data back. Those survey results do not include smaller organizations and consumers who are also paying to get their data back.

With the threat from ransomware growing, detection (aside from prevention) is key to removing compromised devices from the network. Unfortunately, signature-based detection alone will not catch everything; using it in combination with hunting techniques in Splunk can enhance your security posture.  In this blog, we will walk through adding the free ransomware intelligence feed from abuse.ch to Splunk Enterprise Security.

Requirements

  • Internet Access for Splunk Enterprise Security Search Head
  •  Splunk Enterprise Security
  • Knowledge of updating Splunk Configurations

Configuration

There are two paths forward, which will depend on the level of access you have to the Enterprise Security search head. The command line is the simplest option, since you can copy and paste the configuration from this page, while using the GUI requires you to manually input the data via Splunk Web.

The configuration file walkthrough requires you to create a new inputs.conf file, or add to an existing one, in the SA-ThreatIntelligence app’s local directory.

$ vi /opt/splunk/etc/apps/SA-ThreatIntelligence/local/inputs.conf

inputs.conf

[threatlist://ransomware_ip_blocklist] 
delim_regex = : 
description = abuse.ch Ransomware Blocklist 
disabled = false 
fields = ip:$1,description:Ransomware_ip_blocklist 
type = threatlist 
url = https://ransomwaretracker.abuse.ch/downloads/RW_IPBL.txt

Once completed, restart the splunkd service.

$ /opt/splunk/bin/splunk restart

GUI Walkthrough:

Locate the Enterprise Security Configuration Page.

From the Enterprise Security Configuration page, select Threat Intelligence Downloads.

 

ess_configuration

Click new, and fill in the text fields on the resulting page with the same information:
threatlist

threatlist_edit

Name: ransomware_ip_blocklist
Type: threatlist
Description: abuse.ch Ransomware Blocklist
URL: https://ransomwaretracker.abuse.ch/downloads/RW_IPBL.txt
Delimiting regular expression: :
Fields: ip:$1,description:Ransomware_ip_blocklist

Once configured, Enterprise Security will download the threat intelligence and begin alerting on any events found which match the threatlist. These can be reviewed and triaged as part of your workflow in the notable events page.
notable_event
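Once the download has run, a quick sanity check is to query the IP threat intelligence lookup directly. A sketch along these lines (assuming the ip_intel lookup used by current Enterprise Security versions; adjust for your version):

| inputlookup ip_intel | search description=Ransomware_ip_blocklist | stats count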

Carrot vs Stick: A Case for Incentive Driven User Access


Houston, We’ve Got A Problem

Out of the box, a Splunk user has the capabilities to do some powerful stuff – but as Uncle Ben tells us, “with great power comes great responsibility“.

Scenario: Bort is a well-intentioned user at Gift Store, Inc. (an e-commerce company known for its novelty store items from the 1990s). Soon after getting his Splunk access, Bort starts throwing down some awesome real time searches and learning some sick new insights from his data. Bort is finding his searches in Splunk so valuable that he decides to set an alert from one of the real time searches to email him when the eStore runs out of “Bart” novelty license plates. Then, to impress his teammates (who are obviously also all named Bort) he shares a dashboard he built that contains eight real time searches showing different sales metrics and system stabilities. Unfortunately for Bort, as soon as his co-workers open the link to view the dashboard, the entire Splunk platform comes to a crawl.

Obviously, Bort is freaking out, and even more, without the real time alert he set up, the eStore might run out of its inventory of “Bart” novelty license plates right when a customer wants one! Fortunately, Bort’s Splunk Admin, John F. (wait, that’s too obvious, let’s go with J. Frink), is an excellent admin and already remedied the situation, but at the cost of disabling Bort’s alert and dashboard.

Frink notices Bort’s passion for Splunk and teaches him all sorts of Splunk Fu Ninja Nunchuck Skillz. Thanks to exploring the data in Splunk, Bort understands the eStore’s business so well that he gets promoted and even speaks at Splunk’s .conf every year. A real dream ending. Just perfect.

But what if it went another way? What if Bort never got a chance to create the real time alert and dashboard? Would Bort have ever come across Frink? Would Frink have ever taught Bort the Splunk Fu he needed to be successful? Even worse, I can’t even imagine a .conf without Bort. That’s just the worst.

But maybe it’s not so bad…

Scooby Doo Ending: We all know Bort. Soon after getting his Splunk access, Bort starts throwing down some awesome historical searches and learning some sick new insights from his data. Bort is finding his searches in Splunk so valuable that he decides to save them. While saving his work, Bort notices that there’s an option to turn his search into an alert – but it’s disabled! Bort decided to reach out to his admin to learn more.

Fortunately, Bort’s Splunk Admin, Frink, is an excellent admin and explains to Bort that while he is happy to convert Bort’s search into an alert, Bort can also gain the capability to do this on his own in the future. Bort is intrigued and learns from Frink that with completion of every Splunk Education course, Frink can grant Bort more capabilities in Splunk.

Bort embraces Frink’s offer and takes it a few steps further, eventually landing his Power User certification and gaining all sorts of juicy Splunk capabilities. Bort’s proficiency with Splunk inspires his peers (still also all named Bort), who do the same. Eventually, the entire team is so successful with the insights they gain from Splunk that it becomes a competitive advantage for the company. As a result, the company becomes ridiculously profitable and buys the local Nuclear Power Plant for diversification. Bort’s contributions make him wildly successful and he retires early to raise horses (although he names each of them Buttercup…for some reason). Don’t worry, in this story Bort also speaks at .conf every year as well.

Oh, and since this was the Scooby Doo Ending, we should point out that Bort eventually found who had been stealing all of the “Bart” novelty license plates. It was Old Man Terwilliger who was dressed up as an intern. Very strange dude. Very strange.

Ok, Enough With The Story

Alright, alright. The point is that instead of reacting to bad behavior, Frink could create incentives that inspire users to be motivated to learn more on their own…and that I really like The Simpsons…but mostly that first thing about incentives.

Out-of-the-box, Splunk empowers users with a ton of access, which most deployments keep as-is. But, as someone who has been the admin of a maxed-out Splunk deployment, I can tell you that less is more. By flipping the switch and exploring options available in a Splunk role’s definition, I was able to create a viral solution where interested users earned their way to being power users and further distributed Splunk knowledge to their peers, who in turn became power users, and so on and so forth.

Details and “Gotchas”

Review the different aspects of a role. Do you want a new user to be limited on how long their search can run for? How far back in history it can search? What data sets they have access to? Ability to consume Splunk resources when they are not directly logged in?

While the documentation elaborates on the ways of controlling those types of things, consider how you can objectively measure a user’s proficiency before promoting their access. You probably don’t want to be in the political battle of subjectively determining when one user appears “qualified” or not. Instead, an objective measure can be used. There’s a ton of options here, but perhaps you require them to show their Splunk Education completion certificate, or a Power User certification, or even a level of Karma points on answers.splunk.com. Any of those individual or together would work – think of it like a game: the user has to earn enough “experience” to level up.

Consider making a Splunk role for data access that is separate from the capabilities. That way you can create various permutations of roles with blends of data access and differing capabilities. This was a brilliant contribution from jnguyen413.
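A minimal authorize.conf sketch of that idea (role, index and capability choices are illustrative only): one role grants only data access, another grants only capabilities, and combined roles are built with importRoles.

# authorize.conf – data access only
[role_web_data]
srchIndexesAllowed = web;sales
srchIndexesDefault = web

# capabilities only, no data access of its own
[role_power_capabilities]
schedule_search = enabled
accelerate_search = enabled

# a combined role for users who have earned the extra capabilities
[role_web_power]
importRoles = web_data;power_capabilities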

Of course, if you’ve not been doing this already, you’ll want to make sure you have your management’s support, so they can back you up when folks complain that they suddenly have less access than they did before you implemented this approach.

Return on Investment

The bottom line is that by embracing such an Incentive Driven Access approach, you can motivate your users to learn more, get more value from the Splunk investment, and run a more efficient Splunk infrastructure (fewer poorly performing searches).

Thanks!

Thanks again to jnguyen413 and other customers who’ve tried this and shared their experiences. Internally, thank you to fellow Splunkers Aly, Tammy, and Dave C. This idea could not have gotten fleshed out without everyone’s contribution.


Analyzing BotNets with Suricata & Machine Learning


Since the official rollout of the Machine Learning Toolkit (MLTK) at this year’s .conf, Splunkers have been pursuing some interesting use cases ranging from IT operations, planning and security to business analytics. Those use cases barely scratch the surface of what is possible with machine learning and Splunk. As an example, I will use the Machine Learning Toolkit and data collected from Suricata to analyze botnet populations. This population analysis will be used to create a model for predicting the Mirai botnet based on network features.

Suricata

Suricata is an open source threat detection engine, which can be run in passive mode for intrusion detection or inline for intrusion prevention. My lab environment is configured for intrusion detection, meaning Suricata will not make any attempt to prevent an intruder from accessing my system. This is a “good” thing because the behavioral signature of Mirai (and its variants), which uses the specific IoT device usernames found in the scanner.c module, appears in the sshd logs and telnet sessions of the server it attempts to infect.

 

scanner-c

Analysis

The analysis largely builds upon the previous blog post (Analyzing the Mirai Botnet with Splunk), which correlated the failed logins of specific usernames and ip addresses. This threatlist of suspected Mirai ip addresses can be analyzed for various features such as geography, IANA registration, frequency, etc…

mirai_botnet_access_attempts_map

Combining this threatlist with our passive intrusion detection netflow data creates an enriched dataset for building a model. Adding contextual and detailed information about each access attempt at the packet level provides insight into the activity attempted by that IP or block of IPs during a 24-hour window. As an example, we can determine which TCP flags were present in the packets, both client side and server side, in each flow transaction and begin grouping similar events together. We can also create a ratio of packets_in vs. packets_out and classify these flows into various producer/consumer ratio (PCR) categories.

50k_mirai_no_training_data_features

MLTK

The MLTK is handy because of the many assistants that ship with the tool; you don’t need to know the exact SPL syntax to begin making use of it. Using the clustering assistant, I can attempt to discern different botnet populations based on the features present in my dataset. In the example below, I have selected 50k random Suricata flow events where the dest_port is 22. I have picked features that have some relation to each other, but are enriched by the PCR metric. I have created a label of isMirai with possible values of 0 | 1, depending on the IP address associated with that flow event. I have opted for K-means clustering with a value of k=5.
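Under the hood, the clustering assistant generates SPL roughly along these lines (the sourcetype, field and model names here are assumptions based on the description above, not the exact search used):

sourcetype=suricata event_type=flow dest_port=22
| fit KMeans packet_total packet_ratio k=5 into suricata_ssh_clusters
| stats count by cluster isMirai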

 

kmeans_clusters_botnet_activity

Interestingly, a clear visual pattern emerges with cluster_4. It is clearly an outlier compared to the rest of the population, but is there anything special about it? From an isMirai 0 | 1 perspective, there is a mixture of both 1s and 0s. The packet_pcr_range is 3:1 Import, with varying ratios, which seems to be the only common feature of cluster_4.

kmeans_apply_model_filter_cluster4

Using a model for prediction

MLTK isn’t intended to create models just for the sake of creating models; it also allows you to operationalize those models to make predictions based on features found in the model. One such feature we get from K-means is cluster_distance, which describes how far an event is from its cluster’s centroid.

Using the prediction assistant, the K-means model can be loaded in search, and the features to use for prediction can be selected from the dropdown: cluster_distance, packet_pcr_range, packet_ratio, and packet_total. The prediction assistant also lets you choose the specific algorithm used for prediction; I have opted for Random Forest.
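The prediction assistant produces SPL roughly like the following (again, the model and field names are assumptions based on the features described); new events can later be scored with | apply mirai_rf_model:

sourcetype=suricata event_type=flow dest_port=22
| apply suricata_ssh_clusters
| fit RandomForestClassifier isMirai from cluster_distance packet_pcr_range packet_ratio packet_total into mirai_rf_model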

using_kmeans_apply_use_random_forest

Next Steps

The model appears to be very good at predicting 0 (not Mirai), while it is reasonably good at predicting 1 (89.4%). This is an improvement over Suricata, which did not detect Mirai with the Emerging Threats ruleset. This may imply that there is an indicator of compromise for the Mirai botnet at the packet level. Proving this requires further investigation and independent validation to understand why the model predicts Mirai so effectively, and to eliminate bias or mistakes. Collaboration with others who have gathered traffic from botnets is a great way to validate the model against a data set it has not seen before. If an indicator of compromise can be discerned from this analysis, it could be converted into an IDS signature for future detection of Mirai infection attempts.

using_kmeans_apply_use_random_forest_accuracy

Implementation of Incentive Driven User Access


Out of the box, a Splunk user has the capabilities to do some powerful stuff – but as Uncle Ben tells us, “with great power comes great responsibility“. In my prior post, we reviewed the scenario and purpose behind Incentive Driven User Access. In this post, we’ll dive into the conf files and explore what settings are worth reviewing to implement such a solution.

Authentorizationing….?!

Let’s conceptually differentiate the settings for authorization and those of authentication. The names are so darn similar that without understanding their differences, you’re bound to mix them up.

When you first navigate to your Splunk deployment, you need to prove that you’re a valid user. To do this, Splunk will need to verify that the username and password you provide are valid, or authentic. Splunk does this by checking your credentials against a user repository (local, ldap, etc…). This process, of checking if you’re authentic, is an authentication process. As such, the details that inform Splunk how and where to check the credentials are stored in a configuration file, authentication.conf.

After Splunk knows it’s ok to let you in, it needs to know what things you can do. Are you allowed to run searches? View apps? Explore data? For all of these activities, and more, Splunk must determine if you’re authorized or not. As such, the details that inform Splunk what you’re allowed (or not allowed) to do and see are stored in a configuration file, authorize.conf.

authentication.conf = user validation definition
authorize.conf = feature and data access definition

Since our official Splunk documentation describes step by step ways of using these files, consider the above description an easy access map in case you get lost through this reading.

All Your Authorize Belongs To Us

With that all squared away, let’s jump into some specific authorize.conf attributes that might be helpful here. Remember that the focus is on what items can be used as rewards for demonstrating Splunk proficiency. In other words, focus on how the absence or limitation of a given feature can be used to incentivize a user into learning more Splunk and demonstrating the proficiencies appropriate for using that feature responsibly.

Capabilities

When it comes to granting capabilities, I delineate what features to give to uneducated users as compared to educated power users with this motive:

Will this feature impact the Splunk deployment when the user is NOT logged in?

For example, if an uneducated user is able to create a malformed realtime search and schedule it to run over All Time – that’s very impactful to the environment, and I want to limit or mitigate that from occurring.

With that in mind, here are some capabilities worth tweaking, broken out by reason. This is not a review of every capability since out-of-the-box, Splunk already provides guidance on many (example: delete_by_keyword). Refer to the List of available capabilities for explanations of each capability. (This writing is current as of 6.5.2.)

Splunkers With Benefits

As you read these, keep in mind that when you disable these capabilities, users will approach their Splunk experts (hey, that’s you!) for help or even specific access to those features. This becomes a wonderful opportunity for the Splunk expert to collaborate with the user on best practices, thereby improving their Splunk experience and Splunk education. As their skill set builds, they’ll eventually demonstrate enough proficiency for you to promote them to a role with some of these capabilities. In turn, they’ll become the Splunk expert for their team, answering usage questions so you don’t have to. It’s like a pyramid scheme…or viral…or something – the point is, you get to refocus on your work because you’ve empowered your user base.

Accelerations: accelerate_datamodel, accelerate_search, output_file

Acceleration of Reports and Data Models is amazing. It makes an otherwise sluggish search return at superman speed…BUT at a cost of compute and storage. That’s right, when such an object is configured for acceleration, searches run in the background to generate those acceleration details, and those details are then stashed on disk for you. And of course, those searches count against your concurrent search capacity, both per user and for the overall system…until someone disables the acceleration, which in my experience is never the user cleaning up stuff they no longer use (as if).

Similarly, disk usage can quickly disappear if users are creating (and never destroying) uniquely named (like using the date stamp) lookup files with every run of a search.

Scheduled Searches: schedule_search, schedule_rtsearch

If you disable the scheduled searches, your users can’t implement alerts. But that also means they can’t cause Splunk to get sluggish running their mal-formed search when they’re not around to appreciate how bad it is. Even more, if you investigate some of the alerts you have in your environment, you may find that many are e-mail alerts that go to folks that aren’t around anymore, or others who’ve set up mail rules to delete the alerts.

Real Time Searches: rtsearch

In reality, a realtime search isn’t necessary. I’ve yet to come across a search that couldn’t instead just be rerun as frequently as needed. This is important to understand given the impact realtime searches have on the environment – a ton! A realtime search is a persistent search process on the Search Head as well as any Indexers. In contrast, a non-realtime search only persists until the search completes and therefore returns resources to the system for other users. By disabling realtime search, you mitigate the [eventual] situation where a group of users all load the same dashboard where every panel is a realtime search, thereby consuming all resources in the Splunk environment and preventing other usage. Plus, as of 6.5.0, any panel in Splunk can be set with a periodic refresh (in the Edit Search pane) to automatically rerun the search for the latest data.

Search Limits: srchJobsQuota, srchMaxTime, srchTimeWin, srchDiskQuota, rtSrchJobsQuota

These may not be necessary if you’ve embraced changes with the other capabilities highlighted here. On the other hand, they can be super helpful to limit the impact of poor Splunk behavior by putting a cap on the quantity of concurrent searches, how long those searches might go on for, how far of a time window of data can be searched, and/or how much storage is used to retain the results of the search.
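Pulling these settings together, a hedged authorize.conf sketch might pair a restricted starter role with an earned power role (all role names, indexes and numbers below are illustrative, not recommendations):

# authorize.conf – starter role: basic search only, tight limits
[role_splunk_novice]
search = enabled
srchIndexesAllowed = main
srchIndexesDefault = main
srchJobsQuota = 2
srchMaxTime = 1h
srchTimeWin = 604800
srchDiskQuota = 100
rtSrchJobsQuota = 0

# earned role: adds scheduling and accelerations once proficiency is demonstrated
[role_splunk_power_earned]
importRoles = splunk_novice
schedule_search = enabled
accelerate_search = enabled
accelerate_datamodel = enabled
srchJobsQuota = 6
srchDiskQuota = 500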

 

Be Practical

In the end, see what works best for your organization and your company culture. If you modify these capabilities, remember that initially you’ll do more “teaching” to your users, but only until they learn enough from you and are promoted to a role with more capabilities and features – then, the student becomes the teacher as they empower their peers directly.

How to Stop Playing the Blame Game in Your IT Department


It’s a familiar scenario: a problem is discovered, and a Service Desk Team gets a help ticket. The Service Desk Team tells Operations that there’s an outage. The Operations Team suggests that the problem could be the result of bad code and passes the issue to Dev. The Dev Team responds that it doesn’t have the tools to solve the problem and asks for logs from production systems.

Suddenly the situation is escalated.

A war room’s assembled. Here you’ll often find a DBA, Docker specialist, network specialist, release manager, site reliability engineer and a developer, sometimes calling in remotely from separate locations. The pressure’s on for everyone to prove their innocence and confirm individual components of the infrastructure are ok. If everyone survives this step, people start blaming each other while the clock continues to tick. And you haven’t even begun solving the problem yet.

This is the Blame Game in a nutshell. And unlike most games, it’s one that you don’t want to play. The process of problem solving and getting teams together is time-consuming, stressful, and inefficient. And in the meantime, your reputation and revenue are at stake—you begin losing users quickly if they can’t access the service and there’s no indication as to when the issue will be resolved.

The Blame Game is similar to the game Sorry!, except there’s no clear winner and no one apologizes.

Good news: It’s possible to stop playing the Blame Game and start recovering your sanity. The key is for everyone to quickly align around your organization’s data. Additionally, you must:
  • Be able to trust the data that you and other teams are using
  • Have easy, real-time access to data across silos
  • Empower your teams by giving them access to the same data

Chances are you’re using a bunch of tools to do this already. That’s a good first step. But the Blame Game only ends when you get full visibility from all of the tools and data sources within your organization. This is achieved with a data platform.

Gaining visibility into all of your tools and data from a single platform is key because it helps you quantify issues. It also doesn’t leave any data “unattended”—in a world where all data is relevant, your bases are covered. Perhaps most importantly, it enables everyone to be on the same page instantly—saving you time and stress over whose data is right. When everyone can see the same error from a particular process, you can all agree that’s where you need to focus your investigation. You’ve successfully replaced stress with data and can work in parallel to find and solve the issue—making your users (and bottom line) happy.

Nothing ends the Blame Game faster than trusted data.

Learn more about using a platform approach to monitor and troubleshoot applications and the infrastructure that supports them.

Bill Emmett
Director, Solutions Marketing, Application and Mobile Intelligence
Splunk Inc.

How to stream AWS CloudWatch Logs to Splunk (Hint: it’s easier than you think)


At AWS re:Invent 2016, Splunk released several AWS Lambda blueprints to help you stream logs, events and alerts from more than 15 AWS services into Splunk, giving you enhanced security and operational insights into your AWS infrastructure & applications. In this blog post, we’ll walk you through, step by step, how to use one of these AWS Lambda blueprints, the Lambda blueprint for CloudWatch Logs, to stream AWS CloudWatch Logs via AWS Lambda into Splunk for near real-time analysis and visualization, as depicted in the diagram below. In the following example, we are interested in streaming VPC Flow Logs, which are stored in CloudWatch Logs. VPC Flow Logs capture information about all the IP traffic going to and from network interfaces, and are therefore instrumental for security analysis and troubleshooting. That said, the following mechanism applies to any logs stored in CloudWatch Logs.

Here’s the outline of this guide:

  1. First, a note on pull vs push ingestion methods
  2. Step-by-Step walkthrough to stream AWS CloudWatch Logs
  3. Bonus traffic & security dashboards!
  4. Troubleshooting
  5. Conclusion

First, a note on pull vs push ingestion methods

Splunk supports numerous ways to get data in, from monitoring local files or streaming wire data, to pulling data from remote 3rd-party APIs, to receiving data over syslog, tcp/udp, or http.

One example of pulling data from remote sources is the widely popular Splunk Add-on for AWS which reliably collects data from various AWS services.
One example of pushing data is an AWS Lambda function that streams events over HTTPS to Splunk HTTP Event Collector (HEC).

These two pull and push models apply to different use cases and have different considerations. This post pertains to the push model which is particularly applicable for microservice architectures and event-driven computing such as AWS Lambda. Since there are no dedicated pollers to manage and orchestrate, the ‘push’ model generally offers the following benefits:

  • Lower operational complexity & costs
  • Easier to scale
  • Low friction
  • Low latency

Step-by-Step walkthrough to stream AWS CloudWatch Logs

The following instructions use VPC Flow Logs as an example. If you would like to stream other CloudWatch Logs besides VPC Flow Logs, you can skip to step 2 and simply name your resources, such as the Lambda function, to match your use case.

1. Configure VPC Flow logs

Skip to step 2 if you have already enabled Flow Logs on your VPC(s).

1a. Create a Flow Logs role to give permissions to VPC Flow Logs service to publish logs into CloudWatch Logs. Go ahead and create a new IAM role with the following IAM policy attached:


{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams"
],
"Effect": "Allow",
"Resource": "*"
}
]
}

Take note of the role name, say vpcFlowLogsRole, as you’ll need it in a subsequent step.
You’ll also need to set a trust relationship on this role to allow the flow logs service to assume this role. Click on ‘Edit Trust Relationship’ under ‘Trust Relationships’ tab of the newly created role, delete any existing policy then paste the following:


{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "vpc-flow-logs.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}

1b. Enable Flow Logs on your VPC(s) from the AWS VPC Console as described in the AWS VPC docs. For the rest of this guide, let’s say you specified vpcFlowLogs as the destination CloudWatch Logs group, which we’ll reference in a subsequent step. Within a few minutes, you should start seeing flow log records in the CloudWatch Logs console under that log group.
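
If you prefer the command line, the equivalent can be done with the AWS CLI. A rough sketch, reusing the role and log group names above, with a placeholder VPC ID and account ID that you would replace with your own:

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-group-name vpcFlowLogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/vpcFlowLogsRole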

2. Configure Splunk input

Now that you have flow logs being recorded, we’ll start setting up the data pipeline from the end, that is Splunk, working our way backwards.

2a. Install Splunk Add-on for AWS. Note that since we’ll be using Splunk HEC, we will *not* be relying on any modular input from the Add-on to collect from CloudWatch Logs or VPC Flow Logs. However, we will leverage the data parsing logic (i.e. sourcetypes) that already exists in the Add-on to automatically parse the VPC Flow Log records and extract the fields.

2b. Create an HEC token from Splunk Enterprise. Refer to Splunk HEC docs for detailed instructions.
When configuring the input settings, make sure to specify “aws:cloudwatchlogs:vpcflow” as the sourcetype. This is important to enable automatic field extractions. Make sure to take note of your new HEC token value.
Note: For Splunk Cloud deployments, HEC must be enabled by Splunk Support.

Here’s how the data input settings should look:

1. Configure Splunk HEC
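
Before moving on, you can sanity-check the new token with a quick curl against the HEC endpoint. This is only a connectivity test; replace the host and token placeholders with your own values (-k skips certificate validation for self-signed certificates):

curl -k https://<host>:8088/services/collector/event \
  -H "Authorization: Splunk <your-HEC-token>" \
  -d '{"sourcetype": "aws:cloudwatchlogs:vpcflow", "event": "HEC connectivity test"}'

A successful response should look like {"text":"Success","code":0}.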

3. Configure Lambda function

The pipeline stage prior to Splunk HEC is AWS Lambda. The function will be executed by CloudWatch Logs whenever there are new logs in the log group, and will stream those records to Splunk. Luckily, there’s already a Lambda blueprint published by Splunk for exactly that purpose.

3a. Create Lambda function using the “CloudWatch Logs to Splunk” Lambda blueprint from AWS console by clicking here. Alternatively, you can navigate to AWS Lambda console, click ‘Create a Lambda function’, then search for ‘splunk’ under ‘Select blueprint’. At that point you can select splunk-cloudwatch-logs-processor Lambda blueprint.

3b. Configure the Lambda function trigger. Select ‘CloudWatch Logs’ as the trigger if it’s not already selected. Then specify vpcFlowLogs as the log group. Enter a name for ‘Filter Name’, say vpcFlowLogsFilter. You can optionally enter a value for ‘Filter pattern’ if you want to restrict what gets delivered to Lambda. Before clicking ‘Next’, make sure ‘Enable trigger’ is checked. Here is an example of how this form should look:

Add trigger for Lambda function

This is also known as a CloudWatch Logs subscription filter which effectively creates a real-time feed of logs events from the chosen log group, in this case vpcFlowLogs.

Note that, when adding this Lambda trigger from the AWS Console, Lambda will add the required permissions for CloudWatch Logs service to invoke this particular Lambda function.

3c. Configure the Lambda function. The function already implements the necessary logic to process the CloudWatch Logs data, including decoding and decompressing it and breaking out the individual events before sending them to Splunk HEC. You’ll need to set the following required parameters:

  • At the top: specify your Lambda function name, say vpcFlowLogsProcessor
  • Under function code: fill in the Splunk settings under Environment variables as shown in the screenshot below, where:
    • SPLUNK_HEC_URL is the URL for the Splunk HEC endpoint, e.g. https://<host>:8088/services/collector, where <host> is your Splunk fully qualified domain name or IP address. Note that the default port for HEC is 8088
    • SPLUNK_HEC_TOKEN is the token value from HEC input you created earlier
  • Under function handler and role: in Role, select “Choose an existing role” and then for Existing role, select “lambda_basic_execution” which gives Lambda function minimum required permissions for writing its own logs to CloudWatch Logs.

Configure Lambda env vars and role

Note that AWS Lambda encrypts the environment variables at rest using a Lambda service key, by default. Environment variables are decrypted automatically by AWS Lambda when the function is invoked. While not required for the purpose of this setup, you also have the option to encrypt the environment variables before deploying the Lambda function. For more information, see Create a Lambda function using Environment Variables to Store Sensitive Information.

At this point, you can click ‘Next’ after reviewing your Lambda configuration which should look as follows:

Review Lambda function settings

After a few minutes, you should start seeing events in Splunk Enterprise.
You can search by sourcetype:


sourcetype="aws:cloudwatchlogs:vpcflow"

Or by source, which is set by the Lambda function to a default value of “lambda:<functionName>”:


source="lambda:vpcFlowLogsProcessor"

Search flow logs in Splunk

Bonus traffic & security dashboards!

By using Lambda-based data ingestion, not only can you benefit from the simple setup above, but you can also leverage the advanced dashboards and sophisticated traffic & security analysis of VPC Flow Logs that come with the Splunk App for AWS. If you set the correct sourcetype, for example “aws:cloudwatchlogs:vpcflow” in the case of VPC Flow Logs as shown above, the relevant dashboards should populate automatically. Once the app is installed, navigate to the Splunk App for AWS and view the “VPC Flow Logs: Traffic Analysis” dashboard under the Traffic & Access dropdown menu and the “VPC Flow Logs: Security Analysis” dashboard under the Security dropdown menu:

VPC Traffic Analysis

VPC Security Analysis

Troubleshooting

If you’re not seeing events in Splunk, you can troubleshoot this one pipeline stage at a time following the data flow direction:

  1. Ensure VPC flow logs are captured in the CloudWatch log group you specified. If you still don’t see any logs, here are possible causes:
    • It can take several minutes to collect and publish flow logs to CloudWatch logs, once a flow log is first created.
    • The log group in CloudWatch Logs is only created when traffic is recorded. Make sure there’s traffic on the network interfaces of the selected VPC(s).
    • VPC flow logs service doesn’t have adequate permissions. Review the IAM role & policy as detailed in step 1 above.
  2. Ensure the Lambda function is being triggered by CloudWatch Logs events. First, confirm that the trigger is enabled by going to AWS Lambda Console -> Functions -> (Your function name) and selecting the ‘Triggers’ tab; when enabled, the CloudWatch Logs trigger shows a ‘Disable’ button. At this point, the best place to troubleshoot the Lambda function is its own logs captured in CloudWatch Logs: select the ‘Monitoring’ tab and click ‘View logs in CloudWatch’. By default, the Lambda blueprint logs the decoded data batch from CloudWatch Logs, then the response from Splunk along with the number of processed log events. A couple of CLI checks are also sketched after this list. If you see request errors, here are some common causes:
    • Splunk HEC port is behind firewall
    • Splunk HEC token is invalid, which would return unauthorized status code
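
The CLI checks mentioned above could look roughly like this, assuming the vpcFlowLogs log group and vpcFlowLogsProcessor function names used earlier in this guide:

# confirm the subscription filter that feeds Lambda exists on the log group
aws logs describe-subscription-filters --log-group-name vpcFlowLogs

# review the function configuration (environment variables, role, handler)
aws lambda get-function-configuration --function-name vpcFlowLogsProcessor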

Conclusion

We’ve shown you how you can configure a low-overhead & highly scalable data pipeline to stream your valuable CloudWatch Logs into your existing Splunk Enterprise by leveraging AWS Lambda & Splunk HEC together. That data pipeline enables near real-time processing & analysis of data by Splunk Enterprise.

As an example of CloudWatch Logs, we used VPC Flow Logs that are stored in CloudWatch. That data is critical to understanding the traffic in a VPC and any security considerations. However, note that VPC Flow Logs are themselves captured every few minutes, so the analysis of VPC Flow Logs can only be done in batches.

Click here to get started with Lambda blueprints for Splunk directly from your AWS Console. We look forward to seeing how you’ll leverage the power of AWS Lambda & Splunk HEC to build your own serverless architectures and data pipelines. Leave us a note below with any feedback or comments, or on Splunk Answers for any question you may have.

Your Splunk Workspace


What is a Workspace? In my mind, it’s a well defined area within which one can construct and create without impact to and by externalities.

Implemented in Splunk, it’s a user logging into Splunk, getting escorted to content for their domain, and not being distracted or impacted by the activities of others.

As you might have guessed, this concept IS implemented already in Splunk by means of visible “apps.” Unfortunately, many of us don’t embrace apps in this fashion – and for good reason! We often associate apps with the rich contributions available on Splunkbase and rarely consider the simplest of apps, as a Workspace for user groups.

Let’s change that today. Let’s reset how we think about apps and the entire Splunk UI experience, for that matter. From now on, let’s refer to any app visible in the UI as a Workspace. Seems too subtle to make a difference? Watch as it changes your entire perspective on the Splunk user experience.

Implementation

Out of the box, Splunk comes with the Launcher and the Search & Reporting Workspaces in Splunk Web. This is awesome, flexible, and customizable for our technical users, but probably not the most effective starting point for a Splunk n00b. Instead, let’s configure Splunk to provide a web-app-based experience such that users are sent right to their Workspaces and not distracted by other items deployed to the Splunk environment.

First things first: we need an app to become a Workspace, so let’s create one. To do this, navigate to the Manage Apps view, either by selecting the gear icon (if in the Launcher) or the ‘Apps’ dropdown (if viewing an app) from the upper-left corner of Splunk Web.
Select the “Create app” button. If it’s grayed out, then you either don’t have permissions or are using a Search Head Cluster – in either case, ask your Admin for help. On the resulting form, fill out the fields according to the user role for which you want to make the Workspace, but make sure to leave the Template dropdown set to ‘barebones’. Don’t worry – you can edit the app later if you change your mind. An example would be an app created for an Operations team.

Next, navigate to the associated role within Splunk Web and set the ‘Default app’ to the newly created one. If the team already has a commonly used dashboard, go ahead and set it as default in the navigation so users are presented with it instead of the basic search page. If no such dashboard exists, I recommend creating a “Welcome” page and using that. Don’t forget to move over other config that might have already been created in other locations.

Congratulations! You now have a working Workspace! Login as a user of that role and see how they get to skip the Launcher and are sent directly to their Workspace and default dashboard.

Less is More

Inevitably, users will grow curious and accidentally get lost after navigating into other Workspaces. To mitigate this, I suggest making the other Workspaces invisible, thereby limiting a user group (role) to only its own Workspace and keeping it out of other teams’ work.

To do this, you need only edit and remove the read permissions for the unrelated groups of a given app. In other words, the Operations app will have read permissions for the Operations role but no other roles. The result is that no other group knows there is an Operations app, let alone accidentally starts messing with its contents.
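
Under the hood, that permission change ends up in the app’s metadata/local.meta file. A minimal sketch for a hypothetical app named operations, readable by an operations role and writable only by admins:

[]
access = read : [ admin, operations ], write : [ admin ]

The empty stanza name [] applies the permission to every object in the app; you can scope it to individual object types if needed.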

Additionally, I recommend removing visibility of the Search & Reporting app. I know that might sound crazy but it eliminates yet another place users might stumble to without impacting functionality. To do so, select the “No” radio button for the Visible attribute of the Search & Reporting app. To validate all functionality still works, you can navigate to the ‘search’ endpoint of your Workspace and see how searching works as expected.
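
The same visibility toggle can also be set directly in the app’s local/app.conf; a minimal sketch:

[ui]
is_visible = 0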

As you remove permissions for other Workspaces, you’ll notice that the Splunk user experience is simplified: the ‘Apps’ dropdown in the upper left has a lot less clutter and distraction. Just keep in mind the difference between the two approaches: by removing permissions, a user can never know the app exists, whereas by making an app not visible, the app and its artifacts are still accessible, just hidden from direct navigation.

N00bs are Powerful

For those of you who are hesitant about this approach, just remember that your n00bs are not incompetent. In fact, I’d argue that they are the most important users of your environment because they get the most value relative to their effort, since most of them consume insights from dashboards and other content already created.

Implementing a Workspace will make their experience more effective by sending them directly to what they need without distraction. It’s also worth noting that you should trust that over time, some of them will grow curious and dig deeper into Splunk thereby increasing their effectiveness and value from Splunk. As an admin, I was always impressed to uncover non-technical users that wrote their own searches by reverse engineering a panel they curiously clicked into.

The bottom line is that a Workspace provides containment for work without limiting the functionality of Splunk. A Workspace becomes a domain for a user group to create and share Splunk insights without the distraction and clutter of otherwise unrelated groups.

If you’ve implemented this, then congratulations on your cleaner Splunk environment! Happy Splunking!

Everything You Need to Know About Splunk ITSI


With the latest version of Splunk IT Service Intelligence (ITSI), you can apply machine learning and advanced analytics to:

  • Simplify operations with machine learning
  • Prioritize problem resolution with event analytics
  • Align IT with the business with powerful real-time service-level insights

So how do you get started?

Learn More About Splunk ITSI’s Benefits and Features

Watch this 2-minute overview of Splunk ITSI:

Getting ready for a deployment? For a closer look at Splunk ITSI’s capabilities, check out these resources.

  • The Splunk ITSI tech brief discusses key concepts needed for IT service intelligence and helps you get started with a deployment
  • The Splunk ITSI modules tech brief explains how to get up and running with a collection of useful metrics, entities, service templates and detailed dashboards

See What Other Splunk Ninjas Are Doing With Splunk ITSI

Cox Automotive

Cox Automotive rapidly identifies incidents, minimizes disruptions and improves service reliability and the user experience. The result? The company has reduced its auction incidents by 90 percent.

DeKalb County School District

DeKalb County School District reduced mean-time-to-investigate and repair from days to minutes.

Surrey Satellite

SSTL gained overarching insights to improve service availability, reliability and security.

Anaplan

Anaplan proactively improves customer experience and supports and secures operations 24/7 with service-level insights.

Get a Deeper Dive

  • Embracing The Strategic Opportunity of IT: This white paper explains how to make IT systems smarter to align with business objectives.
  • Splunk-Sponsored 1-Day Workshops: Splunk offers a 1-day Splunk-funded on-site engagement, guided by Splunk ITSI experts. In this workshop, we’ll help you model a measurable implementation of business and technical KPIs. Contact us for more details.

Have more questions?

Send us your queries and we’ll respond to you right away.

Splunk AWS Quick Start: Deploy Your AWS Splunk Environment In Minutes


If I told you that a fully operational Splunk Enterprise deployment in AWS could be yours in a matter of minutes, would you be interested? Sit down, relax, and I’ll tell you all you need to know to have a Splunk Enterprise deployment ready to index; fully configured with indexer replication and search head clustering in less than an hour.

Late last year, I wrote a deployment guide for Splunk Enterprise on AWS that explains your options when deploying Splunk Enterprise in AWS. Today, it gets better: I’m happy to report that document has been expanded upon, and Splunk has released an official Splunk Enterprise AWS Quick Start.

If you’re not familiar with AWS Quick Start, the underlying principle is to help the end user rapidly deploy reference implementations of software solutions on AWS. In addition to the updated deployment doc, the Splunk Enterprise Quick Start includes a CloudFormation template. (CloudFormation is an AWS service that provides a predictable, automated way to create and manage a collection of related AWS resources.) The template will ask a few questions about how you would like to deploy Splunk: which instance type, how many indexers, the replication factor for your indexer cluster, etc. There are options to provision in a new VPC or an existing VPC, and appropriate subnet configurations for both. The following screenshot shows most of the questions asked when deploying to a new VPC.


Once you’ve answered each of the questions, CloudFormation takes over and provisions your requested Splunk Enterprise deployment. (Depending on the options you’ve chosen, the launch time varies between about 10-30 minutes.) CloudFormation creates everything you need, from your security groups and VPC ACLs to the configuration of the Splunk indexer cluster and search head(s), with optional support for a search head cluster. The template even configures distributed search and creates a license master and cluster master. It has everything you’ll need to get started with your Splunk Enterprise deployment.
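
If you would rather script the launch than click through the console, the same template can be driven from the AWS CLI. This is a rough sketch only; the template URL and parameter key names below are placeholders that you would replace with the values documented in the Quick Start:

aws cloudformation create-stack \
  --stack-name splunk-enterprise-quickstart \
  --template-url https://<quickstart-bucket>/<path-to>/splunk-enterprise-template.json \
  --capabilities CAPABILITY_IAM \
  --parameters ParameterKey=KeyName,ParameterValue=my-keypair \
               ParameterKey=IndexerCount,ParameterValue=3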

Taking a closer look, if you were to launch the template with the new VPC option, and select an indexer cluster with 3 nodes and a search head cluster, the architecture would look something like this:

 

Splunk Enterprise Quick Start architecture diagram

Each and every Splunk deployment is a special snowflake as unique as the fingerprints of the team deploying it. The aim of this Quick Start is to give you a fantastic place to start. These templates are designed to be expanded upon and tailored to your specific needs.

In the end, I want you to spend as much time as possible enjoying the benefits of what Splunk can do. I hope this helps you spend less time deploying and more time enjoying.

If you have questions or comments, I’d love to hear them in the comment section below. If you happen to find a bug, you can report it via GitHub.


Using machine learning for anomaly detection research


Over the last few years I have had many discussions around anomaly detection in Splunk, so it was really great to hear about a thesis dedicated to this topic, and I think it’s worth sharing with the wider community. Thanks in advance to its author, Niklas Netz!

Obviously anomaly detection is an important topic in all core use case areas of Splunk, but each one has different requirements and data, so unfortunately there is not always an easy button. In IT Operations you want to detect system outages before they actually occur and proactively keep your dependent services up and running to meet your business needs. In Security you want to detect anomalous behavior of entities as potential indicators of breaches before they occur. In Business Analytics you might want to spot customer churn or find patterns that indicate severe business impacts. In IoT you may want to find devices that suddenly turn unhealthy or detect anomalies in sensor data that indicate potentially bad product usage.

Before we start with solutions, let’s take a step back and raise a more fundamental question: “What is an anomaly?” or “What does anomaly detection mean (in your context)?” One common answer, from Wikipedia, is “the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.”

So this means that we need to know an “expected pattern” or a normal state, which is often referred to as “baselining”. Sometimes people do not yet have a clear answer to the question of what anomaly or normality means in the context of their use cases, which obviously makes finding the right approach even harder.

Historically there have been many in-depth studies around anomaly detection, but recently a thesis was published by Niklas Netz, who took a closer look at different ways to spot anomalies specifically with Splunk. His research was part of a cooperation between Hamburg University of Applied Sciences and the OTTO group together with Splunk partner LC Systems, who jointly presented the results at .conf 2016:


http://conf.splunk.com/files/2016/slides/anomaly-detection-on-business-items-with-machine-learning-algorithms.pdf

Now Niklas’ thesis (in German) is published and definitely worth a read for anybody who wants to go into depth and detail with anomaly detection in Splunk. He addresses the basic challenges and compares different approaches and solutions, spanning from basic SPL commands for anomaly detection, over 3rd-party apps, to the Splunk App for Machine Learning. Read the full text here: http://edoc.sub.uni-hamburg.de/haw/volltexte/2016/3691/pdf/Bachelorarbeit_Netz.pdf

As a brief summary, Niklas concluded that getting the right data, then cleaning and transforming it so that it was sufficient for his goals, was the most time-consuming part of the process. He decided to evaluate different machine learning models for categorical classification, labeling data points as anomalies if they crossed a threshold of relative change compared to the hour or day before. According to his goal, he defined conditions and engineered features that helped model what is normal and, in relation to that, what is an anomaly. In his case a RandomForestClassifier did the best job. With his work he paved the road for further development of machine learning and anomaly detection use cases at OTTO, and I hope the wider Splunk community will find his work valuable as well.
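
For readers who want to experiment with a similar approach in the Splunk App for Machine Learning, the general pattern looks something like the sketch below; the field names, label field and model name are made up for illustration:

... | fit RandomForestClassifier is_anomaly from relative_change hour_of_day day_of_week into otto_anomaly_model

A scheduled search can then score new data with | apply otto_anomaly_model and alert on predicted anomalies.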

Finally, I want to share a few links to useful products and resources that help to tackle anomaly detection in Splunk, for specific areas or in general.

Splunk DB Connect 3 Released

Splunk DB Connect has just gotten a major upgrade! Let’s take a look at it.

What’s New

Splunk DB Connect 3.0 is a major release to one of the most popular Splunk add-ons. Splunk DB Connect enables powerful linkages between Splunk and the structured data world of SQL and JDBC. The major improvements of this release are:

  • Performance improvement. Under similar hardware conditions and environment, DB Connect V3 is 2 to 10 times faster than DB Connect V2, depending on the task.
  • Usability improvement. A new SQL Explorer interface assists with SQL and SPL report creation.
  • Improved support for scripted configuration, via reorganized configuration files and redesigned checkpointing system. Note that rising column checkpoints are no longer stored in configuration files.
  • Stored procedures support in dbxquery.
  • Retry policy on scheduled tasks is improved (no more need for auto_disable)

Backward Compatibility Changes

As part of this major release, we are making changes that will affect some users. The features that will have backward compatibility changes are:

  • Resource pooling is removed. If you are now using resource pooling, the configuration will be removed and all scheduled tasks will operate on the master node only. Resource pool nodes can be repurposed.
  • Scheduled tasks (inputs, outputs) are disabled on search head clusters. You can still perform output using the dbxoutput command on a search head cluster. If you are now using scheduled tasks on DB Connect V2, you need to move the configuration files from a cluster node to a heavy forwarder, then upgrade in place to DB Connect 3.
  • Lookups redesigned. For performance and clarity reasons, automatic and scripted lookups have been replaced with a simpler, more performant dbxlookup command. If you are now using scripted lookups for their caching behavior, you can replicate this behavior and avoid search changes by creating a scheduled dbxquery task which outputs a lookup with the same name (a small sketch follows this list). If you are now using automatic lookups for live database access, you need to edit the searches to use the dbxlookup command instead of lookup.
  • dbxquery command options changed. The options output and wrap are deprecated and have no effect. The value for output and wrap is set to CSV and False by default. The value for shortnames is set to true by default.
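
To replicate the old cached-lookup behavior, a scheduled search along these lines can rebuild a lookup on whatever interval you need; the connection name, SQL and lookup file below are hypothetical:

| dbxquery connection="inventory_db" query="SELECT host, owner, location FROM assets"
| outputlookup asset_inventory.csv

Searches can then enrich events with the standard lookup command against asset_inventory.csv, while dbxlookup remains available for live database access.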

Migration

DB Connect users should review documentation and test upgrade before moving DB Connect 3 into production. If you just upgrade the existing package in production, data will no longer flow. The version 3 package includes a migration script, see http://docs.splunk.com/Documentation/DBX/3.0.0/DeployDBX/MigratefromDBConnectv1 for documentation. Users of Spark SQL, Teradata, or Oracle databases may need to take additional manual steps to complete driver migration.

Splunking Microsoft Azure Network Watcher Data


 

Microsoft has released a new service in Azure called Network Watcher.  Network Watcher is a network performance monitoring, diagnostic, and analytics service which enables you to monitor your network in Azure.  The data collected by Network Watcher is stored in one or more Azure Storage Containers.  The Splunk Add-on for Microsoft Cloud Services has inputs to collect data stored in Azure Storage Containers which provides valuable insights for operational intelligence regarding Azure network workloads.  In this blog post, we will explore how to get Azure Network Security Group (NSG) Flow Logs into Splunk and some possible use case scenarios for the data.

Getting Azure NSG Flow Log data into Splunk

NSG flow logs allow you to view information about ingress and egress IP traffic on your Network Security Groups. These flow logs show the following information:

  • Outbound and Inbound flows on a per Rule basis
  • Which NIC the flow applies to
  • Tuple information about the flow (Source/Destination IP, Source/Destination Port, Protocol)
  • Information about whether the traffic was allowed or denied

Getting Azure NSG Flow Log data into Splunk involves two basic steps:

 

Configuring NSG Flow Logs in the Azure Portal

From the Azure Portal, select Browse -> Network security groups


Select an existing security group and choose Settings -> Diagnostics to turn on data collection.


 

Choose a storage account to send the logs and enable NetworkSecurityGroupFlowEvent


 

 

Configuring the Splunk Add-on for Microsoft Cloud Services to ingest NSG Flow Logs

Download and install the Splunk Add-on for Microsoft Cloud Services in accordance with the documentation.

After installation of the add-on, connect the add-on to the Azure Storage Account specified above.


 

The NSG Flow Log data is kept in an Azure Storage blob container named insights-logs-networksecuritygroupflowevent.

Configure an Azure Storage Blob input for this container.


 

Notice that the sourcetype is set to mscs:nsg:flow.  You do not have to set your sourcetype to this.  I just chose this as an easy way to differentiate the data.  Here is a handy props.conf configuration to break the JSON array into individual events:

[mscs:nsg:flow]
LINE_BREAKER = \}([\r\n]\s*,[\r\n]\s*)\{
SEDCMD-remove_header = s/\{\s*\"records\"\:\s*\[\s*//g
SEDCMD-remove_footer = s/\][\r\n]\s*\}.*//g
SHOULD_LINEMERGE = false
KV_MODE = json
TIME_PREFIX = time\":\"
REPORT-tuples = extract_tuple

Here is a handy transforms.conf delimiter for the tuples in the data:

[extract_tuple]
SOURCE_KEY = properties.flows{}.flows{}.flowTuples{}
DELIMS = ","
FIELDS = time,src_ip,dst_ip,src_port,dst_port,protocol,traffic_flow,traffic_result

Searching the NSG Flow Log Data with Splunk

Once the input from above is created, the NSG Flow Log data will be available to search in Splunk.  Some potential use cases for this data include:

Monitoring Protocols – this is a security and compliance use case.  Ensure only the correct protocols are in use and monitor the traffic usage of each protocol over time.

sourcetype=mscs:nsg:flow | top protocol by dst_ip

Monitoring Traffic Flow – this is useful to identify potential rogue communication.  For instance, if a source machine in your Azure environment sends traffic to a known bad destination address, this could indicate potential malware.

sourcetype=mscs:nsg:flow | stats count by src_ip dst_ip

This search could be visualized on a Sankey Diagram as well to visualize the flow.

Monitoring Allowed vs. Denied Traffic – this could indicate an attack or a misconfiguration.  If you are seeing a lot of denied traffic, this could indicate a misconfiguration of software that is trying to communicate with your Azure resources.

sourcetype=mscs:nsg:flow | stats count by traffic_result src_ip

Top Destination Addresses/Ports – this is useful for security and for monitoring the usage of services hosted in Azure

sourcetype=mscs:nsg:flow | top dst_port by dst_ip

Conclusion

Even though NSG Flow Logs are a new data source made available by Microsoft Azure, the Splunk Add-on for Microsoft Cloud Services is ready to ingest this data source today in order to give you an even greater degree of operational insight and intelligence for your Microsoft Azure environment.

SSL Proxy: Splunk & NGINX


Who is this guide for?

It is a best practice to install Splunk as a non-root user or service account as part of a defense-in-depth strategy. This installation choice comes with the consequence of preventing the Splunk user from using privileged ports (anything below 1024). Some of the solutions to this problem found on Splunk Answers require iptables rules or other workarounds. In my experience, the iptables method is not that reliable, and many newer Linux distributions are abandoning iptables in favor of firewalld as the default host firewall. In this guide, I will show you how to use Nginx and Let’s Encrypt to secure your Splunk Search Head while serving SSL traffic on port 443.


Prerequisites

• OS which supports the latest version of Nginx
• Linux OS required for Let’s Encrypt (If you choose to use that as your CA)
• Root access to the search head

Configuration

The easiest way to get both products installed is to use yum or apt depending on your flavor of Linux.

Install Let’s Encrypt, Configure Splunk Web SSL

In a previous blog post, I provided a guide to generate SSL certs and configure Splunkweb to make use of them. You should follow that guide to generate your certs or your own organizational process for generating certificates before proceeding with the next steps.

Install Nginx

$ sudo apt install nginx

Configure Nginx to use SSL

Create a configuration for your site, it is best to use the hostname/domainname of the Splunk server. This file should be created in

/etc/nginx/sites-enabled
$ touch /etc/nginx/sites-enabled/splunk-es.anthonytellez.com

To configure Nginx for SSL, you only need three pieces of information:
• location of the certificate you plan to use
• location of the private key used to generate the certificate
• ssl port(s) to redirect

Example Configuration of splunk-es.anthonytellez.com:

server {
    listen 443 ssl;
    ssl on;
    ssl_certificate /opt/splunk/etc/auth/anthonytellez/fullchain.pem;
    ssl_certificate_key /opt/splunk/etc/auth/anthonytellez/privkey.pem;
    location / {
        proxy_pass https://127.0.0.1:8000;
    }
}

Reload Nginx:

 $ nginx -s reload

Optional: Redirect all http requests
To prevent users from seeing the default webpage served by Nginx, you should also redirect traffic over port 80 to port 443 to prevent leaking information about the version of Nginx running on your server.

server {
    listen 80;
    server_name splunk-es.anthonytellez.com;
    return 301 https://$host$request_uri;
}

Optional: Enable HSTS
HSTS is a web security policy mechanism that helps protect websites against protocol downgrade attacks and cookie hijacking. Enabling this in Nginx can help to protect you if you are ever accessing your Splunk instance from an unprotected network. The included example is set with a max-age of 300 seconds; you can increase this to a larger value once you have validated the configuration is working.

server {
    listen 443 ssl;
    add_header Strict-Transport-Security "max-age=300; includeSubDomains" always;
    ssl on;
    ssl_certificate /opt/splunk/etc/auth/anthonytellez/fullchain.pem;
    ssl_certificate_key /opt/splunk/etc/auth/anthonytellez/privkey.pem;
    location / {
        proxy_pass https://127.0.0.1:8000;
    }
}

HSTS will force all browsers to query the https version of the site once they have processed this header. If you have issues validating if HSTS is working in your browser of choice, check out this resource on stack exchange: How can I see which sites have set the HSTS flag in my browser?
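
Once the header is in place, a quick way to confirm it from the command line (using the example hostname from above):

$ curl -skI https://splunk-es.anthonytellez.com | grep -i strict-transport-security

You should see the Strict-Transport-Security header with the max-age you configured.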

From API to easy street within minutes


30? 20? …15? It all depends on how well you know your third-party API. The point is that polling data from third-party APIs is easier than ever. CIM mapping is now a fun experience.

Want to find out more about what I mean?  Read the rest of this blog and explore what’s new in Add-on Builder 2.1.0.

REST Connect… and with checkpointing

Interestingly, this blog happens to address a problem I faced on my very first project at Splunk. When I first started at Splunk as a sales engineer, I worked on building a prototype of the ServiceNow Add-on. Writing Python, scripted inputs vs. modular inputs, conf files, setup.xml, packaging, best practices, password encryption, proxy support and even checkpointing… the list goes on. Dealing with all of these was tough, to say the least. I kept wondering why this couldn’t be much easier.

Fast forward to today, and an easy solution has finally arrived. You can now build all of the above with the latest version of Add-on Builder, all without writing any code or dealing with conf files. If you know your third-party API, you could be building the corresponding mod input in minutes.
One powerful addition to our new data input builder is checkpointing. In case you were wondering, checkpoints are for APIs what file pointers are for file monitoring. Instead of polling all data from an API, checkpointing allows you to poll incrementally, for new events only, at every poll. Checkpointing can be a pretty complicated concept but it is essential to active data polling. Luckily, I can say that it is no longer as complex as it used to be.

For an example of doing this in Add-on Builder 2.1.0, check out Andrea Longdon’s awesome walkthrough using the New York Times API. This cool example will show you how to monitor and index NY Times articles based on user-defined keywords.

You will be able to define your app/add-on setup and automatically encrypt passwords using the storage/passwords endpoint, in a drag-and-drop interface.


 

CIM update at run-time

CIM mapping has the following major enhancements:

  • A new UI that makes it possible to compare fields from your third-party source and CIM model fields side by side.
  • You can also update CIM mapping objects, even if they were built outside of Add-on Builder, with no restart needed. In other words, you can now update CIM mappings at run time in a single view from Add-on Builder.

 


What else is new?

  • The Add-on Builder has a new and enhanced setup library consistent with modern Splunk-built add-ons. This gives you more flexibility over the setup components you are building, in addition to automatically handling password encryption.


  • You can now import and export add-on projects, allowing you to work on an add-on on different computers and share projects with others. For details, see Import and export add-on projects.
  • One of my favorites: no more interruptions caused by having to restart Splunk Enterprise when building new data inputs, creating a new add-on, or any other step. Go through the end-to-end process, undisturbed.

Please check out our latest release; we would love to hear from you. Teaser alert: in the next blog post, I will share how to build a SolarWinds add-on using Add-on Builder 2.1.0.

Happy Splunking!

 
