A customer of mine runs daily vulnerability scans across the servers and applications in their data center using a myriad of tools. As you would expect, they can use the native reporting capabilities of those tools to review a given set of scan results, but if you’ve done this sort of thing before, you know how hard it is to get any real insight out of disparate, clunky reporting tools, each sitting in its own silo.
Ingesting vulnerability data into Splunk opens up a lot of powerful possibilities while breaking down the silos that get in the way of correlation and true security analytics. In fact, our Common Information Model includes a Vulnerabilities data model that helps normalize this type of data, and our Enterprise Security App includes a Vulnerability Center plus other dashboards and correlation searches for this data.
Some common use cases for vulnerability data in Splunk include understanding key elements of a host’s security posture, correlating vulnerabilities with CVEs or other indicators of attack, and monitoring vulnerability/patching cycles and trending over time.
This customer, though, had a very specific use in mind for this vulnerability data – they wanted to see deltas (changes) in vulnerabilities reported across hosts from one day to the next. So for a given scan type, host and unique vulnerability ID we need to ask 2 questions:
- Is this a new vulnerability detection, i.e. it wasn’t there yesterday, but is there today?
- Has a vulnerability that was there yesterday been mitigated so that it doesn’t show up on the scan today?
Why would an organization be interested in knowing about vulnerability deltas? In this particular case it was largely about compliance reporting integration with another tool. Beyond that, organizations might find deltas useful to:
- understand the effectiveness of patching/mitigation
- detect benign or malicious installation of vulnerable software on critical systems
- better understand and detect changes in system state
We started by ingesting the vulnerability scan data into Splunk. You can do this all sorts of ways, and in fact there are many apps and add-ons available on SplunkBase to support ingestion of data from specific tools and vendors, but in our case we just consumed the flat file output from the vulnerability scanners.
Once we brought the data in, we normalized it against the Common Information Model. We didn’t strictly have to do this, but it makes data from different vulnerability scanning tools look the same, and it means the search logic we build below can be reused without changing any field names.
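As a rough illustration of that reuse (the index and sourcetype for the second scanner here are made up), once another tool’s output is mapped to the CIM Vulnerabilities fields dest and signature, the core of the delta search we build below runs against it unchanged:

index="other_scanner" sourcetype="other_scanner:results"
| stats list(_time) as time count by dest, signature
| where count < 2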
After a little bit of data exploration and some trial and error, here is the search we came up with for Nessus vulnerability scan data:
index="nessus" sourcetype="nessus" [search index="nessus" sourcetype="nessus" | dedup date_mday | head 2 | fields date_mday]
| stats list(date_mday) as dates list(_time) as time count by dest, signature | where count < 2
| appendcols [search index="nessus" sourcetype="nessus" [search index="nessus" sourcetype="nessus" | dedup date_mday | head 2 | fields date_mday] | stats list(_time) as _time count by dest, signature | where count < 2 | stats latest(_time) as today earliest(_time) as yesterday] | filldown today yesterday
| eval condition=case(time=today, "New Detection", time=yesterday, "Removed Detection")
| fields - dates, count, today, yesterday
| convert ctime(time)
| eval uid=dest." | ".signature
| table uid time dest signature condition
| outputlookup nessus_delta.csv
It’s probably not the most efficient way to do this sort of search, but since vulnerability scan data isn’t particularly voluminous and we only had to run it once per day, it worked well for us. The output is a table with one row per changed detection, showing a unique ID, the detection time, the host, the signature and whether the detection is new or removed.
Let’s break this search down a bit:
index="nessus" sourcetype="nessus" [search index="nessus" sourcetype="nessus" | dedup date_mday | head 2 | fields date_mday]
This is our base search, which returns all of our Nessus scan data. Yes, we could have used a data model for this, and probably would in production. The subsearch (the part in [] brackets) does the interesting work here. In short, it ensures that we only keep scan results from the last 2 days on which a scan actually ran, by adding “date_mday=<latest day in the results> OR date_mday=<second latest day in the results>” to the base search.
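To make that concrete: if the two most recent scan days in the search window happened to be the 14th and 15th of the month, the subsearch would effectively expand the base search into something like this (the day values are purely illustrative):

index="nessus" sourcetype="nessus" (date_mday="15" OR date_mday="14")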
Note that this works great as long as you don’t have more than 1 scan per day; it handles skipped days and month boundaries without any trouble, but you would need to write something a little different if you had more than 1 scan result set per day (perhaps using the streamstats command, or a dedup as sketched below). Also, don’t set your search time window to more than 30 days or so, since date_mday values repeat every month.
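If you did have multiple scan result sets per day, one option (just a sketch, and using dedup on a derived scan-day field rather than streamstats; it treats a detection as present on a day if any scan that day reported it) would be to collapse each detection down to one event per day before the stats:

index="nessus" sourcetype="nessus" [search index="nessus" sourcetype="nessus" | dedup date_mday | head 2 | fields date_mday]
| eval scan_day=strftime(_time, "%Y-%m-%d")
| dedup dest, signature, scan_day
| stats list(scan_day) as dates list(_time) as time count by dest, signature
| where count < 2

You’d also want to base the new-vs-removed comparison later in the search on scan_day rather than on raw _time values, since individual scans within a day will carry different timestamps.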
| stats list(date_mday) as dates list(_time) as time count by dest, signature | where count < 2
The stats command creates a table with a row for each unique combination of scanned system (the dest field) and unique scan identifier (the signature field). Those 2 fields in combination represent a unique detection. For each of those rows it lists out the day or days of the month on which the detection occurred (probably not necessary, but useful for validation) and the timestamp (absolutely necessary).
The where clause is how we filter the results down to only the rows where there’s a delta. Think about how it works: if a row has a count of 2, the detection appeared in both scans, i.e. no change. If it only shows up once across those 2 scans, something changed, and we just need to figure out whether it’s a new detection or a resolved/mitigated one.
| appendcols [search index="nessus" sourcetype="nessus" [search index="nessus" sourcetype="nessus" | dedup date_mday | head 2 | fields date_mday] | stats list(_time) as _time count by dest, signature | where count < 2 | stats latest(_time) as today earliest(_time) as yesterday] | filldown today yesterday
The appendcols command here, with its nested subsearches, simply adds 2 columns of epoch timestamps representing the dates/times of the 2 most recent scans (i.e. the scans we are including per the base search). We need these for comparison, to know what kind of delta each row represents (new vs. removed detection). This works here because our data had a single timestamp for all events from a given scan; if yours doesn’t, you’ll need to extract or eval a value representing just the date here, and use that for the comparison in the original stats command (the scan_day field in the sketch above is one way to do it). We also used the filldown command to populate every row with the today and yesterday values, since appendcols only fills in the first row.
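As an aside, since the appendcols subsearch operates on the same filtered rows as the main search, a simpler way to get the same two values (an untested simplification, not what we ran in production) would be eventstats:

| eventstats max(time) as today min(time) as yesterday

That would replace the entire appendcols [...] | filldown step, because eventstats writes the aggregate values onto every row by itself.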
| eval condition=case(time=today, "New Detection", time=yesterday, "Removed Detection")
The eval command compares the time for each row against the known values for the 2 previous scans. If the detection timestamp matches the “yesterday” value, we know it’s a removed detection (it was here yesterday, but not today); if it matches the “today” value, it’s a new detection (it’s here today, but wasn’t here yesterday). A row can’t match both, because the where clause after our original stats command already filtered out detections that appeared in both scans.
| fields - dates, count, today, yesterday
| convert ctime(time)
| eval uid=dest." | ".signature
| table uid time dest signature condition
The above portion of the search is largely about cleanup. We remove fields we no longer need, format the epoch timestamps into something more human-readable, create a unique ID field (uid) consisting of a concatenation of dest and signature fields, and then re-arrange the columns in the table to make more sense.
| outputlookup nessus_delta.csv
Finally, we used the outputlookup command to create a nicely formatted CSV file on the system for integration with another tool. If you do it this way, you can use a cron job to move the file from the <splunk_home>/etc/apps/<your app>/lookups/ directory to a network share, perhaps adding a timestamp to the file name along the way.
Even better would be to save this search (minus the outputlookup) as an alert, then use a scripted alert action to write the data directly to a network share with a unique file name that includes a timestamp. You could probably do it in well under 20 lines of Python or a similar scripting language.
If you’ve got vulnerability or other scan data, pull it into Splunk. Try to understand your trends and deltas, and see what else you can learn from it. It’s a treasure trove of valuable security and compliance data.
Happy Splunking!