Solved: LINE_BREAKER for nmap output

howyagoin · ‎04-23-2011

I must be high on Easter chocolate, as I just can't get this to work right.

Problem: I have verbose nmap output (such as from running the NSE scripts) for a scan across a subnet, and I want each host to be a single event. In theory, this should be as simple as setting the right LINE_BREAKER so that each event is broken out properly.

I thought that putting this in my props.conf would do the trick:

[nmap-verbose]
LINE_BREAKER = Nmap scan report for (\d+\.\d+\.\d+\.\d+)
TRANSFORMS-nmap=nmap-host

And my transforms.conf:

[nmap-host]
REGEX = Nmap scan report for (\d+\.\d+\.\d+\.\d+)
FORMAT = dst_ip::$1

I've tried this without the transform, and all sorts of combinations on the regex, but don't get anywhere.

Raw data looks like this:

Nmap scan report for 123.123.123.123
Host is up (0.015s latency).
PORT   STATE SERVICE
21/tcp open  ftp

Nmap scan report for 123.123.123.124
Host is up (0.014s latency).
PORT   STATE SERVICE
21/tcp open  ftp
| ftp-anon: Anonymous FTP login allowed (FTP code 230)
|_drwxr-xr-x    2 0        0            4096 Mar 08 05:54 pub

Ideally, I'd like the IP address in the "scan report" line to be extracted as the destination IP of the scan, as a field I can use. But the way I'm doing it now results in either thousands of events, neatly CRLF-separated, or a handful of events with many hundreds of lines each. I've also tried with SHOULD_LINEMERGE set to both true and false, so I'm obviously missing something.

Thanks!

howyagoin · ‎05-04-2011

I found a better way to get what I want, and am including my search here so that others may hopefully benefit. I've run my nmap scans with -oG to generate the "greppable" format. With that being read into Splunk, the following search generates some pretty useful ways of looking at/for data:

The raw input looks like:



Host: 1.2.3.4 ()	Ports: 21/closed/tcp//ftp///, 22/closed/tcp//ssh///, 23/closed/tcp//telnet///, 80/open/tcp//http//Microsoft IIS httpd/, 139/open/tcp//netbios-ssn///

And my search:

index=nmap open 
| rex "\\tPorts:(?P<ports>[^\\t]+)" 
| makemv delim="," ports 
| rex "^Host: (?<target>\d+\.\d+\.\d+\.\d+)" 
| rex field=_raw "\s(?<port>\d+)/(?<status>[^/]+)/(?<proto>[^/]+)//(?:[^/]+)//(?<desc>[^/]+)/" 
| search port="80" 
| stats count by desc

Translating this into plain English, I have all of my nmap output in a specific index, and am only looking for the lines which have something open. A system which reports everything as closed could be interesting as well, but, that's not what I was after.

From there, I extract the Ports bit of nmap's output and run that through makemv to break down the individual port and status //// combinations.

After that, I break down those into their respective components, which then leads to the ability to search for specific things like port=80, or 22 or whatever, and finally create a nice table of values of descriptions from the -sV flag in nmap.
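For anyone who wants to check the parsing logic outside Splunk, here is a minimal Python sketch of roughly the same steps: pull out the Ports section, split it on commas, break each entry into its components, then count descriptions for open port 80. It is an approximation, not a translation of the SPL. It assumes the greppable line starts with "Host: " and uses a tab before "Ports:", as the rex patterns in the search imply. The names `parse_line`, `records`, `desc_counts` and the `service` field are hypothetical, and the entry pattern uses `*` rather than `+` so that entries with an empty description are kept too.

```python
import re
from collections import Counter

# Sample line in nmap's greppable (-oG) layout; the "Host: " prefix and the
# tab before "Ports:" are assumptions taken from the rex patterns in the search.
LINE = (
    "Host: 1.2.3.4 ()\tPorts: 21/closed/tcp//ftp///, 22/closed/tcp//ssh///, "
    "23/closed/tcp//telnet///, 80/open/tcp//http//Microsoft IIS httpd/, "
    "139/open/tcp//netbios-ssn///"
)

HOST_RE = re.compile(r"^Host: (?P<target>\d+\.\d+\.\d+\.\d+)")
PORTS_RE = re.compile(r"\tPorts:(?P<ports>[^\t]+)")
# "service" is a hypothetical field name; empty fields are allowed here.
ENTRY_RE = re.compile(
    r"(?P<port>\d+)/(?P<status>[^/]*)/(?P<proto>[^/]*)//"
    r"(?P<service>[^/]*)//(?P<desc>[^/]*)/"
)


def parse_line(line):
    """Return one dict per port entry found on a greppable nmap line."""
    host = HOST_RE.search(line)
    ports = PORTS_RE.search(line)
    if not host or not ports:
        return []
    records = []
    # Rough equivalent of makemv delim=",": one entry per port.
    for entry in ports.group("ports").split(","):
        match = ENTRY_RE.search(entry.strip())
        if match:
            records.append({"target": host.group("target"), **match.groupdict()})
    return records


records = parse_line(LINE)

# Rough equivalent of: search port="80" | stats count by desc
desc_counts = Counter(
    r["desc"] for r in records if r["port"] == "80" and r["status"] == "open"
)

for desc, count in desc_counts.items():
    print(desc, count)
```

Ports are kept as strings here, just as they come out of the regex; convert them with `int()` if numeric comparisons are needed.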

There are no doubt far more graceful ways of doing this, but, if someone else comes to the splunkbase looking for nmap hints, maybe this will help.

kore · ‎10-08-2012

howyagoin's post helped me greatly in working out how to get this going. Those interested in parsing greppable Nmap output with Splunk may want to check out the configuration below (more information is in my full post).

Additional answer for this issue - to extract all fields (and shorten your search)
Field Extraction Help Gnmap and Troubleshooting

You can then search on port, state, daemon and banner in a more succinct search. You may have issues with a very small number of daemons (e.g. nfs and rpc) as the nmap output is slightly incorrect for those services at this point in time - inconsistent use of the field separators "/" (A work-around sed command is in my full post).

inputs.conf

[monitor:///path/to/greppable/nmap/*.gnmap]
index = nmap
sourcetype = nmap
queue = parsingQueue
disabled = 0

props.conf

[nmap]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRANSFORMS-nmap = NMAPsetnull,NMAPsetparsing
EXTRACT-ip = (?i)Host: (?P<ip>[^ ]+)
EXTRACT-hostname = (?i)^[^\(]*\((?P<hostname>[^\)]+)
EXTRACT-subdomain = (?i)\(.*?\.(?P<subdomain>\w+\.\w+\.\w+\.\w+)(?=\))
EXTRACT-domain = (?i)\..*?\.(?P<domain>\w+\.\w+\.\w+)(?=\))
REPORT-ports = ports

transforms.conf

[NMAPsetnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[NMAPsetparsing]
REGEX = Ports:
DEST_KEY = queue
FORMAT = indexQueue

[ports]
REGEX = \s(?<port>\d+)/(?<state>[^/]+)/(?<proto>[^/]+)//(?<daemon>[^/]*)//(?<banner>[^/]*)/
DEFAULT_VALUE = null
MV_ADD = TRUE
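
To make the intent of the transforms above easier to follow, here is a small Python sketch (not Splunk code) of the two ideas they rely on. First, the queue-routing transforms are applied in the order listed, so `NMAPsetnull` sends every event to nullQueue and `NMAPsetparsing` then overrides that for events containing "Ports:". Second, the `[ports]` regex is applied repeatedly to one event, which is what MV_ADD provides. The function names `route` and `extract_ports` and the sample lines are hypothetical, and Python's `(?P<name>...)` syntax stands in for the `(?<name>...)` groups in the config.

```python
import re

# Routing rules in the same order as TRANSFORMS-nmap; a later match overrides
# an earlier one, so only lines containing "Ports:" end up indexed.
TRANSFORMS = [
    (re.compile(r"."), "nullQueue"),        # NMAPsetnull: matches everything
    (re.compile(r"Ports:"), "indexQueue"),  # NMAPsetparsing: keep port lines
]

# Same pattern as the [ports] stanza, written with Python named groups.
PORT_RE = re.compile(
    r"\s(?P<port>\d+)/(?P<state>[^/]+)/(?P<proto>[^/]+)//"
    r"(?P<daemon>[^/]*)//(?P<banner>[^/]*)/"
)


def route(event, default="indexQueue"):
    """Return the queue an event would land in after all routing rules run."""
    queue = default
    for regex, destination in TRANSFORMS:
        if regex.search(event):
            queue = destination
    return queue


def extract_ports(event):
    """Return every port entry in the event, one dict per match."""
    return [m.groupdict() for m in PORT_RE.finditer(event)]


# Illustrative lines only; the exact header text of a .gnmap file will differ.
COMMENT_LINE = "# Nmap scan initiated"
STATUS_LINE = "Host: 1.2.3.4 ()\tStatus: Up"
PORTS_LINE = (
    "Host: 1.2.3.4 ()\tPorts: 21/closed/tcp//ftp///, "
    "80/open/tcp//http//Microsoft IIS httpd/"
)

for line in (COMMENT_LINE, STATUS_LINE, PORTS_LINE):
    print(route(line), "<-", line)

print(extract_ports(PORTS_LINE))
```

The net effect is that comment and status lines are dropped before indexing, and each remaining event carries multivalue port, state, proto, daemon and banner fields.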

gkanapathy · ‎04-23-2011

You should use either:

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=Nmap scan report for \d+\.\d+\.\d+\.\d+)

or

SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = Nmap scan report for \d+\.\d+\.\d+\.\d+

The former is much more efficient, but the latter may be easier to understand.

Event breaking occurs in two steps:

1. Line breaking, which uses the LINE_BREAKER regex to split the incoming stream of bytes into separate lines. By default, the LINE_BREAKER is any sequence of newlines and carriage returns (i.e., ([\r\n]+)). The text matched by the first capture group in the regex is discarded from the input, and Splunk breaks the incoming stream into lines at that point.
2. Line merging, which only occurs when SHOULD_LINEMERGE is set to true, which it is by default. This stage uses all the other line merging settings (e.g. BREAK_ONLY_BEFORE, BREAK_ONLY_BEFORE_DATE, MUST_BREAK_AFTER, etc.) to merge the previously separated lines into events.

If the second step does not run, then the events are simply the same as the individual lines. The first step is relatively efficient, while the second is relatively slow. If you are clever with the LINE_BREAKER regex, you can often make Splunk get the desired result using only the (efficient) first step, and skipping the (slow) second step.
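
As a rough illustration of the first step only, here is a minimal Python sketch (not Splunk code) that emulates the behavior described above: the stream is cut wherever the regex matches, and the text matched by the first capture group is thrown away. The names `break_events` and `SAMPLE` are hypothetical, and Python's `re` module stands in for Splunk's regex engine, so treat it as an approximation rather than a faithful model of the indexer.

```python
import re


def break_events(stream, line_breaker):
    """Split a stream at each match, discarding the first capture group."""
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for match in pattern.finditer(stream):
        # The event ends where group 1 starts; the next begins where it ends.
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    events.append(stream[start:])
    # Drop empty fragments, such as the one after a trailing newline.
    return [event for event in events if event]


# Trimmed version of the raw data from the original question.
SAMPLE = (
    "Nmap scan report for 123.123.123.123\n"
    "Host is up (0.015s latency).\n"
    "PORT   STATE SERVICE\n"
    "21/tcp open  ftp\n"
    "\n"
    "Nmap scan report for 123.123.123.124\n"
    "Host is up (0.014s latency).\n"
    "PORT   STATE SERVICE\n"
    "21/tcp open  ftp\n"
)

# Default breaker: every line becomes its own event.
per_line = break_events(SAMPLE, r"([\r\n]+)")

# Breaker with a lookahead: only break before a new "scan report" line.
per_host = break_events(
    SAMPLE, r"([\r\n]+)(?=Nmap scan report for \d+\.\d+\.\d+\.\d+)"
)

print(len(per_line), "events with the default breaker")
print(len(per_host), "events with the lookahead breaker")
```

With the lookahead version, newlines inside a host's report do not match, so they stay inside the event and each host comes out as one multi-line event without the merge step.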

gkanapathy · ‎04-25-2011

I guess I don't really know why or whether to use a lookahead. Just seems like a good idea and a little safer.

mslvrstn · ‎04-23-2011

gkanapathy, just curious .. why does the second part of the LINE_BREAKER have to be a lookahead? The documentation isn't completely clear about where the text after the first capture group ends up. I thought it should be enough to say

LINE_BREAKER = ([\r\n]+)Nmap scan report for

but that didn't work when I tried it. Can you explain?

howyagoin · ‎04-23-2011

Hmm, okay, this gets me a bit closer - sort of. Problem now is that it's parsing the timestamps in the FTP directory listings and generating errors:

04-24-2011 15:07:51.795 +1000 WARN DateParserVerbose - Time parsed (Tue Oct 19 09:45:00 2010) is too far away from the previous event's time (Thu Mar 29 04:35:00 2007) to be accepted. If this is a correct time, MAX_DIFF_SECS_AGO (3600) or MAX_DIFF_SECS_HENCE (604800) may be overly restrictive. Context="source::/nmap/ftp/anonymous.txt|host::nmap|nmap-verbose-t|"

Even so, the result still isn't right - tried both formats.

RMcCurdyDOTcom · ‎08-17-2023

I used the XtremeNmapParser to convert the XML to JSON and then used HEC to send it all to Splunk!

https://github.com/xtormin/XtremeNmapParser/issues/1

RMcCurdyDOTcom · ‎09-26-2023

got nasty gram for posting links

search online for freeload101 github in scripts nmap_fruit.sh
