Getting Data In

Need advice/help with how to parse data file

Splunk Employee

Need help parsing file

Each file represents a unique complete test. Here is a snippet of what we have.

Some notes:

It breaks on the word BREAKERWORD

It can have multiple types of events

The first part, with the SYSTEM_PARM lines, is just key/value pairs that we need to parse

The second and third parts (in between BREAKERWORD markers): NAME is the name of the test, each DATA line names a part of the test, and the BEGIN and END sections map to those parts of the test in the order they are presented. In other words, refFirstItem maps to the 1st BEGIN and END section, with these two values --> -7.107569270486e+01 and -8.107569270486e+01.

If you have an actual solution to this problem, a massive thanks. But even if you only have suggestions on how to attack this, or on whether it is reasonably possible to complete with Splunk, please pipe in and give a hand. Thank you in advance.

BREAKERWORD
NAME PARAMETERS
#SYSTEM_PARM SourceQueue ABCD123
#SYSTEM_PARM SourceNode EFGHI456
#SYSTEM_PARM DestinationQueue JKL789
#SYSTEM_PARM DestinationNode MNO012
BREAKERWORD
NAME ABCD
DATA refFirstItem
DATA refSecondItem 
DATA refThirdItem
BEGIN
-7.107569270486e+01
-8.107569270486e+01
END
BEGIN
-2.767100000000e+01
END
BEGIN
-1.345589265277e+01
END
BREAKERWORD
NAME EFGH
DATA ArefFirstItem
DATA ArefSecondItem 
BEGIN
-7.107569270486e+01
-8.107569270486e+01
END
BEGIN
-2.767100000000e+01
END

Builder

The file you are showing has quite an unusual structure that is hard to interpret with pure Splunk means. I suggest you look at modular inputs - they let you ingest any kind of data, flatten the structure, and pass it to Splunk. Start by reading about modular inputs in the "Getting Data In" manual for your Splunk version.

As @cusello wisely pointed out, you'll have to decide how to assign timestamps. With a modular input, you'll also need to decide how you ingest the file. Personally, I only have experience with receiving the data on a TCP port, and I made that port a parameter of my modular input (you specify the parameter when you create an actual input from the module). My modular input is backed by a Python script, but you can use other scripting languages or even executables. Either way, the script in your modular input will hold all the knowledge about the file structure and will transform it into a simple "field1=val1, field2=val2" format.
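As a sketch of that last transformation step (not a full modular input; the function name and quoting style are illustrative), the script could emit each parsed section as one flat key=value event:

```python
def to_kv_event(record):
    """Flatten one parsed section dict into a Splunk-friendly
    'key="value", key="value"' line; multivalue fields (lists of
    BEGIN/END numbers) are joined with commas."""
    parts = []
    for key, val in record.items():
        if isinstance(val, list):
            val = ",".join(str(v) for v in val)
        parts.append('%s="%s"' % (key, val))
    return ", ".join(parts)
```

Splunk's automatic key=value extraction can then pick the fields up at search time without custom extractions.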


Splunk Employee

Thank you very much for this answer. I haven't played with modular inputs, but that might be the right form of attack, since my colleague and I were actually talking about just writing a script to put the data into a format better suited for Splunk. I will need to review. As for the timestamp - I really don't care. I have a MySQL database, already connected, that lists the 'sourcefile' name for each of these tests, and I was planning on using that to match them up.


Legend

Hi rvoninski [Splunk],
what is the goal of your search?
You could ingest these events line-broken on BREAKERWORD, assigning each one either a timestamp from the file or the index time.
Supposing that every test is ingested separately, you could correlate events by _time.
Afterwards you could use multivalue commands to separate and list all the values.
In this way you could display the results, if that is your goal.
Bye.
Giuseppe
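A props.conf sketch of the event breaking Giuseppe describes (the sourcetype name is assumed; index-time timestamping via DATETIME_CONFIG = CURRENT since the file carries no timestamps):

```ini
[my_test_results]
SHOULD_LINEMERGE = false
# start a new event at each BREAKERWORD; the captured newline is discarded
LINE_BREAKER = ([\r\n]+)BREAKERWORD
DATETIME_CONFIG = CURRENT
```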


Splunk Employee

I'm not really worried about time, since I can match the tests by the sourcefile name. One complete test is in every file. Can you give me an example of how to use multivalue commands? Thanks. Rich
