Where to find unmatched regex events?

Path Finder

I know this is a silly question, but I need to know where unmatched events go. My regex matches what I want to index, but in some cases the data is corrupted, so those events will not match my regex. To validate the data, I just need to see the corrupted events that didn't match my regex.

Hope you guys can help me with this.

Splunk Employee

If you are looking for events at search time, you can use the "regex" command with a negative match.

Example to find matching events:

    <mysearch> | regex _raw="myregex"

And to find non-matching events:

    <mysearch> | regex _raw!="myregex"
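
As a rough sketch of how this could look for the HISTMAIN layout that the asker posts later in the thread: the sourcetype name TX_HISTMAIN is assumed from the posted FORMAT line, the 24-hour window is an arbitrary choice, and the pattern is shortened to the first five numeric columns rather than the full extraction regex.

    sourcetype=TX_HISTMAIN earliest=-24h
    | regex _raw!="^HISTMAIN\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s"

This only returns events that were actually indexed; anything dropped at index time will not show up in any search.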

Path Finder

OK, thanks, that's what I wanted. I thought the events didn't get indexed because of my transforms.conf.

Ultra Champion

Ok, anything which matches the corrupted transform is dropped, and never indexed.

You will have no record of them in Splunk at all.
(Though if you still have the source log files, that data will still of course be in there)

Path Finder

Hmm, OK. Sad to say, I need to figure out how to get them indexed.

Ultra Champion

Going forward, you need to remove DEST_KEY = queue and FORMAT = nullQueue, and instead write the events to an index, as per the previous transforms stanza. The historic data is gone, though.
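
One way this could look, as a sketch only: instead of discarding events, a transform can rewrite their destination index. The stanza name ROUTE_CORRUPTED_HISTMAIN, the class name TRANSFORMS-route-corrupted and the index tx_corrupted are hypothetical (the index would have to exist already). The regex is shortened to the first five numeric columns and, unlike the posted regex, treats empty columns as corrupted. It uses a negative lookahead so that it matches only HISTMAIN lines that do not follow the expected layout.

    # transforms.conf
    [ROUTE_CORRUPTED_HISTMAIN]
    SOURCE_KEY = _raw
    # Matches lines that start with HISTMAIN but are NOT followed by
    # five whitespace-separated numeric columns.
    REGEX = ^HISTMAIN(?!\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s)
    DEST_KEY = _MetaData:Index
    FORMAT = tx_corrupted

    # props.conf
    [unisourcetypetx]
    TRANSFORMS-route-corrupted = ROUTE_CORRUPTED_HISTMAIN

Well-formed lines fail the lookahead and stay in their original index, so the existing extraction transform is left untouched.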

Path Finder

Hmm, the reason I put that in is that when my regex matches the data, it goes to the nullQueue, and the only events that get indexed are the corrupted ones.
Note: the corrupted events don't have the pattern needed to match, which is why I used nullQueue: it indexes whatever the regex doesn't match. But that interferes with my indexQueue part, which indexes the events I also need.

Ultra Champion

Actually, just comment out TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN from props.conf and it will end up in the same index, if that’s what you want?

If you want it in a different index, let me know.

Path Finder

I also tried that. The result was that the corrupted events didn't get indexed, because they didn't match my regex in the indexQueue part.

Ultra Champion

Can you post the indexQueue config - that was not in your post.

Path Finder

Oh, that. Sorry, I just said indexQueue to make it clearer. My transforms don't have an indexQueue, but the functionality is the same, because WRITE_META = true directs the matched data to the index.

Ultra Champion

Oh, sorry, I misunderstood.

Just to check:
You match a line which starts with 'HISTMAIN' against two transforms, EXTRACTTXHISTMAIN and SET_HISTMAIN.
The first one performs field extractions.
The second one sets the sourcetype to 'TX_HISTMAIN'.

When you talk about 'corrupted' events, do you mean lines which do not begin with 'HISTMAIN', or do you mean events which DO match 'HISTMAIN' but don't match the extraction regex?

Path Finder

Yes, that's right, that's exactly the problem I want to solve. The line matches HISTMAIN but is corrupted: say some tabs were not generated, or a field does not have its expected data type. For example, because a tab was not generated, the transaction number field ends up holding a string instead of a number. Those are the events where the extraction doesn't match because of the corruption, and those are what I want to index, without affecting the other extractions.

Ultra Champion

Ok, so unless I am overlooking something obvious (it's been a long day) your config looks good to me.

What happens if you search sourcetype=txhistmain (NOT Transact=* NOT Branch=*) over all time, do you see any events?

Path Finder

Hmm, that's also an option, but in my case I have more than 10 billion events, so searching over all time to find the corrupted events would take too long. Besides, there are also 100+ different fields.

Ultra Champion

10Bil - ha that's nothing 🙂
Pick a smaller range then - if you get any results, then you are already indexing the 'corrupt' data.
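
A sketch of the same check limited to a small window. The four-hour range and the 20-event cap are arbitrary choices, and the sourcetype name is assumed from the FORMAT = sourcetype::TX_$1 line in the posted transforms.conf, so adjust it to whatever the events are actually stored under.

    sourcetype=TX_HISTMAIN earliest=-4h latest=now (NOT Transact=* NOT Branch=*)
    | head 20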

Path Finder

Actually, I haven't tried searching on the prod server. I used a test server instead: before pushing my configs, I ran several tests with a file containing known corrupted events, and those events did not get indexed because they didn't match the HISTMAIN regex. So it works like a filter, but I only ever get one of the two: either the events I want indexed or the events that didn't match. I realized that transforming/extracting data at index time is like piping a search in Splunk: each step can only work on what the previous step passed along. So once you send events to the nullQueue, the ignored events can't be retrieved later, because they were already transformed at index time.

Ultra Champion

Which regex are you referring to?
Field extractions in props.conf, search-time 'rex' commands, or nullQueue routing/event breaking in props/transforms?

Path Finder

In my case it is an index-time regex on the forwarder. A forwarder doesn't usually extract data, so I put the regex into my transforms.conf, and the matching events go to a certain index. My problem is how to get the unmatched events, because I'm pretty sure those events are corrupted and I need that data for validation.

Ultra Champion

Can you post the props.conf and transforms.conf stanzas you are referring to?

Path Finder

Here is my props.conf:

    # Sourcetype used by the forwarder to separate all sourcetypes in TX files
    [unisourcetypetx]
    CHARSET=AUTO
    DATETIME_CONFIG=CURRENT
    NO_BINARY_CHECK=true
    SHOULD_LINEMERGE=true
    category=Custom
    disabled=false
    pulldown_type=true

    # TX Extraction
    TRANSFORMS-forward-histmain = EXTRACTTXHISTMAIN
    TRANSFORMS-set-sourcetype-histmain = SET_HISTMAIN
    TRANSFORMS-forward-histsub = EXTRACTTXHISTSUB
    TRANSFORMS-set-sourcetype-histsub = SET_HISTSUB
    TRANSFORMS-forward-reghist = EXTRACTTXREGHIST
    TRANSFORMS-set-sourcetype-reghist = SET_REGHIST
    TRANSFORMS-forward-invheader = EXTRACTTXINVHEADER
    TRANSFORMS-set-sourcetype-invheader = SET_INVHEADER
    TRANSFORMS-forward-deposits = EXTRACTTXDEPOSITS
    TRANSFORMS-set-sourcetype-deposits = SET_DEPOSITS
    TRANSFORMS-forward-invitems = EXTRACTTXINVITEMS
    TRANSFORMS-set-sourcetype-invitems = SET_INVITEMS

    [TXCORHISTMAIN]
    TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN
    TRANSFORMS-set-sourcetype-corhistmain = SET_CORHISTMAIN

And my sample transforms.conf:

    [EXTRACTTXHISTMAIN]
    REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
    FORMAT = Transact::$1 Branch::$2 Register::$3 Cashier::$4 Receipt::$5 TranDate::$6 TranTime::$7 RepDate::$8 Mode::$9 TranType::$10 Items::$11 Amount::$12 Diners::$13
    WRITE_META = true
    SOURCE_KEY = _raw

    [CORRUPTED_TXHISTMAIN]
    REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
    DEST_KEY = queue
    FORMAT = nullQueue

    # Setting a TX sourcetype at index-time extraction
    [SET_HISTMAIN]
    REGEX = ^(HISTMAIN)
    SOURCE_KEY = _raw
    DEST_KEY = MetaData:Sourcetype
    FORMAT = sourcetype::TX_$1

This transform structure uses the unisourcetypetx sourcetype to split one file, which contains several different column headers, into different sourcetypes by extracting on the matching pattern. When I filter on the regex and send the matches to nullQueue, I get the corrupted events, but the matching events don't get indexed. When I set it up to index the matching data, the unmatched events don't get indexed. I used both indexQueue and nullQueue, and according to several tests I can only use one of them, but I want both sets of events indexed. By the way, in this test I used a batch file that monitors a single file with the sourcetype already configured, so the data goes directly to unisourcetypetx and is processed by props and transforms.
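
For context, the usual way to combine nullQueue and indexQueue in one TRANSFORMS class is to list a catch-all discard first and a keep rule second. Transforms in a class run in the order listed, and a later one can overwrite the queue set by an earlier one. A minimal sketch follows; the stanza names DISCARD_ALL and KEEP_HISTMAIN and the class name TRANSFORMS-filter are hypothetical and not part of the configuration posted above.

    # props.conf
    [unisourcetypetx]
    TRANSFORMS-filter = DISCARD_ALL, KEEP_HISTMAIN

    # transforms.conf
    [DISCARD_ALL]
    # Every non-empty event is first marked for discard.
    REGEX = .
    DEST_KEY = queue
    FORMAT = nullQueue

    [KEEP_HISTMAIN]
    # Lines starting with HISTMAIN are put back on the index queue.
    REGEX = ^HISTMAIN
    DEST_KEY = queue
    FORMAT = indexQueue

With this ordering only HISTMAIN lines are indexed; reversing the order would discard everything. If the goal is to index both the well-formed and the corrupted events, neither queue setting is needed, since events go to the index queue unless a transform says otherwise.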
