Splunk Search

Where to find unmatched regex events?

ejmin
Path Finder

I know this is a silly question, but in some cases I need to know where the unmatched events go. My regex matches what I want to index, but sometimes the data is corrupted, so those events will not match my regex. To validate the data, I just need to see the corrupted events that didn't match my regex.

Hope you guys can help me with this.


yannK
Splunk Employee

If you are trying to look for events at search time, you can use the "regex" command with negative matching.

Example to find matching events:

    <mysearch> | regex _raw="myregex"

And to find non-matching events:

    <mysearch> | regex _raw!="myregex"
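
For validating outside Splunk, for example against the original source file, the same match / non-match split can be sketched in Python. This is only an illustration: the helper name `split_by_regex`, the sample lines, and the shortened pattern are assumptions, not taken from the actual data.

```python
import re


def split_by_regex(lines, pattern):
    """Return (matching, non_matching), analogous to regex _raw= and regex _raw!=."""
    rx = re.compile(pattern)
    matching, non_matching = [], []
    for line in lines:
        # re.search looks for the pattern anywhere in the line unless it is anchored.
        (matching if rx.search(line) else non_matching).append(line)
    return matching, non_matching


# Hypothetical sample events and pattern, for illustration only.
sample = [
    "HISTMAIN 1001 12 SALE",
    "HISTMAIN abc 12 SALE",
    "HISTSUB 1001 3",
]
matched, unmatched = split_by_regex(sample, r"^HISTMAIN\s+\d+\s+\d+\s+\w+")
```

`unmatched` then holds the lines to inspect, which mirrors what the `!=` form of the search returns.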


ejmin
Path Finder

OK, thanks, that's what I wanted. I thought the events didn't get indexed because of my transforms.conf.


nickhills
Ultra Champion

Ok, anything which matches the corrupted transform is dropped, and never indexed.

You will have no record of them in Splunk at all.
(Though if you still have the source log files, that data will still of course be in there)

If my comment helps, please give it a thumbs up!

ejmin
Path Finder

Ahh, OK. Sad to say, I need to figure out how to get them indexed.


nickhills
Ultra Champion

Going forwards, you need to remove

    DEST_KEY = queue
    FORMAT = nullQueue

and instead write it to an index, as per the previous transforms stanza, but the historic data is gone.


ejmin
Path Finder

Hmm... the reason I put that there is that when my regex matches the data, it goes to the nullQueue, and the only events that get indexed are the corrupted ones.
Note: the corrupted events don't have the pattern needed to match, which is why I used nullQueue: it indexes whatever doesn't match the regex. But that interferes with indexing the other events I also need.


nickhills
Ultra Champion

Actually, just comment out TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN from props.conf and it will end up in the same index, if that’s what you want?

If you want it in a different index, let me know.


ejmin
Path Finder

I also tried that. The result was that the corrupted events didn't get indexed, because they didn't match my regex in the indexQueue part.


nickhills
Ultra Champion

Can you post the indexQueue config - that was not in your post.


ejmin
Path Finder

Oh, that. Sorry, I only said indexQueue to make it clearer. My transforms don't have an indexQueue, but the functionality is the same because WRITE_META = true directs the matching data to the index.


nickhills
Ultra Champion

oh, sorry I misunderstood.

Just to check..
You match a line which starts with 'HISTMAIN' against two transforms - EXTRACT_TXHISTMAIN, and SET_HISTMAIN
The first one performs field extractions.
The second one sets the sourcetype to 'TXHISTMAIN'

When you talk about 'corrupted' events, do you mean lines which do not begin with 'HISTMAIN', or do you mean events which DO match 'HISTMAIN' but don't match the extraction regex?
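
The two meanings of 'corrupted' in that question can be told apart mechanically. As a rough sketch in Python, assuming a shortened three-field stand-in for the real extraction regex and made-up sample lines, each line falls into one of three buckets:

```python
import re
from collections import Counter

# Assumption: an abbreviated stand-in for the EXTRACT_TXHISTMAIN regex (3 fields, not 13).
EXTRACT_RX = re.compile(r"^HISTMAIN\s+(\d+)\s+(\d+)\s+(\w+)")
PREFIX_RX = re.compile(r"^HISTMAIN")


def classify(line):
    """Label a line 'other', 'ok', or 'corrupted'."""
    if not PREFIX_RX.search(line):
        return "other"       # does not begin with HISTMAIN at all
    if EXTRACT_RX.search(line):
        return "ok"          # begins with HISTMAIN and the extraction matches
    return "corrupted"       # begins with HISTMAIN but the extraction fails


# Hypothetical tab-separated sample lines, for illustration only.
sample = [
    "HISTMAIN\t1001\t12\tSALE",   # well formed
    "HISTMAIN\t100112\tSALE",     # a tab is missing, so two fields ran together
    "HISTMAIN\tabc\t12\tSALE",    # text where a number is expected
    "HISTSUB\t1001\t3",           # a different record type
]
counts = Counter(classify(line) for line in sample)
```

Only the "corrupted" bucket corresponds to the second meaning: lines that start with HISTMAIN but fail the extraction regex.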


ejmin
Path Finder

Yes, that's right, that's exactly the problem I want to solve. A line matches HISTMAIN but is corrupted: say some tabs weren't generated, or a field has the wrong data type. For example, because a tab is missing, the transaction number field ends up holding a string instead of a number. Those events, where the extraction regex fails because of the corruption, are what I want to index, without affecting the other extractions.


nickhills
Ultra Champion

Ok, so unless I am overlooking something obvious (it's been a long day) your config looks good to me.

What happens if you search sourcetype=txhistmain (NOT Transact=* NOT Branch=*) over all time, do you see any events?


ejmin
Path Finder

Hmm, that's also an option, but in my case I have more than 10 billion events, so searching over all time to find the corrupted events would take too long. Besides, the data also has 100+ different fields.


nickhills
Ultra Champion

10Bil - ha that's nothing 🙂
Pick a smaller range then - if you get any results then you are already indexing the 'corrupt' data


ejmin
Path Finder

Actually, I didn't try searching on the prod server. I used a test server, and before pushing my configs I ran several tests with corrupted events included in the file. Those events did not get indexed because they didn't match the HISTMAIN regex. So it works like a filter, but I only get one of the two: either the events I should be indexing or the unmatched events. I realized that transforming/extracting the data is like piping a search in Splunk: each step can only work on what the previous step passed along. So once you send events to the nullQueue at index time, the events that were dropped cannot be retrieved later.


nickhills
Ultra Champion

Which regex are you referring to?
Field extractions in props.conf, search-time 'rex' commands or nullQueue routing/event breaking in props/transforms?


ejmin
Path Finder

In my case it is an index-time regex on the forwarder. Since a forwarder doesn't usually extract data, I put the regex in my transforms.conf, and the matching events go to a certain index. My problem is how to get the unmatched events, because I'm pretty sure those events are corrupted and I need that data for validation.


nickhills
Ultra Champion

Can you post the props.conf and transforms.conf stanzas you are referring to?


ejmin
Path Finder

Here is my props.conf:

    # Sourcetype used by the forwarder to separate all sourcetypes in TX files
    [unisourcetypetx]
    CHARSET=AUTO
    DATETIME_CONFIG=CURRENT
    NO_BINARY_CHECK=true
    SHOULD_LINEMERGE=true
    category=Custom
    disabled=false
    pulldown_type=true

    # TX Extraction
    TRANSFORMS-forward-histmain = EXTRACT_TXHISTMAIN
    TRANSFORMS-set-sourcetype-histmain = SET_HISTMAIN
    TRANSFORMS-forward-histsub = EXTRACT_TXHISTSUB
    TRANSFORMS-set-sourcetype-histsub = SET_HISTSUB
    TRANSFORMS-forward-reghist = EXTRACT_TXREGHIST
    TRANSFORMS-set-sourcetype-reghist = SET_REGHIST
    TRANSFORMS-forward-invheader = EXTRACT_TXINVHEADER
    TRANSFORMS-set-sourcetype-invheader = SET_INVHEADER
    TRANSFORMS-forward-deposits = EXTRACT_TXDEPOSITS
    TRANSFORMS-set-sourcetype-deposits = SET_DEPOSITS
    TRANSFORMS-forward-invitems = EXTRACT_TXINVITEMS
    TRANSFORMS-set-sourcetype-invitems = SET_INVITEMS

    [TX_CORHISTMAIN]
    TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN
    TRANSFORMS-set-sourcetype-corhistmain = SET_CORHISTMAIN

And my sample transforms.conf:

    [EXTRACT_TXHISTMAIN]
    REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
    FORMAT = Transact::$1 Branch::$2 Register::$3 Cashier::$4 Receipt::$5 TranDate::$6 TranTime::$7 RepDate::$8 Mode::$9 TranType::$10 Items::$11 Amount::$12 Diners::$13
    WRITE_META = true
    SOURCE_KEY = _raw

    [CORRUPTED_TXHISTMAIN]
    REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
    DEST_KEY = queue
    FORMAT = nullQueue

    # Setting a TX sourcetype at index-time extraction
    [SET_HISTMAIN]
    REGEX = ^(HISTMAIN)
    SOURCE_KEY = _raw
    DEST_KEY = MetaData:Sourcetype
    FORMAT = sourcetype::TX$1

This transform structure uses the unisourcetypetx sourcetype to split one file with different column headers into different sourcetypes, by extracting whatever matches. When I filter on the regex and send the matches to nullQueue, I get the corrupted events but the matching events don't get indexed. When I set it to index the matching data, the unmatched events don't get indexed. I used both indexQueue and nullQueue, and according to several tests I can only use one of them, but I want both kinds of events to be indexed. By the way, in this test I used a batch input that monitors a single file with the sourcetype already configured, so the data goes directly to unisourcetypetx and is processed by props and transforms.
