I know this is a silly question, but I need to know where unmatched events go. My regex matches the events I want to index, but in some cases the data is corrupted, so those events will not match my regex. To validate the data, I just need to see the corrupted events that didn't match my regex.
Hope you guys can help me with this.
If you are trying to look for events at search time, you can try the "regex" command and do a negative match.
Example to find matching events:
<mysearch> | regex _raw="myregex"
And to find non-matching events:
<mysearch> | regex _raw!="myregex"
OK, thanks, that's what I wanted. I thought the events didn't get indexed because of my transforms.conf.
OK, anything that matches the CORRUPTED_TXHISTMAIN transform is dropped and never indexed.
You will have no record of them in Splunk at all.
(Though if you still have the source log files, that data will of course still be in there.)
Ahh, OK. Sad to say, I need to figure out how to get it indexed.
Going forward, you need to remove
DEST_KEY = queue
FORMAT = nullQueue
and instead write it to an index, as per the previous transforms stanza, but the historic data is gone.
Hmm, the reason I put that there is that when my regex matches the data, those events go to the nullQueue, and the only events that get indexed are the corrupted ones.
Note: the corrupted events don't have the pattern needed to match, which is why I used nullQueue: it indexes whatever the regex doesn't match. But that also affects the indexing of the other events I need.
Actually, just comment out TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN
from props.conf and it will end up in the same index, if that’s what you want?
If you want it in a different index, let me know.
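For the "different index" case, here is a minimal sketch of one possible approach, not a tested configuration. It relies on transforms within a single TRANSFORMS class being applied in the order listed, so a later transform can override the index set by an earlier one. The stanza names ROUTE_HISTMAIN_CORRUPT and ROUTE_HISTMAIN_GOOD and the index names tx_corrupt and tx_main are hypothetical and not part of the original config; both indexes would need to exist already.

```ini
# props.conf (assumed to be on the instance that parses this data,
# i.e. an indexer or heavy forwarder)
[unisourcetypetx]
# Transforms in one class run left to right, so the second can override the first.
TRANSFORMS-route-histmain = ROUTE_HISTMAIN_CORRUPT, ROUTE_HISTMAIN_GOOD

# transforms.conf
[ROUTE_HISTMAIN_CORRUPT]
# Step 1: send every HISTMAIN line to the (hypothetical) tx_corrupt index.
REGEX = ^HISTMAIN
DEST_KEY = _MetaData:Index
FORMAT = tx_corrupt

[ROUTE_HISTMAIN_GOOD]
# Step 2: lines that also satisfy the full extraction regex are moved back to
# the (hypothetical) tx_main index. REGEX is copied from EXTRACT_TXHISTMAIN.
REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
DEST_KEY = _MetaData:Index
FORMAT = tx_main
```

With this layout nothing is sent to the nullQueue: well-formed HISTMAIN lines would land in tx_main and the ones that fail the extraction regex would stay in tx_corrupt, so both sets remain searchable.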
I also tried that. The result was that the corrupted events didn't get indexed, because they didn't match my regex in the indexQueue part.
Can you post the indexQueue config - that was not in your post.
Oh, sorry about that. I only said indexQueue to make it clearer. My transforms don't have an indexQueue, but the functionality is the same because WRITE_META = true directs the matched data to the index.
oh, sorry I misunderstood.
Just to check..
You match a line which starts with 'HISTMAIN' against two transforms - EXTRACT_TXHISTMAIN, and SET_HISTMAIN
The first one performs field extractions.
The second one sets the sourcetype to 'TXHISTMAIN'
When you talk about 'corrupted' events, do you mean lines which do not begin 'HISTMAIN', or do you mean events which DO match 'HISTMAIN' but don't match the extraction regex?
Yes, that's exactly the problem I want to solve. It's the lines that match HISTMAIN but are corrupted. Say some tabs weren't generated, or a field has the wrong data type: for example, because a tab is missing, the transaction number field ends up containing a string instead of a number. The unmatched data, where the extraction failed because the event is corrupted, is what I want to index, without affecting the other extractions.
Ok, so unless I am overlooking something obvious (it's been a long day) your config looks good to me.
What happens if you search sourcetype=txhistmain (NOT Transact=* NOT Branch=*)
over all time, do you see any events?
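One caveat on the search above: Transact and Branch are written at index time via WRITE_META, and Splunk typically needs to be told that such fields are indexed before a field=value search treats them that way. A minimal sketch, assuming the field names from the FORMAT line in EXTRACT_TXHISTMAIN and a fields.conf deployed on the search head:

```ini
# fields.conf (on the search head; only two of the extracted fields shown)
[Transact]
INDEXED = true

[Branch]
INDEXED = true
```

If these entries are missing, the search may report every event as lacking the fields, which would make well-formed events look corrupted.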
Hmm, that's also an option, but in my case I have more than 10 billion events, so searching over all time to find the corrupted events would take too long. Besides, there are also 100+ different fields.
10Bil - ha that's nothing 🙂
Pick a smaller range then. If you get any results, then you are already indexing the 'corrupt' data.
Actually, I didn't search on the prod server. I used a test server, and before pushing my configs I ran several tests with a file containing corrupted events. Those events did not get indexed because they didn't match the HISTMAIN regex. So it acts like a filter, but I only ever get one of the two: either the events that should be indexed or the unmatched events. I realized that transforming/extracting the data works like piping a search in Splunk: each step can only work on what the previous step passed along. So when you use nullQueue, the events that were dropped can't be retrieved later, because they were already transformed at index time.
Which regex are you referring to?
Field extractions in props.conf, search-time 'rex' commands, or nullQueue routing/event breaking in props/transforms?
In my case it is an index-time regex on the forwarder. Since a forwarder doesn't usually extract data, I put the regex in my transforms.conf, and the matching events go to a certain index. My problem is how to get the unmatched events, because I'm pretty sure those events are corrupted and I need that data for validation.
Can you post the props.conf and transforms.conf stanzas you are referring to?
here is my props.conf
[unisourcetypetx]
CHARSET=AUTO
DATETIME_CONFIG=CURRENT
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
category=Custom
disabled=false
pulldown_type=true
TRANSFORMS-forward-histmain = EXTRACT_TXHISTMAIN
TRANSFORMS-set-sourcetype-histmain = SET_HISTMAIN
TRANSFORMS-forward-histsub = EXTRACT_TXHISTSUB
TRANSFORMS-set-sourcetype-histsub = SET_HISTSUB
TRANSFORMS-forward-reghist = EXTRACT_TXREGHIST
TRANSFORMS-set-sourcetype-reghist = SET_REGHIST
TRANSFORMS-forward-invheader = EXTRACT_TXINVHEADER
TRANSFORMS-set-sourcetype-invheader = SET_INVHEADER
TRANSFORMS-forward-deposits = EXTRACT_TXDEPOSITS
TRANSFORMS-set-sourcetype-deposits = SET_DEPOSITS
TRANSFORMS-forward-invitems = EXTRACT_TXINVITEMS
TRANSFORMS-set-sourcetype-invitems = SET_INVITEMS
[TX_CORHISTMAIN]
TRANSFORMS-corrupted-txhistmain = CORRUPTED_TXHISTMAIN
TRANSFORMS-set-sourcetype-corhistmain = SET_CORHISTMAIN
and my sample transforms.conf
[EXTRACT_TXHISTMAIN]
REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
FORMAT = Transact::$1 Branch::$2 Register::$3 Cashier::$4 Receipt::$5 TranDate::$6 TranTime::$7 RepDate::$8 Mode::$9 TranType::$10 Items::$11 Amount::$12 Diners::$13
WRITE_META = true
SOURCE_KEY = _raw
[CORRUPTED_TXHISTMAIN]
REGEX = HISTMAIN\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\d+)?\s+(\d+\D\d+\D\d+)?\s+(\w+)?\s+(\w+)?\s+(\D?\d+\D\d+)?\s+(\D?\d+\D\d+)?\s+(\d+)?
DEST_KEY = queue
FORMAT = nullQueue
[SET_HISTMAIN]
REGEX = ^(HISTMAIN)
SOURCE_KEY = raw
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::TX$1
This set of transforms uses the unisourcetypetx sourcetype to split one file with different column headers into different sourcetypes by extracting the matching events. In this case, when I filter on the regex and send the matches to nullQueue, I get the corrupted events but the matching events don't get indexed; when I set it to index the matching data, the unmatched events don't get indexed. I used both indexQueue and nullQueue, and according to several tests I can only use one of them, but I want both sets of events indexed. By the way, in this test I used a batch file that monitors a single file with the sourcetype already configured, so the data goes directly to unisourcetypetx and is processed by props and transforms.