Hi,
I have a couple of comma-separated Cisco log files which are supposed to have different sets of headers/fields. The log files have headers like so:
header#1: Timestamp,RDR_ID,SUBSCRIBER_ID,CLIENT_IP
header#2: Timestamp,RDR_ID,SUBSCRIBER_ID,SKIPPED_SESSIONS,CLIENT_IP
sample data#1:
1361171830137,4042321984,001ffb25b1d1@smartbro.net,192.168.1.1
1361171830473,4042321984,001ffb0f90bb@smartbro.net,192.168.1.2
1361171831107,4042321984,001ffb0f90bb@smartbro.net,192.168.1.3
sample data#2:
1361171830137,4042323000,001ffb25b1d1@smartbro.net,0,192.168.1.1
1361171830473,4042323000,001ffb0f90bb@smartbro.net,1,192.168.1.2
1361171831107,4042323000,001ffb0f90bb@smartbro.net,0,192.168.1.3
my props.conf
[smart_sce_sourcetype]
REPORTS-multi = Transaction_Usage_RDR, Block_RDR
my transforms.conf
[Transaction_Usage_RDR]
REGEX="\W4042323000,"
DELIMS=","
FIELDS="TIMESTAMP","RDR_ID","SUBSCRIBER_ID","CLIENT_IP"
[Block_RDR]
REGEX="\W4042321984,"
DELIMS=","
FIELDS="TIMESTAMP","RDR_ID","SUBSCRIBER_ID","SKIPPED_SESSIONS","CLIENT_IP"
The RDR_ID (2nd column of the actual data) determines which header to use; you'll notice this in my regex. The two sample data sets are indexed and both headers are generated, but the client_ip data is going into skipped_sessions, and some of the columns are missing. I removed the other headers for brevity. Generally speaking, the indexed data is messed up. Kindly advise.
If I understand you correctly, you're trying to create a conditional extraction: when a line matches one regex, one delims-based extraction is applied, and if it matches the other regex, the other extraction is used. It doesn't work that way (for good reason - which one would Splunk pick if both regexes matched?).
You can only define one delims-based extraction per sourcetype, so you can't have multiple conditional extractions like that. What you could do instead is create two regex-based extractions that do the same thing:
[Transaction_Usage_RDR]
REGEX = ^([^,]+),([^,]+),([^,]+),([^,]+)$
FORMAT = TIMESTAMP::$1 RDR_ID::$2 SUBSCRIBER_ID::$3 CLIENT_IP::$4
[Block_RDR]
REGEX = ^([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$
FORMAT = TIMESTAMP::$1 RDR_ID::$2 SUBSCRIBER_ID::$3 SKIPPED_SESSIONS::$4 CLIENT_IP::$5
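If it helps to see why the two extractions can't collide, here's a quick sanity check of the two regexes outside Splunk (a Python sketch purely for illustration - the names four_field_re/five_field_re are mine, not anything Splunk uses):

```python
import re

# The two extraction regexes from the transforms above: one expects
# four comma-separated columns, the other five.
four_field_re = re.compile(r'^([^,]+),([^,]+),([^,]+),([^,]+)$')
five_field_re = re.compile(r'^([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$')

row4 = "1361171830137,4042321984,001ffb25b1d1@smartbro.net,192.168.1.1"
row5 = "1361171830137,4042323000,001ffb25b1d1@smartbro.net,0,192.168.1.1"

# Because both regexes are anchored with ^...$ and [^,]+ can't span a comma,
# each one only matches rows with exactly its number of columns - so the two
# extractions can never both fire on the same line.
print(four_field_re.match(row4).groups())  # four captured fields
print(five_field_re.match(row5).groups())  # five captured fields
print(four_field_re.match(row5))           # None
print(five_field_re.match(row4))           # None
```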
Just replace the commas with pipes - but you need to escape the pipes ("\|") because pipes are special characters in regular expressions.
Finally got it working. Many many thanks Ayn ^_^
Just a follow-up question, though this is for another project, similar in nature: if the data delimiter is a pipe character ("|") rather than a comma, would I need to replace the separator commas in the regex with that pipe character, i.e. ([^,]+)|([^,]+)|([^,]+)| ...? Sorry, my regex know-how is a bit messy. tia (thanks in advance)
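As noted above, the separator pipes need escaping ("\|") because an unescaped "|" means alternation in a regex; inside a character class like [^|] a pipe is already literal, so no escape is needed there. A minimal sketch of the pipe-delimited variant (Python just for illustration, with a made-up sample line):

```python
import re

# Separator pipes are escaped (\|); the pipe inside [^|] is literal as-is.
pipe_re = re.compile(r'^([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)$')

line = "1361171830137|4042321984|001ffb25b1d1@smartbro.net|192.168.1.1"
print(pipe_re.match(line).groups())
```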
You could just change the + sign to *. + means "1 or more of the preceding" whereas * means "0 or more of the preceding", so if there's no match at all it should work fine anyway.
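To make the + vs * difference concrete, here's a small check (a Python sketch, not Splunk itself) on a row with an empty middle column:

```python
import re

# [^,]+ needs at least one character between delimiters, so an empty
# column kills the whole anchored match; [^,]* also accepts the empty
# string, so rows like "a,,c" still extract.
plus_re = re.compile(r'^([^,]+),([^,]+),([^,]+)$')
star_re = re.compile(r'^([^,]*),([^,]*),([^,]*)$')

line = "1361171830137,,192.168.1.1"  # middle column is blank
print(plus_re.match(line))           # None - the + version fails outright
print(star_re.match(line).groups())  # ('1361171830137', '', '192.168.1.1')
```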
Hi,
This works, except when it encounters a blank field - not a space or whitespace, just consecutive commas (null), like ,,, in the data. It will still index, but some of the fields, although not blank themselves, are affected and don't get extracted if they fall in the same column as the blank data 😞 I tried to include | (the OR regex character) and then \S in the regex, but it's still not working. Kindly advise.