I have a log that gets pretty long, and currently Splunk is ingesting it as a whole. This log can run to a couple hundred lines, and there are multiple events within it that I need to extract. I am currently using a regex to do the extraction, but it is only pulling the most recent instance of the extraction and not extracting the other instances within the log.
For example, here is my extraction:
NOTE:\sPROCEDURE\s(?<procedure>\w+)\sused
And here is the log file that I am consuming.
NOTE: Deleting WORK.CONTENTS (memtype=DATA).
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
MACROGEN(CONTENTS_CNTR): data _null_ ;
MACROGEN(CONTENTS_CNTR): file "/idn/wsmis/SDPMON_Raw/Logs/SDPMONRaw_Job45_error_log_FDT20150423_RD20151116.txt" mod ;
MACROGEN(CONTENTS_CNTR): put "*** value for list_of_files cnt/freq:" @80 "8" @93 "***;";
omitting 197 lines...
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MACROGEN(CONTENTS_CNTR): data _null_ ;
MACROGEN(CONTENTS_CNTR): set contents ;
MACROGEN(CONTENTS_CNTR): if _n_ = 1 ;
MACROGEN(CONTENTS_CNTR): call symput('no_obs',strip(put(NOBS,comma12.)));
MACROGEN(CONTENTS_CNTR): call symput('desc',"list_of_files_last");
In this example you can clearly see that there are two PROCEDUREs: the first is called DATASETS and the next is called CONTENTS.
My extraction is only pulling out the DATASETS value, and then not pulling out the other. Should I be adding a setting to my sourcetype to allow for multiple values here?
Adding the search / index time extractions as requested:
search time settings:
EXTRACT-rT_cpUT = The SAS System used:\s+real\s+time\s+(?<totalRealTime>[^s]+)[^.*]+cpu\stime\s+(?<totalCPUTime>[^s]+)\s+
EVAL-totalCPUTime = replace(totalCPUTime, "^(\d{2})\.(\d{2})","00:00:\1.\2")
EXTRACT-proc = NOTE:\sPROCEDURE\s(?<procedure>\w+)\sused
EXTRACT-logFile = \/idn\/saslogs\/Altlogs_Linux\/(?<fileDate>\d+)\/(?<user>[^-]+)-(?<version>[^-]+)-\d+-(?<startTime>\d+)-PID(?<pid>\d+) in source
EXTRACT-logFile2 = \/idn\/saslogs\/Altlogs\/(?<fileDate>\d+)\/(?<user>[^-]+)-(?<version>[^-]+)-\d+-(?<startTime>\d+)-PID(?<pid>\d+) in source
index time settings:
NO_BINARY_CHECK=1
LINE_BREAKER = ((*FAIL))
SHOULD_LINEMERGE = false
TRUNCATE = 9999999
Thank you for any help!!
You need to add MV_ADD = 1 to the appropriate stanza in transforms.conf. This does the same thing:
... | rex max_match=0 "NOTE:\sPROCEDURE\s(?<procedure>\w+)\sused"
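As a rough sketch of where MV_ADD would go: as far as I know, an inline EXTRACT- in props.conf has no MV_ADD option, so the extraction would move to a REPORT- that points at a transforms.conf stanza. The sourcetype name (your_sourcetype) and the stanza name (sas_procedure) are placeholders, not names from the original post.

```
# props.conf (search time)
# "your_sourcetype" is a placeholder for the real sourcetype name.
# Remove the existing EXTRACT-proc line so the field is not extracted twice.
[your_sourcetype]
REPORT-proc = sas_procedure

# transforms.conf
# "sas_procedure" is a hypothetical stanza name.
[sas_procedure]
REGEX = NOTE:\sPROCEDURE\s(?<procedure>\w+)\sused
MV_ADD = true
```

With this in place, each match of the regex in the event should be appended to the procedure field as an additional value rather than only one match being kept.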
Why don't you use the LINE_BREAKER expression to properly break your events? (And what are you trying to achieve with LINE_BREAKER = ((*FAIL)) here?)
Unfortunately, the logs are not clean enough to use a line breaker. Event start/stop is not clearly delineated.
Did you read the props.conf documentation carefully? There are a bunch of possibilities to break events (not only the LINE_BREAKER).
/edit I cut that out again, way too much ugly formatted text. Search for LINE_BREAKER, there are several pages regarding event breaking.
I tried this, though it didn't seem to work. When I say this, I mean the 'rex' format you mentioned above. I didn't adjust this in the props.
You will need to add MV_ADD=1 to the transforms.conf stanza for this extraction (the one referenced from props.conf) for it to work correctly. Then you will have to use the mv* commands to process the multi-valued 'procedure' field.
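A minimal sketch of handling the multi-valued field at search time, assuming a placeholder sourcetype name (your_sourcetype) and using the inline rex form so that no configuration change is needed. mvcount reports how many procedure values each event holds, and mvexpand splits the event into one result per value so they can be counted:

```
sourcetype=your_sourcetype
| rex max_match=0 "NOTE:\sPROCEDURE\s(?<procedure>\w+)\sused"
| eval procedure_count = mvcount(procedure)
| mvexpand procedure
| stats count by procedure
```

If procedure_count comes back as 1 for an event that clearly contains several PROCEDURE notes, the multi-value extraction is probably not being applied.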
Please provide the entire props.conf stanza for this sourcetype, if you're doing an index-time extraction.
If you're doing a search-time extraction, please provide the search.