Finding a timestamp in a large event

Communicator

Hi all,

I am putting some JSON events into Splunk which are rather large (they can be upwards of 100K characters). This is due in part to the API that the data is being fetched from. I'm investigating other means of cutting the events down to size, but the main issue I have is that timestamp recognition often fails with an event this large.

So the config itself is easy enough:

[my_sourcetype]
TIME_PREFIX = \"start\": \"
TIME_FORMAT=%s
MAX_TIMESTAMP_LOOKAHEAD=13
TRUNCATE=0

as the bit of JSON I'm looking for looks like this:

{"QueryTime": {"start": "1477390200000", "end": "1477390799999"}
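For anyone wanting to sanity-check the prefix outside Splunk, here is a minimal Python sketch. It assumes Python's `re` module behaves like Splunk's regex engine for this simple pattern, which is an approximation, and the helper name `candidate_timestamp` is made up for illustration. It only shows which characters a lookahead of 13 would cover once the prefix has matched.

```python
import re

# Sample taken from the post; the real event continues for many more characters.
event = '{"QueryTime": {"start": "1477390200000", "end": "1477390799999"}'

# Same pattern as the TIME_PREFIX above (the backslashes before the quotes are optional).
TIME_PREFIX = r'\"start\": \"'
MAX_TIMESTAMP_LOOKAHEAD = 13


def candidate_timestamp(text):
    """Return the 13 characters that follow the first prefix match, or None."""
    match = re.search(TIME_PREFIX, text)
    if match is None:
        return None
    return text[match.end():match.end() + MAX_TIMESTAMP_LOOKAHEAD]


print(candidate_timestamp(event))
```

On the short sample this picks out the 13-digit start value, so the pattern itself looks sound; it says nothing about how Splunk behaves on a 50K+ character event.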

Is there anything I can do to ensure the correct timestamp recognition of an event that is 50K+ characters long?

Thanks and best regards,
Alex

Super Champion

%s matches a 10-digit epoch time in seconds. The last 3 digits of your value are milliseconds (%3N).
Can you please try:

    TIME_FORMAT=%s%3N
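To see why the split into seconds plus milliseconds matters, here is a small Python sketch of the arithmetic. It is not how Splunk parses the value internally; `parse_epoch_millis` is a hypothetical helper that just divides the 13-digit number into a 10-digit seconds part and a 3-digit milliseconds part.

```python
from datetime import datetime, timezone


def parse_epoch_millis(value):
    """Convert a 13-digit epoch-milliseconds string to an aware UTC datetime."""
    seconds, millis = divmod(int(value), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)


# "start" and "end" values from the sample event in the question.
print(parse_epoch_millis("1477390200000").isoformat())
print(parse_epoch_millis("1477390799999").isoformat())
```

Reading all 13 digits as seconds would instead land tens of thousands of years in the future, which is one way a timestamp can end up rejected.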

For TIME_PREFIX, did you try including QueryTime as well?

TIME_PREFIX={\"QueryTime\": {\"start\": \"

Communicator

So I have had %s%3N before, but it didn't really make much odds either way -- though I'll put it back in for further testing.

Unfortunately it's not always in that order as it can just as easily output the data as:

{"QueryTime": {"end": "1477390799999", "start": "1477390200000"}

Which then means I have to apply some regex in order to get past this possibility.
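As a rough illustration of the kind of regex this needs, here is a Python sketch of a prefix that still includes QueryTime but tolerates an optional "end" key in front of "start". The pattern and the `start_millis` helper are my own assumptions, tested only against the two sample orderings shown in this thread, and Python's `re` is standing in for Splunk's regex engine.

```python
import re

# Prefix anchored on QueryTime, with an optional "end" entry before "start".
PREFIX = re.compile(
    r'\{"QueryTime":\s*\{(?:"end":\s*"\d{13}",\s*)?"start":\s*"'
)


def start_millis(text):
    """Return the 13 characters following the prefix, or None if it is absent."""
    match = PREFIX.search(text)
    if match is None:
        return None
    return text[match.end():match.end() + 13]


start_first = '{"QueryTime": {"start": "1477390200000", "end": "1477390799999"}'
end_first = '{"QueryTime": {"end": "1477390799999", "start": "1477390200000"}'

print(start_millis(start_first))
print(start_millis(end_first))
```

Both orderings yield the start value here. If other keys can appear inside QueryTime, the optional group would need widening.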

Super Champion

Let's try this regex:
TIME_PREFIX=.*start\":\s\"(?=\d{13})
or
TIME_PREFIX=.*start\S\S\s\S(?=\d{13})

Communicator

So I have much the same issue with this as with the previous regex. It seems that the indexers will only look up to a given number of characters before they give up; the limit seems to be about 10K characters.
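One way to test that hunch is to measure where the prefix actually lands in the raw events and compare it with the failures. The Python sketch below does that on a synthetic event. The 10,000 figure is only my estimate from observation, not a documented Splunk setting, and the `Payload` field and helper names are invented for the example.

```python
import re

TIME_PREFIX = re.compile(r'"start":\s*"')
SUSPECTED_LIMIT = 10_000  # estimate from observation, not a documented value


def prefix_offset(event):
    """Return the character offset just past the first prefix match, or None."""
    match = TIME_PREFIX.search(event)
    return None if match is None else match.end()


def beyond_suspected_limit(event, limit=SUSPECTED_LIMIT):
    """True when the timestamp starts further in than the suspected limit."""
    offset = prefix_offset(event)
    return offset is not None and offset > limit


query_time = '"QueryTime": {"start": "1477390200000", "end": "1477390799999"}'
near = "{" + query_time + "}"
# Synthetic event with 20,000 characters of filler ahead of QueryTime.
far = '{"Payload": "' + "x" * 20000 + '", ' + query_time + "}"

for name, event in (("near", near), ("far", far)):
    print(name, prefix_offset(event), beyond_suspected_limit(event))
```

If the events that fail timestamping are consistently the ones flagged here, that would support the idea of a character limit; if not, the cause is probably elsewhere.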

I've had other regex patterns that have looked for other timestamps in the JSON, but these seem even less reliable than just looking for the start time of the data set requested from the API.
