Initially I used a Python script to create a log handler that sends JSON-formatted log messages, but I noticed that most of my events contained 60 to 70 JSON objects. Only some of my events contain a single JSON object, which is what I need.
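For reference, here is a minimal sketch of a logging formatter that renders each record as exactly one single-line JSON object with the same three fields as the log lines below. It is not the original script: the class name JsonLineFormatter, the fixed instance ID passed in, and the ISO 8601 timestamp (the real events carry a numeric tick string instead) are assumptions for illustration. Because json.dumps escapes newlines inside string values, a record's own text cannot introduce extra line breaks.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as exactly one single-line JSON object."""

    def __init__(self, instance_id):
        super().__init__()
        # Hypothetical: the instance ID is supplied by the caller.
        self.instance_id = instance_id

    def format(self, record):
        payload = {
            "instanceID": self.instance_id,
            # Assumption: ISO 8601 UTC instead of the tick values in the real logs.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "message": record.getMessage(),
        }
        # Compact separators; json.dumps escapes "\n" inside strings,
        # so the result never spans more than one line.
        return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonLineFormatter("i-2a7d3873"))
    logger = logging.getLogger("sinter")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("SinterConsumer: %s", "Completed")
```

The design choice is to keep the one-object-per-line guarantee in the formatter, so any handler (stream, file, or HTTP) receives already-delimited events.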
Next I wrote a bash script that uses curl to send each message separately, and I got the same result. I don't understand what's going on, because I make a new connection for each event, yet some events are still stored as a merge of many JSON objects.
$ head -100 test.log | grep Completed
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-287d3871","timestamp":"634908540742893112","message":"SinterConsumer: Completed"}
$ cat test.log | grep Completed | wc -l
197
$ cat test.log | grep Completed | while read -r line ; do curl -u "x:$TOKEN" "https://api.splunkstorm.com/1/inputs/http?index=XXXXXXXXX&sourcetype=json" -H "Content-type: text/plain" -d "$line"; done
The problem is that when I run a search like:
spath "instanceID" | search "instanceID"="i-2a7d3873"
I get 6 results (events), when in reality there are 197. I pasted events #2 and #3 so you can see what I mean.
Event #2 (2/7/13 12:01:42.000 AM):
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
{"instanceID":"i-2a7d3873","timestamp":"634908540715990001","message":"SinterConsumer: Completed"}
(67 lines in total, all in this one event)
host=.... sourcetype=json source=....
Event #3 (2/7/13 12:01:35.000 AM), a single JSON object as expected:
{
  instanceID : "i-2a7d3873",
  message : "SinterConsumer: Completed",
  timestamp : "634908540715990001"
}
host=.... sourcetype=json source=....
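Note that the two lines from head above already belong to two different instances (i-2a7d3873 and i-287d3871), so the 197 from grep is a total over all instances and may not be the expected count for i-2a7d3873 alone. A minimal Python sketch to get per-instance counts from the local file for comparison, assuming test.log holds one JSON object per line as in the output above (count_completed_by_instance is a hypothetical helper name):

```python
import json
from collections import Counter


def count_completed_by_instance(lines):
    """Count lines containing "Completed" per instanceID (one JSON object per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Same filter as `grep Completed`; blank lines are skipped.
        if not line or "Completed" not in line:
            continue
        record = json.loads(line)
        counts[record["instanceID"]] += 1
    return counts
```

Calling it on an open file handle, e.g. count_completed_by_instance(open("test.log")), returns a Counter whose entry for i-2a7d3873 is the number to compare with the search result, and whose values sum to the grep total.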
The sourcetype json is not a supported one, which means that Splunk will try to guess the timestamp and line breaking.
Please try one of the three supported JSON sourcetypes, probably the third one:
json_no_timestamp
json_auto_timestamp
json_predefined_timestamp
See http://docs.splunk.com/Documentation/Storm/latest/User/Sourcesandsourcetypes
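Applied to the request from the question, the change is only the sourcetype query parameter. A minimal Python sketch that builds (but does not send) one POST per event, assuming the endpoint, the index placeholder, and the curl -u x:$TOKEN basic-auth scheme from the question carry over unchanged; build_storm_request is a hypothetical helper name, and whether json_predefined_timestamp accepts the tick-style timestamp values in these logs is not verified here:

```python
import base64
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in the question.
STORM_INPUT_URL = "https://api.splunkstorm.com/1/inputs/http"


def build_storm_request(event_line, index, token, sourcetype="json_predefined_timestamp"):
    """Build (but do not send) one POST carrying a single JSON event."""
    query = urllib.parse.urlencode({"index": index, "sourcetype": sourcetype})
    # Mirrors `curl -u x:$TOKEN`: user "x", password = API token.
    credentials = base64.b64encode(f"x:{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{STORM_INPUT_URL}?{query}",
        data=event_line.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

Sending is then urllib.request.urlopen(req) for each line, which keeps one event per request as in the curl loop; passing sourcetype="json_no_timestamp" or "json_auto_timestamp" selects the other two options.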