Hello,
I have recently started using Splunk and I think I have made good progress getting to grips with the basics by reading the documentation.
However, for the past few days I have been struggling to create a search that identifies events matching a certain set of criteria.
So here is an overview of what I am trying to achieve. I have 2 different hosts, let's call them host1 and host2.
host1 is responsible for logging certain types of events that occur when 'Objects' are created in the system. For example, when an Object is created, host1 logs information such as the ObjectID, ObjectName and ObjectType. However, this information is all logged in an array format for each Object, e.g.
ObjectDetails = {"ObjectID": 1, "ObjectName": "TestObject", "ObjectType": "Car", "ObjectStatus": "Active" .....etc}.
host2 is then responsible for logging events that change the attributes of an 'Object'. For example, if the status of an Object is changed, this will be logged by host2. This is also logged in an array format, e.g.
UpdatedObjectInfo = {"Date": "01-01-2017", "ObjectID": 73, "ObjectName": "MyFavouriteBook", "ObjectType": "Book", "ObjectStatus": "Finished"...etc}
What I am trying to do for each ObjectType is return:
1) The total number of objects created (in host1) in the past 30 days
2) The total number of Objects that have been updated to a status of 'Finished' (in host2) where the Object was created (in host1) during the last 30 days. I should note that there are a significant number of objects created and updated each month (>200,000). I am only interested in Objects that have been created AND updated in the past 30 days to a status of 'Finished'.
So I have tried to approach this in a number of different ways:
Firstly, I tried using a subsearch by extracting the ObjectID from each host and displaying the count of created tasks and updated tasks:
index=MyIndex host="host2" | eval UpdatedObjectStatus=spath(_raw, UpdatedObjectInfo.ObjectStatus) | where UpdatedObjectStatus="Finished" | eval ObjectID=spath(_raw, UpdatedObjectStatus.ObjectID) | join type=inner ObjectID [search index=MyIndex host="host1" | eval ObjectID=spath(_raw, ObjectDetails.TaskId) | fields + ObjectID] | stats dc(ObjectID)
However, this does not seem to be working because the search takes a very long time (>10 minutes) and returns 0 results.
If I shorten the time span to 7 days, though, I do get some results (approx. 12,000), which is strange. Maybe I am reaching the limit for time/number of events in the subsearch.
I am just wondering if anyone has any suggestions on what I may be doing wrong? I also thought about using a transaction instead of a subsearch but since I have to extract the ObjectID from both hosts, I don't think this is an option as I could not get the below search working.
index=MyIndex (host="Host1 OR host="Host2") | eval UpdatedObjectStatus=spath(_raw, UpdatedObjectInfo.ObjectStatus) | where UpdatedObjectStatus="Finished" | eval ObjectID=spath(_raw, UpdatedObjectStatus.ObjectID) | eval ObjectID=spath(_raw, ObjectDetails.TaskId) | transaction ObjectID startswith="ObjectCreatedMessage" endswith="ObjectSuccesfullyUpdated"
Any help would be greatly appreciated.
Thanks 🙂
You would definitely be hitting the subsearch limits if there are 200K+ results. Give this a try:
index=MyIndex (host="Host1" OR host="Host2")
| eval inputfield=coalesce(UpdatedObjectInfo,ObjectDetails)
| spath input=inputfield
| rename inputfield.* as *
| eval createdObject=if(lower(host)="host1",ObjectID,null())
| stats latest(ObjectStatus) as ObjectStatus values(createdObject) as createdObject by ObjectID
| stats dc(createdObject) as "Objects Created" count(eval(ObjectStatus="Finished" AND isnotnull(createdObject))) as "Objects Finished"
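Since the question asks for these counts per ObjectType, the same stats-based idea can be split by type. The search below is only a rough, untested sketch. It assumes that UpdatedObjectInfo and ObjectDetails are already extracted as fields holding the JSON text, that the JSON keys are spelled ObjectID, ObjectType and ObjectStatus exactly as in the samples in the question (field names are case-sensitive), and that the hosts are really named host1 and host2. The earliest=-30d time window is also my assumption; adjust it or use the time picker instead.

```spl
index=MyIndex earliest=-30d (host="host1" OR host="host2")
| eval inputfield=coalesce(UpdatedObjectInfo, ObjectDetails)
| spath input=inputfield
| eval createdObject=if(lower(host)="host1", ObjectID, null())
| stats latest(ObjectStatus) as ObjectStatus latest(ObjectType) as ObjectType values(createdObject) as createdObject by ObjectID
| where isnotnull(createdObject)
| stats dc(ObjectID) as "Objects Created" count(eval(ObjectStatus="Finished")) as "Objects Finished" by ObjectType
```

The first stats collapses everything to one row per ObjectID, the where keeps only objects that have a creation event from host1 inside the time range, and the second stats counts created and finished objects for each ObjectType. Because there is no join or subsearch, the subsearch limits should not come into play.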