I am using the REST App (https://www.splunk.com/blog/2013/06/18/getting-data-from-your-rest-apis-into-splunk.html#) to get data into Splunk. The app works well, but my REST API returns all the data for the day and I want to ingest only the data from the last hour. So I would like to filter events by timestamp so that the same data is not ingested again.
I am not sure how to do this filtering before the data is indexed. I checked transforms.conf, but apart from REGEX filtering I could not find any way to filter on timestamp.
Just trim it in the Python script inside the app; the code is fairly obvious and would probably take about 3 lines.
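import datetime
import json

TS_FORMAT = "%d-%b-%Y %H:%M:%S"
# raw_data is a placeholder for the JSON string returned by the REST API
# (the original name, str, shadowed the Python builtin)
new_list = []
data_list = json.loads(raw_data)
# Target the previous clock hour; subtracting a timedelta also works just after midnight,
# where "current hour - 1" would give -1 and match nothing
target = datetime.datetime.now() - datetime.timedelta(hours=1)
for element in data_list:
    ts = datetime.datetime.strptime(element['usage_timestamp'], TS_FORMAT)
    if ts.date() == target.date() and ts.hour == target.hour:
        new_list.append(element)
data = json.dumps(new_list)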
That sounds like a plan. The proof is in the testing. Does it dump right?
I would like to confirm my approach based on your advice.
I found rest.py in the bin directory of the app.
The "data" variable in this Python script looks like the one I need to modify.
The flow, as I understand it, looks like this:
1. Take the data field and convert it from JSON into a Python object (a list of dictionaries).
2. For each element of this list, do a date and time check.
3. Create a new list with only the elements that have the required timestamps (the latest ones in this case).
4. Convert this list back into a JSON string.
5. Set this JSON string as the data variable.
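For what it's worth, the five steps above could be sketched roughly as below. This is only a sketch under assumptions: the function name filter_since, the sample payload, and the fixed "now" value are made up for illustration, and the usage_timestamp field and its format are taken from the snippet earlier in the thread. It is not code from rest.py. Note that it keeps a rolling one-hour window rather than the previous clock hour.

```python
import datetime
import json

TS_FORMAT = "%d-%b-%Y %H:%M:%S"  # same format as the snippet earlier in the thread


def filter_since(raw_json, cutoff, ts_field="usage_timestamp"):
    """Return a JSON string holding only the events stamped after cutoff."""
    events = json.loads(raw_json)          # step 1: JSON text -> list of dicts
    kept = []
    for event in events:                   # step 2: check each event's timestamp
        ts = datetime.datetime.strptime(event[ts_field], TS_FORMAT)
        if ts > cutoff:
            kept.append(event)             # step 3: collect the recent ones
    return json.dumps(kept)                # step 4: back to a JSON string


# Hypothetical payload standing in for the REST response.
sample = json.dumps([
    {"usage_timestamp": "18-Jun-2013 09:15:00", "value": 1},
    {"usage_timestamp": "18-Jun-2013 10:40:00", "value": 2},
])
now = datetime.datetime(2013, 6, 18, 11, 0, 0)  # fixed "now" so the result is repeatable
data = filter_since(sample, now - datetime.timedelta(hours=1))  # step 5
print(data)
```

In the real script you would pass datetime.datetime.now() instead of the fixed value. If the polling interval is not exactly one hour, passing the timestamp of the last event you ingested as the cutoff might be a safer way to avoid duplicates, but that depends on how your input is scheduled.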