I'm working on a corporate Splunk instance where we do not have access to rename fields when indexing, or make any similar modifications due to security and compliance requirements.
I'm trying to create a timechart based on the number of events per hour by host
. My issue is that the JSON data has a host field in addition to the Splunk built in host
field.
IE a sample event looks like:{"time":"2019-04-05T21:50:09.925Z","severity":"INFO","duration":25.02,"db":10.23,"view":14.79,"status":200,"method":"GET","path":"/api/v4/project/1","params":[],"host":"my_server_1","ip":"1.2.3.4, 4.5.6.7","ua":null,"route":"/api/:version/projects/:id","user_id":12,"username":"smithers","queue_duration":4.35,"magic_calls":0}
My search looks like: index="my_index" host="prd-srv-00*" source="/var/log/my_program/http_json*" | timechart span=1h count by host
When I do this it combines the hosts the logs came from (built in host field) and hosts listed in the data (host field in the json).
If I try to filter out the hosts from the data, it removes the events from the built in host field as well. IE: index="my_index" host="prd-srv-00*" AND host !="0.0.0.0" source="/var/log/my_program/http_json*" | timechart span=1h count by host
I have also tried to use ...| where host !="0.0.0.0" | ...
but this has the same result.
Any advice on a solution or workaround to handle this at search time? IE, can I rename to column when searching, etc?
Thanks in advance for any help.
After fighting with this of a day and a half I finally found a solution that will work. It has a bit more hard coding with the server names than I wanted, but good enough for now:
index="my_index" host::prd-srv-001 OR host::prd-srv-002 OR host::prd-srv-003 OR host::prd-srv-004 source="/var/log/my_program/http_json*" | timechart usenull=f useother=f cont=true span=1m count by host | fillnull value=0 | table _time, prd-srv-001, prd-srv-002, prd-srv-003, prd-srv-004
As noted in my previous comments, the rex mode=sed stuff didn't work when I included a timechart.
Be sure to click Accept
on your answer and UpVote
any other answers or comments that helped you get to this.
Here are some options.
1: To ensure searching by the indexed host, you can use ::
syntax like index=foo host::bar
.
2: To ensure that the automatic KV_MODE extraction does not happen, polluting your host
value, run your search in Fast mode
.
3: To keep the KV_MODE extractions, but not the host
one, add this to your search: ... | rex mode=sed "s/\"host\"/\"json_host\"/g"
I think #3 is your ticket.
Thanks for the response, this seems close, but isn't exactly what I need.
When I try this I have:
index="my_index" host=prd-srv-00* | rex mode=sed "s/\"host\"/\"json_host\"/g" | timechart span=1m count by host | fillnull value=0
This returns the correct number of events under the Events tab, it still isn't right, see the data below, you can see the event shows it is updated with json_host
instead of host
, but if you look at the bottom, it appears the rex mode change doesn't take affect until after the search:
{"time":"2019-04-05T21:50:09.925Z","severity":"INFO","duration":25.02,"db":10.23,"view":14.79,"status":200,"method":"GET","path":"/api/v4/project/1","params":[],"host":"my_server_1","ip":"1.2.3.4, 4.5.6.7","ua":null,"route":"/api/:version/projects/:id","user_id":12,"username":"smithers","queue_duration":4.35,"magic_calls":0}
Show syntax highlighted
host = prd-srv-008 host = my_server_1 method = GET
So I guess at this point I"m looking for one of two things:
1. Do the rex mode before the search, like I said now it appears to happen after
2. In the Visualization hide a data series (my_server_1 in this case)
I prefer number 1 as it will require fewer manual updates in the future.
I'm continuing to play with this and do some searches, but so far I haven't had any luck.
Thanks again for any help!
Something interesting I found, if you are not trying to visualize the data it works as woodcock expliaed in number three, IE: index="my_index" host=prd-srv-00* | rex mode=sed "s/\"host\"/\"json_host\"/g"
But as soon as you add the timechart, either in the query or by right clicking on host it adds the host field from the data in. Very odd.
Any help is much appreciated!
@evbtbw92 could you explain the corporate requirement for not changing field name when there can be two two different values for host field where both are valid values? I think this is incorrect requirement. Is there any one host value that you need or both?
If you need only the host value from JSON data, you should correct host metadata while indexing, so that searches work faster.
If you need both the values, you should have one of the fields as host and other as something else, maybe Host
, since Splunk field names are case-sensitive (using transforms.conf
).
Hi,
You can use regex to extract host from actual raw data in new field, try below query which will extract hostname from raw data into new field called ext_host
.
Based on sample event you have provided, below query will extract my_server_1
in ext_host
field.
<yourBaseSearcg>
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"
So your query will be like this
index="my_index" host="prd-srv-00*" source="/var/log/my_program/http_json*"
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"
| search ext_host!="my_server_1"
I had a similar issue at How can I search for the field host with multiple values?
Thank you very much for the reply! I put more details in the other answer (I had to pick one), but neither seems to work for me.
When I tried your approach:
index="my_index" host=prd-srv-00* AND ext_host!="my_server_1" | rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)" | timechart span=1m count by host | fillnull value=0
It didn't return any values. This seems to be due to the rex field update not happening until after the search.
Any other thoughts?
Thanks again for any help!
Ah yes totally forgot that, you need to search after rex.
index="my_index" host="prd-srv-00*" source="/var/log/my_program/http_json*"
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"
| search ext_host!="my_server_1"
That still returns no results. My guess is because ext_host
is still set for each event, therefore when the search happens it is excluding all events.
So how many events do you have which does not contain my_server_1
in your raw data ? If you are playing with only 1 sample event which you have provided then it will not return any result because you are searching for ext_host does not equal to my_server_1
I'm looking at a small number of events for testing, about 400. Some have my_server_1 and some have my_server_2, so it would return values.
The problem I have is that all the events come from the host (Splunk side) of prd-srv-008, but also have the other field set.
In my graph, for the first minute it is showing:
prd-srv-008 --COUNT: 432
my_server_1 --COUNT: 320
my_server_2 --COUNT: 112
if you notice my_server_1 and my_server_2 always adds up to the total from prd-srv-008