Splunk Search

Value in location field gets truncated when search is run

New Member

Hi,

In my Splunk logs, I have a field called location which stores values like:
SINGAPORE (ABC)
WASHINGTON DC (ABC)
HONG KONG (ABC)
NEW YORK (ABC)
HO CHI MINH CITY VIETNAM (ABC)

But when I run a search with | stats count by location, the table that is displayed is:
SINGAPORE (ABC) 500
WASHINGTON 300
HONG 700
NEW 600
HO 300

As you can see, every value except "SINGAPORE (ABC)" is automatically truncated, for example to "HONG" or "NEW".
This also has an impact on my dashboard visualization bar chart.

But when I right-click on "NEW" and view events, the logs that are displayed have the whole value "NEW YORK".

I request your help in correcting this issue.

Thanks.


Motivator

A full example of your event would be handy. Depending on your full event data, you can be more precise with the regex. You can anchor on whatever precedes the location name, and since you have parentheses, you can also use them as a boundary for your capture group.

Example:
event text whatever pre location SINGAPORE (ABC) event text
event text other info pre location HO CHI MINH CITY VIETNAM (ABC) event text

Regex:
location\s+(?<location>[\w\s]+\([\w\s]+\))

Explanation:
Both names are extracted properly because the capture group is bounded by "location" on the left and a "( )" pair on the right, with any words and spaces inside. Any run of word characters a-zA-Z0-9_ ( \w ) or whitespace characters ( \s ) is captured.
Live test here:
https://regex101.com/r/5lMFCJ/1
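
For anyone who wants to try the pattern outside Splunk, here is a minimal Python sketch run against the two example events above. It assumes the literal word "location" precedes the value, as in the made-up sample events, and it uses Python's (?P<name>...) spelling for the named group. The helper name extract_location is only illustrative.

```python
import re

# Same pattern as above, with the named group written the Python way.
LOCATION_RE = re.compile(r"location\s+(?P<location>[\w\s]+\([\w\s]+\))")

# The two made-up sample events from the example above.
events = [
    "event text whatever pre location SINGAPORE (ABC) event text",
    "event text other info pre location HO CHI MINH CITY VIETNAM (ABC) event text",
]

def extract_location(event):
    """Return the captured location, or None when the pattern does not match."""
    match = LOCATION_RE.search(event)
    return match.group("location") if match else None

locations = [extract_location(e) for e in events]
for loc in locations:
    print(loc)
```

Both events should give the full name including the "(ABC)" suffix, regardless of how many words the city name has.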

Hope this helps!


Builder

Hello @amahesh3 ,

Your field extraction is not set up properly: it does not appear to take into account locations with spaces in the name. Please provide some example events with locations that have spaces in the name, along with your current extraction configuration, and then someone can help with a replacement for the field extraction.

Hope this helps.

New Member

Hi,
Can you please advise on how I can check the field extraction configuration?

I tried searching around and came across this
(?i)^(?:[^ ]* ){2}(?:[+-]\d+ )?(?P[^ ]*)\s+(?P[^ ]+) - (?P.+)

Please let me know if this is correct, and also explain how it accommodates the space in "SINGAPORE (ABC)" but not the space in the other location names.


Explorer

First things first... The regular expression you pasted won't look right to anyone reading it here because it got eaten by the site's comment formatting engine. To paste anything with unusual characters, such as stars or greater-than or less-than symbols, in its original, unaltered form, you'll need to surround it with code tags like this:



(?i)^(?:[^ ] ){2}(?:[+-]\d+ )?(?P[^ ])\s+(?P[^ ]+) - (?P.+)


And then every character will appear exactly as it actually is at your end for other viewers, like this:

(?i)^(?:[^ ] *){2}(?:[+-]\d+ )?(?P[^ ])\s+(?P[^ ]+) - (?P.+)
(Probably neither of my examples here matches your real regex, because your version didn't survive the site's formatting engine and I can't reliably guess what the correct regex actually looks like.)

Now, on to your issue.

This is pure speculation, but I see that your regular expression above contains a {2}, which means to look for the previous token "exactly two times". Look at the following:

New York (ABC)
1   2    3
Washington DC (ABC)
1          2  3
Singapore (ABC)
1         2
Hong Kong (ABC)
1    2    3
HO CHI MINH CITY VIETNAM (ABC)
1  2   3    4    5       6

What I'm guessing is that your actual regex, which matches "Singapore (ABC) ", would not match "New York (ABC) " or any of your other examples, because each of those is a run of non-space characters followed by a space character repeated three or more times, instead of exactly two times.
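
To make the counting concrete, here is a small Python sketch. The pattern is hypothetical: it is not your real extraction regex (which didn't survive the formatting), just "non-space characters followed by one space, exactly two times" checked against whole sample strings.

```python
import re

# Hypothetical pattern for illustration only: one token is a run of
# non-space characters followed by a single space, repeated exactly twice.
EXACTLY_TWO = re.compile(r"(?:[^ ]+ ){2}")

samples = [
    "SINGAPORE (ABC) ",
    "NEW YORK (ABC) ",
    "HO CHI MINH CITY VIETNAM (ABC) ",
]

# fullmatch requires the whole string to be consumed by the pattern.
exact = {s: EXACTLY_TWO.fullmatch(s) is not None for s in samples}
for sample, ok in exact.items():
    print(repr(sample), ok)
```

Only the two-token string fits, which is the behaviour I was speculating about.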

That could be the problem if you let Splunk create the regex for the field extractions and the sample events you selected didn't happen to include any locations with more spaces in the location names. Splunk may have done this without you realizing it, because it generally tries to be as specific as possible based on your sample events when it creates the extraction regexes for you.

This may or may not solve the issue for you (I can't know without seeing the actual raw events in their actual format and without knowing your actual unaltered regex), but you could try changing the {2} in the regex you found to {2,} instead. Adding the comma without another number after it means "match the previous token 2 or more times" instead of exactly two times, as it currently does without the comma. In regular expressions, {n,m} specifies a range of how many times the previous token should match. So, for example, if you wanted to match at least 3 but not more than 7 times, you would use {3,7}. Having the comma with only the first or second number means basically:


{5} - this is the same as "exactly", or "exactly 5 times", or =5
{5,} - this is the same as "equal to or greater than", or "5 or more times", or >=5
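{,5} - this is the same as "less than or equal to", or "5 or fewer times", or <=5 (not every engine accepts this form; PCRE has traditionally treated {,5} as literal text, so {0,5} is the safer spelling)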
{3,5} - this is the same as "from..to", or "3 to 5 times", or ">=3 and <=5"
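
If you want to check these quantifier forms yourself, here is a minimal Python sketch. It repeats a single letter "a" as a stand-in token, and the helper name matches is just for illustration. The "5 or fewer" case is written as {0,5}, a spelling that both Python and PCRE accept.

```python
import re

def matches(quantifier, count):
    """True when 'a' repeated `count` times is fully matched by a<quantifier>."""
    return re.fullmatch("a" + quantifier, "a" * count) is not None

# For each quantifier, list the repeat counts from 0 to 8 that it accepts.
quantifiers = ["{5}", "{5,}", "{0,5}", "{3,5}"]
table = {q: [n for n in range(9) if matches(q, n)] for q in quantifiers}

for q, counts in table.items():
    print(q, counts)
```

The open-ended {5,} form keeps matching for every count from 5 up to the end of the tested range, which is the same idea as changing {2} to {2,}.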


New Member

If what you are saying is true, then I should be getting locations like:
WASHINGTON DC
HO CHI
HONG KONG
NEW YORK

I should be getting 2 words of each location, right?


Explorer

That's correct, but one of those "words" is your "(ABC)", so you will only get at most one name for each location based on what I can see and make out of your regex.

Edit: Actually, I just realized that what you're saying is correct, so in that case, I'm not sure what's going on. We'll need some sample raw events to compare with (if there's anything private/sensitive in them, just alter those items but keep the same formatting, i.e., upper case letters stay upper case, lower case letters stay lower case, numbers stay numbers, punctuation stays punctuation - and preferably the same punctuation so the regexes remain clear and answers can be more accurate.)


Builder

You still have not provided an example of a full event. When you do, I can provide a solution for your issue. If it contains sensitive information, just change the values, but keep the formatting.


Splunk Employee

Looks like the extraction is not accounting for spaces. Is this an automatic extraction or is it something you created?


New Member

Hi, I have not created any extraction; it is happening automatically.
Also, the issue is not happening with SINGAPORE (ABC), which also has a space in it.


SplunkTrust

What are the props.conf settings for that sourcetype?
