I am a novice experimenting with a free version of Splunk, and I have a Twitter feed in a text file. Part of it looks like this:
Name: The Last Word
Screen Name: TheLastWord
Text: .@lawrence anchors from LA tonight where it's in the 60s. In NYC, it's in the 30s and is supposed to snow. #luckyguy #lastword
Created At: Mon Mar 25 18:23:26 +0000 2013
Source: web
Id: 316254010745188352
(My version does not include a twitter sourcetype, so I had to create a new one.)
I know that a regex such as #[^#\s]*\s matches hashtags, but how do I get Splunk to create a new field called hashtag, so that I can report on top hashtags etc.?
Thanks!
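To see what the regex in the question actually matches, here is a minimal Python sketch. Python's re module is used only as a stand-in for Splunk's PCRE engine, and the sample string is the Text line from the feed above. Note that each match includes the trailing whitespace character, and a hashtag at the very end of a string with nothing after it is not matched at all.

```python
import re

# The pattern from the question: '#', then any run of characters that are
# neither '#' nor whitespace, then one mandatory whitespace character.
PATTERN = r"#[^#\s]*\s"

text = (".@lawrence anchors from LA tonight where it's in the 60s. In NYC, "
        "it's in the 30s and is supposed to snow. #luckyguy #lastword")

# Without a trailing newline, the final hashtag has no whitespace after it.
print(re.findall(PATTERN, text))         # ['#luckyguy ']
# In the raw event, the Text line is followed by a newline.
print(re.findall(PATTERN, text + "\n"))  # ['#luckyguy ', '#lastword\n']
```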
The examples in the rex documentation might be useful: http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Rex
Example 1 creates two new fields, 'from' and 'to'. You capture your matches with parentheses (as in a normal regex) and name the field to be captured inside angle brackets, prefixed by a '?', within the capture parentheses.
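The same named-capture idea can be tried outside Splunk. In this Python sketch, (?P<hashtag>...) names the captured group, much as it names the extracted field in rex (Splunk also accepts the shorter (?<hashtag>...) form). The pattern itself is an illustrative assumption: it captures the tag without the leading '#' and does not require trailing whitespace.

```python
import re

# Named group: the name inside <...> becomes the group (field) name.
HASHTAG = re.compile(r"#(?P<hashtag>[^#\s]+)")

text = (".@lawrence anchors from LA tonight where it's in the 60s. In NYC, "
        "it's in the 30s and is supposed to snow. #luckyguy #lastword")

# search() returns only the first match, similar to rex's default behaviour.
m = HASHTAG.search(text)
print(m.group("hashtag"))  # luckyguy
print(m.groupdict())       # {'hashtag': 'luckyguy'}
```

In a Splunk search, the corresponding (untested) form would be along the lines of index=main sourcetype="twitter" | rex "#(?<hashtag>[^#\s]+)" | top hashtag.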
Thanks a lot, this solved it. As I said, I am a novice. I used index=main sourcetype="twitter" | rex "#[^#\s]\s(?P
And what if there is more than one hashtag per event?
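Yes, you can do that in rex as well: add max_match=x to your rex statement, where x is the maximum number of matches to extract (0 means no limit). With more than one match, the field becomes multivalued.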
/k
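As a rough Python analogue of what max_match changes: re.search stops at the first match (like rex's default of max_match=1), while re.findall returns every match (like max_match=0). Counting the values with collections.Counter then approximates what top hashtag would report. The second event below is hypothetical, invented only to make the count interesting.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(?P<hashtag>[^#\s]+)")

events = [
    # Text line from the sample event in the question.
    ".@lawrence anchors from LA tonight where it's in the 60s. In NYC, "
    "it's in the 30s and is supposed to snow. #luckyguy #lastword",
    # Hypothetical second event, invented for illustration only.
    "Back tomorrow night. #lastword",
]

def first_only(event):
    """Like rex with the default max_match=1: at most one value per event."""
    m = HASHTAG.search(event)
    return [m.group("hashtag")] if m else []

def all_matches(event):
    """Like rex with max_match=0: every match, i.e. a multivalue field."""
    return HASHTAG.findall(event)

print([first_only(e) for e in events])   # [['luckyguy'], ['lastword']]
print([all_matches(e) for e in events])  # [['luckyguy', 'lastword'], ['lastword']]

# Rough stand-in for "| top hashtag": count values across all events.
counts = Counter(tag for e in events for tag in all_matches(e))
print(counts.most_common())  # [('lastword', 2), ('luckyguy', 1)]
```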
To extract hashtag as a multivalue field (i.e. where a single event can contain several occurrences of the same field name), you should do it through a REPORT field extraction. This is a configuration directive in props.conf that refers to a stanza in transforms.conf, like so:
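props.conf
# The stanza name must match the sourcetype you created
[twitter]
REPORT-get_tags = twitter_tags
transforms.conf
# Capture the word after each '#' into the field "hashtag"; MV_ADD keeps every match
[twitter_tags]
REGEX = #(\S+)\s
FORMAT = hashtag::$1
MV_ADD = true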
/k
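The effect of this transform can be modelled roughly in Python. This is a simplified assumption about the behaviour, not Splunk's implementation: the REGEX is applied repeatedly across the event, each match's $1 becomes a value of the field named in FORMAT, and MV_ADD decides whether later values are appended (true) or discarded (false). The helper name report_extract is hypothetical.

```python
import re

REGEX = r"#(\S+)\s"  # same pattern as the [twitter_tags] stanza

def report_extract(event, mv_add=True):
    """Hypothetical, simplified model of REPORT + FORMAT = hashtag::$1."""
    values = re.findall(REGEX, event)
    if not mv_add:
        # Without MV_ADD, only the first value found is kept.
        values = values[:1]
    return {"hashtag": values} if values else {}

# Two lines of the sample event from the question.
event = (
    "Text: .@lawrence anchors from LA tonight where it's in the 60s. In NYC, "
    "it's in the 30s and is supposed to snow. #luckyguy #lastword\n"
    "Created At: Mon Mar 25 18:23:26 +0000 2013\n"
)

print(report_extract(event))                # {'hashtag': ['luckyguy', 'lastword']}
print(report_extract(event, mv_add=False))  # {'hashtag': ['luckyguy']}
```

Note that \S+ also captures trailing punctuation, so a tag written as "#lastword. " would be stored as "lastword."; \w+ would be one way to tighten the pattern.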