<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Need assistance dedup'ing data in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359139#M106182</link>
    <description>&lt;P&gt;The router name (rrw01p, rrw02p) is the key field. I can dedup by that, but I need a time element as well. If I just dedup the above data by router name, it will result in 1 event for rrw01p when there were actually 2 events for rrw01p (18:29 on Jan 2 and 15:46 on Dec 7).&lt;/P&gt;</description>
    <pubDate>Tue, 20 Mar 2018 13:05:16 GMT</pubDate>
    <dc:creator>mjshoaf</dc:creator>
    <dc:date>2018-03-20T13:05:16Z</dc:date>
    <item>
      <title>Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359133#M106176</link>
      <description>&lt;P&gt;I need help figuring out how to correctly dedup the data below. The 10 log messages below represent 4 distinct events (data circuit outages). How can I dedup this data so that the counts are accurate? If I dedup by router name (e.g. rrw01p), I get only 1 event for rrw01p, but the correct counts are:&lt;/P&gt;

&lt;P&gt;rrw01p - 2 events&lt;BR /&gt;
rrw02p - 1 event&lt;BR /&gt;
intrw02p - 1 event&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;EDITED: I've grouped the messages to better reflect what I'm trying to communicate. I hope this helps to clarify.&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;These 4 messages represent 1 outage event:&lt;BR /&gt;
Jan  2 18:29:29 rrw01p 2001: Jan  2 18:29:28: %BGP-5-ADJCHANGE: neighbor 199.x.x.13 Down BFD adjacency down&lt;BR /&gt;
Jan  2 18:29:29 rrw01p 1999: Jan  2 18:29:28: %BGP-5-ADJCHANGE: neighbor 152.x.x.73 vpn vrf SIP Down BFD adjacency down&lt;BR /&gt;
Jan  2 18:29:29 rrw01p 1997: Jan  2 18:29:28: %BGP-5-ADJCHANGE: neighbor 68.x.x.133 Down BFD adjacency down&lt;BR /&gt;
Jan  2 18:29:29 rrw01p 1995: Jan  2 18:29:28: %BGP-5-ADJCHANGE: neighbor 68.x.x.249 Down BFD adjacency down&lt;/P&gt;

&lt;P&gt;This 1 message represents 1 outage event:&lt;BR /&gt;
Jan  2 18:29:29 rrw02p 2158: Jan  2 18:29:29: %BGP-5-ADJCHANGE: neighbor 199.x.x.249 Down BFD adjacency down&lt;/P&gt;

&lt;P&gt;These 4 messages represent 1 outage event:&lt;BR /&gt;
Dec  7 15:46:57 rrw01p 1959: Dec  7 15:46:56: %BGP-5-ADJCHANGE: neighbor 152.x.x.73 vpn vrf SIP Down BFD adjacency down&lt;BR /&gt;
Dec  7 15:46:56 rrw01p 1956: Dec  7 15:46:56: %BGP-5-ADJCHANGE: neighbor 199.x.x.13 Down BFD adjacency down&lt;BR /&gt;
Dec  7 15:46:56 rrw01p 1954: Dec  7 15:46:56: %BGP-5-ADJCHANGE: neighbor 68.x.x.133 Down BFD adjacency down&lt;BR /&gt;
Dec  7 15:46:56 rrw01p 1952: Dec  7 15:46:56: %BGP-5-ADJCHANGE: neighbor 68.x.x.249 Down BFD adjacency down&lt;/P&gt;

&lt;P&gt;This 1 message represents 1 outage event:&lt;BR /&gt;
Dec  7 15:46:57 intrw02p 2761: Dec  7 15:46:56: %BGP-5-ADJCHANGE: neighbor 4.x.x.249 Down BFD adjacency down&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 17:15:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359133#M106176</guid>
      <dc:creator>mjshoaf</dc:creator>
      <dc:date>2018-03-19T17:15:33Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359134#M106177</link>
      <description>&lt;P&gt;Hey&lt;/P&gt;

&lt;P&gt;What connects your events? Is there a common field or value that ties them together?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 18:19:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359134#M106177</guid>
      <dc:creator>tiagofbmm</dc:creator>
      <dc:date>2018-03-19T18:19:56Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359135#M106178</link>
      <description>&lt;P&gt;What are the individual field names? You mentioned router name.&lt;/P&gt;

&lt;P&gt;What constitutes a unique event? Each of these lines looks distinct to me.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 19:13:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359135#M106178</guid>
      <dc:creator>BearMormont</dc:creator>
      <dc:date>2018-03-19T19:13:03Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359136#M106179</link>
      <description>&lt;P&gt;It looks like the message at the end is what makes 2 events for rrw01p. So if you dedup on router name and that message, you should get what you want.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 19:15:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359136#M106179</guid>
      <dc:creator>kmaron</dc:creator>
      <dc:date>2018-03-19T19:15:49Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359137#M106180</link>
      <description>&lt;P&gt;It would be good to understand which fields are being extracted from each event, as well as what the string in the actual event data is. For instance, if the timestamp isn't in the event itself, then you can dedup on router name, _raw...&lt;/P&gt;

&lt;P&gt;If other fields are being extracted, such as neighbor, status (e.g. Down), or message (e.g. "Down BFD adjacency down" or "vpn vrf SIP Down BFD adjacency down"), you can dedup on those as well. If you don't see those field extractions, you can extract them using the rex command.&lt;/P&gt;

&lt;P&gt;This may help you get started to extract fields to allow you to dedup properly:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;yoursearch | rex field=_raw "(?&amp;lt;change_type&amp;gt;%.*):\sneighbor\s(?&amp;lt;neighbor_ip&amp;gt;\d{1,3}.(\d{1,3}|x).(\d{1,3}|x).\d{1,3}|x)\s(?&amp;lt;router_message&amp;gt;.*$)" | dedup change_type, neighbor_ip, router_message
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 18:32:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359137#M106180</guid>
      <dc:creator>damiensurat</dc:creator>
      <dc:date>2020-09-29T18:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359138#M106181</link>
      <description>&lt;P&gt;The router name (rrw01p, rrw02p) is the key field. &lt;/P&gt;</description>
      <pubDate>Tue, 20 Mar 2018 13:03:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359138#M106181</guid>
      <dc:creator>mjshoaf</dc:creator>
      <dc:date>2018-03-20T13:03:13Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359139#M106182</link>
      <description>&lt;P&gt;The router name (rrw01p, rrw02p) is the key field. I can dedup by that, but I need a time element as well. If I just dedup the above data by router name, it will result in 1 event for rrw01p when there were actually 2 events for rrw01p (18:29 on Jan 2 and 15:46 on Dec 7).&lt;/P&gt;</description>
      <pubDate>Tue, 20 Mar 2018 13:05:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359139#M106182</guid>
      <dc:creator>mjshoaf</dc:creator>
      <dc:date>2018-03-20T13:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359140#M106183</link>
      <description>&lt;P&gt;Base search is 'index = network ADJCHANGE bgp_state="Down"' &lt;BR /&gt;
The router name is the 'host' field.  Other extracted fields are 'bgp_neighbor' and 'bgp_state'.&lt;BR /&gt;
Each router has multiple bgp neighbors. That's why I'm seeing multiple log messages for one event. In other words, the circuit goes down and each down neighbor results in a separate log message. But I want to count it as just one event. The router name (host) is the common field (along with time) that distinguishes one event from another.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 18:34:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359140#M106183</guid>
      <dc:creator>mjshoaf</dc:creator>
      <dc:date>2020-09-29T18:34:55Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359141#M106184</link>
      <description>&lt;P&gt;great... so now all you need to dedup on is the message:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;yoursearch | rex field=_raw "%.*(\d{1,3}.(\d{1,3}|x).(\d{1,3}|x).\d{1,3}|x)\s(?&amp;lt;bgp_message&amp;gt;.*)"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The above should produce an extraction of the end of the event, e.g.:&lt;BR /&gt;
Down BFD adjacency down&lt;BR /&gt;
vpn vrf SIP Down BFD adjacency down&lt;BR /&gt;
etc.&lt;/P&gt;

&lt;P&gt;search with dedup:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;yoursearch | rex field=_raw "%.*(\d{1,3}.(\d{1,3}|x).(\d{1,3}|x).\d{1,3}|x)\s(?&amp;lt;bgp_message&amp;gt;.*)"| dedup host, bgp_state, bgp_neighbor, bgp_message
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 20 Mar 2018 14:33:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359141#M106184</guid>
      <dc:creator>damiensurat</dc:creator>
      <dc:date>2018-03-20T14:33:20Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359142#M106185</link>
      <description>&lt;P&gt;On a side note, in the rex extraction I allow an x in the pattern for the IP address:&lt;BR /&gt;
(\d{1,3}.(\d{1,3}|x).(\d{1,3}|x).\d{1,3}|x)&lt;/P&gt;

&lt;P&gt;I wasn't sure if this was part of the actual message or if you were obfuscating the ip with the x's.  If you are unfamiliar with regular expressions the above looks for:&lt;/P&gt;

&lt;P&gt;\d - any single digit&lt;BR /&gt;
\d{1,3} - a run of 1 to 3 digits&lt;BR /&gt;
(\d{1,3}|x) - a run of 1 to 3 digits, or a lowercase x&lt;/P&gt;
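
&lt;P&gt;As a hedged illustration of the pieces above, the run-anywhere sketch below applies them to one of the sample messages from this thread. It assumes the x's are literal characters in the data, escapes the dots so that each one matches only a literal period, and uses the field names neighbor_ip and bgp_message purely for illustration. Treat it as a starting point rather than a tested extraction:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="Jan 2 18:29:29 rrw01p 1999: Jan 2 18:29:28: %BGP-5-ADJCHANGE: neighbor 152.x.x.73 vpn vrf SIP Down BFD adjacency down"
| rex "neighbor\s(?&amp;lt;neighbor_ip&amp;gt;\d{1,3}\.(?:\d{1,3}|x)\.(?:\d{1,3}|x)\.\d{1,3})\s(?&amp;lt;bgp_message&amp;gt;.*)"
| table neighbor_ip bgp_message
&lt;/CODE&gt;&lt;/PRE&gt;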

&lt;P&gt;I'm not sure if you will have to use an additional \ to escape any special characters, e.g. the \d may need to become \\d, as I don't have any data to test with. Hope this helps!&lt;/P&gt;</description>
      <pubDate>Tue, 20 Mar 2018 14:40:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359142#M106185</guid>
      <dc:creator>damiensurat</dc:creator>
      <dc:date>2018-03-20T14:40:37Z</dc:date>
    </item>
    <item>
      <title>Re: Need assistance dedup'ing data</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359143#M106186</link>
      <description>&lt;P&gt;@mjshoaf have you tried &lt;CODE&gt;dedup&lt;/CODE&gt; by &lt;CODE&gt;_time&lt;/CODE&gt; and &lt;CODE&gt;host&lt;/CODE&gt; i.e. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;YourBaseSearch&amp;gt;
| dedup _time host
&lt;/CODE&gt;&lt;/PRE&gt;
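
&lt;P&gt;One caveat, going by the sample data in the question: the four rrw01p messages from Dec 7 carry timestamps of both 15:46:56 and 15:46:57, so a dedup on the exact &lt;CODE&gt;_time&lt;/CODE&gt; would likely keep two events for that single outage. A minimal sketch of one way around this is to group messages from the same host that arrive close together using &lt;CODE&gt;transaction&lt;/CODE&gt;, then count the groups. The base search is the one quoted earlier in the thread; the 30-second &lt;CODE&gt;maxpause&lt;/CODE&gt; and the &lt;CODE&gt;outages&lt;/CODE&gt; field name are assumptions to adjust for your environment:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=network ADJCHANGE bgp_state="Down"
| transaction host maxpause=30s
| stats count AS outages BY host
&lt;/CODE&gt;&lt;/PRE&gt;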

&lt;P&gt;Otherwise, you can try the following run-anywhere search, which pulls the &lt;CODE&gt;date&lt;/CODE&gt; from the beginning of the event and overrides &lt;CODE&gt;_time&lt;/CODE&gt; with it. It also extracts the &lt;CODE&gt;router&lt;/CODE&gt; as &lt;CODE&gt;host&lt;/CODE&gt;, but you should already have that field extracted:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval data="Jan 2 18:29:29 rrw01p 2001: Jan 2 18:29:28: %BGP-5-ADJCHANGE: neighbor 199.x.x.13 Down BFD adjacency down;Jan 2 18:29:29 rrw02p 2158: Jan 2 18:29:29: %BGP-5-ADJCHANGE: neighbor 199.x.x.249 Down BFD adjacency down;Jan 2 18:29:29 rrw01p 1999: Jan 2 18:29:28: %BGP-5-ADJCHANGE: neighbor 152.x.x.73 vpn vrf SIP Down BFD adjacency down;Jan 2 18:29:29 rrw01p 1997: Jan 2 18:29:28: %BGP-5-ADJCHANGE: neighbor 68.x.x.133 Down BFD adjacency down;Jan 2 18:29:29 rrw01p 1995: Jan 2 18:29:28: %BGP-5-ADJCHANGE: neighbor 68.x.x.249 Down BFD adjacency down;Dec 7 15:46:57 rrw01p 1959: Dec 7 15:46:56: %BGP-5-ADJCHANGE: neighbor 152.x.x.73 vpn vrf SIP Down BFD adjacency down;Dec 7 15:46:57 intrw02p 2761: Dec 7 15:46:56: %BGP-5-ADJCHANGE: neighbor 4.x.x.249 Down BFD adjacency down;Dec 7 15:46:56 rrw01p 1956: Dec 7 15:46:56: %BGP-5-ADJCHANGE: neighbor 199.x.x.13 Down BFD adjacency down;Dec 7 15:46:56 rrw01p 1954: Dec 7 15:46:56: %BGP-5-ADJCHANGE: neighbor 68.x.x.133 Down BFD adjacency down;Dec 7 15:46:56 rrw01p 1952: Dec 7 15:46:56: %BGP-5-ADJCHANGE: neighbor 68.x.x.249 Down BFD adjacency down"
| makemv data delim=";" 
| mvexpand data
| rename data as _raw
| rex "(?&amp;lt;date&amp;gt;[^\s]+\s[^\s]+\s[^\:]+\:[^\:]+\:[^\s]+\s)(?&amp;lt;host&amp;gt;[^\s]+)\s"
| eval _time=strptime(date,"%b %d %H:%M:%S")
| dedup _time host
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 20 Mar 2018 14:53:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Need-assistance-dedup-ing-data/m-p/359143#M106186</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-03-20T14:53:25Z</dc:date>
    </item>
  </channel>
</rss>

