@UnivLyon2 To handle typos and find similar names, you may need to try a clustering algorithm. You can definitely try the built-in cluster command. The following run-anywhere SPL example is based on the sample data and use case you provided. However, it may only take care of misspelled names. For more complex scenarios, such as names with missing letters, special characters or spaces, you may have to change the pre-processing logic further before feeding the data to the cluster command.

| makeresults
| fields - _time
| eval client_ip="10.10.10.10", number="3", login_list="Myself,Yourself,Myselv", continent="Somewhere", country="Here"
| eval login_list=split(login_list,",")
| fields - data
| mvexpand login_list
| eval splitLetters=mvjoin(split(login_list,"")," ")
| eval key=client_ip."-".continent."-".country."-".login_list."-".number."- ".splitLetters
| fields key
| cluster field=key t=0.6 showcount=true
| rex field=key "(?<client_ip>[^\-]+)\-(?<continent>[^\-]+)\-(?<country>[^\-]+)\-(?<login_list>[^\-]+)"
| stats list(login_list) as login_list count as number by client_ip continent country

The following run-anywhere dashboard explains each step of the search above, showing how it goes from three names in login_list (Myself, Yourself, Myselv) to only two (Myself and Yourself). You can play around with the cluster command's threshold, and enter other names in the text box as a comma-separated list, to see which level actually fits your need.

PS: Since this correlation is based on a clustering algorithm, there may be situations where two or more different valid names that are similar under the configured threshold get clustered as one.

Following is the Simple XML for the run-anywhere dashboard from the screenshot above.

<form theme="dark">
<label>Cluster to find similar names</label>
<!-- Independent search to dynamically set count of names in the login_list -->
<search>
<query>| makeresults
| eval number=mvcount(split("$login_list$",","))
</query>
<earliest>-1s</earliest>
<latest>now</latest>
<done>
<condition match="$job.resultCount$==0">
<set token="number">0</set>
</condition>
<condition>
<set token="number">$result.number$</set>
</condition>
</done>
</search>
<fieldset submitButton="false"></fieldset>
<row>
<panel id="panel_title">
<title>Find Similar Names for Grouping using Cluster Command</title>
<html>
<style depends="$alwaysHideCSSPanel$">
div#panel_title h2.panel-title{
text-align:center;color:#7ED2FF;font-weight:bold;
}
div.html-container{
display:flex;
}
div.html-container .html-header h3{
text-align:center;color:#7ED2FF;padding-right:10px;
}
div.html-container div.html-description{
padding-top: 10px;
}
div#panel_step2_input div.dashboard-panel,
div#panel_step5_output div.dashboard-panel{
border-style: solid;
border-color: green;
}
</style>
</html>
</panel>
</row>
<row>
<panel>
<html>
<div class="html-container">
<div class="html-header">
<h3>Step 1 - Add data:</h3>
</div>
<div class="html-description">
<code>
*login_list has comma-separated names. **Threshold is the t value of the cluster command: the closer it is to 1, the more similar names must be to land in the same cluster, so 0.1 groups most aggressively and 0.9 least.
</code>
</div>
</div>
</html>
</panel>
</row>
<row>
<panel>
<input type="text" token="client_ip" searchWhenChanged="true">
<label>Client IP</label>
<default>10.10.10.10</default>
</input>
<input type="text" token="continent" searchWhenChanged="true">
<label>Continent</label>
<default>Somewhere</default>
</input>
<input type="text" token="country" searchWhenChanged="true">
<label>Country</label>
<default>Here</default>
</input>
<input type="text" token="login_list" searchWhenChanged="true">
<label>login_list</label>
<default>foo,bar,baz</default>
</input>
<input type="dropdown" token="threshold" searchWhenChanged="true">
<label>Cluster sensitivity (0.1 - 0.9)</label>
<choice value="0.1">0.1</choice>
<choice value="0.2">0.2</choice>
<choice value="0.3">0.3</choice>
<choice value="0.4">0.4</choice>
<choice value="0.5">0.5</choice>
<choice value="0.6">0.6</choice>
<choice value="0.7">0.7</choice>
<choice value="0.8">0.8</choice>
<choice value="0.9">0.9</choice>
<default>0.6</default>
</input>
</panel>
</row>
<row>
<panel id="panel_step2_input">
<html>
<div class="html-container">
<div class="html-header">
<h3>Step 2 - Generate data set:</h3>
</div>
<div class="html-description">
<code>
Generate data with login_list having comma separated names.
</code>
</div>
</div>
</html>
<table>
<search id="sSampleData">
<query>| makeresults
| fields - _time
| eval client_ip="$client_ip$", number="$number$", login_list="$login_list$", continent="$continent$", country="$country$"
| eval login_list=split(login_list,",")
| fields - data</query>
<earliest>-1s</earliest>
<latest>now</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">none</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">progressbar</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
</table>
</panel>
</row>
<row>
<panel>
<html>
<div class="html-container">
<div class="html-header">
<h3>Step 3 - Prepare Data:</h3>
</div>
<div class="html-description">
<code>
Split each name in login_list into letters. Full names with spaces, special characters, too many mistakes etc. may need extra logic.
</code>
</div>
</div>
</html>
<table>
<search id="sGenKey" base="sSampleData">
<query>| mvexpand login_list
| eval splitLetters=mvjoin(split(login_list,"")," ")
| eval key=client_ip."-".continent."-".country."-".login_list."-".number."- ".splitLetters
| fields key</query>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">none</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">progressbar</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
</table>
</panel>
</row>
<row>
<panel>
<html>
<div class="html-container">
<div class="html-header">
<h3>Step 4 - Clustered Events based on similarity of login_list names:</h3>
</div>
<div class="html-description">
<code>
Both the threshold and the names in login_list affect how events are clustered here.
</code>
</div>
</div>
</html>
<table>
<search id="sCluster" base="sGenKey">
<query>| cluster field=key t=$threshold$ showcount=true</query>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">none</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">progressbar</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
</table>
</panel>
</row>
<row>
<panel id="panel_step5_output">
<html>
<div class="html-container">
<div class="html-header">
<h3>Step 5 - Final Result:</h3>
</div>
<div class="html-description">
<code>
Extract the required fields client_ip, continent and country, and show the final list and count of login_list names.
</code>
</div>
</div>
</html>
<table>
<search base="sCluster">
<query>| rex field=key "(?&lt;client_ip&gt;[^\-]+)\-(?&lt;continent&gt;[^\-]+)\-(?&lt;country&gt;[^\-]+)\-(?&lt;login_list&gt;[^\-]+)"
| stats list(login_list) as login_list count as number by client_ip continent country</query>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">none</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">progressbar</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
</table>
</panel>
</row>
</form>
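
For the more complex cases mentioned at the top (special characters, spaces, mixed case), one possible pre-processing step is to normalize each name before building the key. The search below is only a rough sketch of that idea, not a tested recipe: the sample names and the clean_name field are made up for illustration, and the regex simply drops everything that is not a letter or a digit. Dropping hyphens also matters because the key uses "-" as its delimiter, so a hyphen inside a name would otherwise break the later rex extraction. You would still need to check the clustered output against your own data and threshold.

```spl
| makeresults
| fields - _time
| eval client_ip="10.10.10.10", number="3", login_list="My-Self, Yourself ,MYSELV", continent="Somewhere", country="Here"
| eval login_list=split(login_list,",")
| mvexpand login_list
| eval clean_name=lower(replace(login_list,"[^A-Za-z0-9]",""))
| eval splitLetters=mvjoin(split(clean_name,"")," ")
| eval key=client_ip."-".continent."-".country."-".clean_name."-".number."- ".splitLetters
| fields key
| cluster field=key t=0.6 showcount=true
```

Because the key now carries clean_name instead of the raw value, the final rex and stats from the first search return the normalized names. Keep the original login_list in a separate field if you need to report the names as they were entered.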