Hello experts, I have run into the limitations of the map command in an SPL search: at the moment the statistical results are inaccurate because data is being lost. Could you please advise whether any SPL function can replace map here? The SPL is as follows:
index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| stats earliest(_time) as earliest_time latest(_time) as latest_time
| addinfo
| table info_min_time info_max_time earliest_time latest_time
| eval earliest_time=strftime(earliest_time,"%F 00:00:00")
| eval earliest_time=strptime(earliest_time,"%F %T")
| eval earliest_time=round(earliest_time)
| eval searchEarliestTime2=if(info_min_time == "0.000", earliest_time, info_min_time)
| eval searchLatestTime2=if(info_max_time=="+Infinity", relative_time(latest_time,"+1d"), info_max_time)
| eval start=mvrange(searchEarliestTime2,searchLatestTime2, "1d")
| mvexpand start
| eval end=relative_time(start,"+7d")
| where end <=searchLatestTime2
| eval end=round(end)
| eval a=strftime(start, "%F")
| eval b=strftime(end, "%F")
| fields start a end b
| map search="search earliest=\"$start$\" latest=\"$end$\" index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| dedup day oprt_user_name blng_dept_name oprt_user_acct
| stats count as "fwcishu" by day oprt_user_name blng_dept_name oprt_user_acct
| eval a=$a$ | eval b=$b$
| stats count as "day_count",values(day) as "qdate",max(day) as "alert_date" by a b oprt_user_name,oprt_user_acct
" maxsearches=500000
| where day_count > 2
| eval alert_date=strptime(alert_date,"%F")
| eval alert_date=relative_time(alert_date,"+1d")
| eval alert_date=strftime(alert_date, "%F")
| table a b oprt_user_name oprt_user_acct day_count qdate alert_date
What I want to implement is a statistical analysis of the data from 2019 to the present, where multiple visits by a user in one day count as a single visit, calculating each user's consecutive visit days for every 7-day interval since 2019.
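For reference, the "multiple visits in one day count as one" part on its own is commonly handled by collapsing events to calendar days before counting; a minimal sketch (the day field name here is illustrative) would be:
index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| eval day=strftime(_time, "%F")
| stats dc(day) as visit_days by oprt_user_acct
The remaining question is how to roll those distinct days up into 7-day intervals, which is what the map call above attempts.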
Since the first part is just determining values for earliest and latest, you might be able to avoid map like this
index=edwapp sourcetype=ygttest is_cont_sens_acct="是" [search index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| stats earliest(_time) as earliest_time latest(_time) as latest_time
| addinfo
| table info_min_time info_max_time earliest_time latest_time
| eval earliest_time=strftime(earliest_time,"%F 00:00:00")
| eval earliest_time=strptime(earliest_time,"%F %T")
| eval earliest_time=round(earliest_time)
| eval searchEarliestTime2=if(info_min_time == "0.000", earliest_time, info_min_time)
| eval searchLatestTime2=if(info_max_time=="+Infinity", relative_time(latest_time,"+1d"), info_max_time)
| eval earliest=mvrange(searchEarliestTime2,searchLatestTime2, "1d")
| mvexpand earliest
| eval latest=relative_time(earliest,"+7d")
| where latest <=searchLatestTime2
| eval latest=round(latest)
| fields earliest latest]
| dedup day oprt_user_name blng_dept_name oprt_user_acct
| stats count as "fwcishu" by day oprt_user_name blng_dept_name oprt_user_acct
| stats count as "day_count", values(day) as "qdate", max(day) as "alert_date" by oprt_user_name, oprt_user_acct
| where day_count > 2
| eval alert_date=strptime(alert_date,"%F")
| eval alert_date=relative_time(alert_date,"+1d")
| eval alert_date=strftime(alert_date, "%F")
| table a b oprt_user_name oprt_user_acct day_count qdate alert_date
Thank you for your reply, but the search you provided cannot loop over each start time and end time the way map does.
Without knowing your data, I would suggest you start with
index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| bin _time span=7d
| stats dc(day) as days_present earliest(_time) as earliest_time latest(_time) as latest_time count as "fwcishu" by _time oprt_user_name blng_dept_name oprt_user_acct
| eventstats min(*_time) as all_*_time
which will give you a breakdown of all the data you need with the earliest and latest for each grouping.
It will count the number of days per 7-day period (days_present), grouped by week and all your other grouping fields. You can calculate the overall earliest/latest date with eventstats.
Then I would expect you can manipulate that result set to get what you want.
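For instance, a rough post-processing sketch on top of that result set (field names follow the search above; the more-than-2-days threshold mirrors the where clause in your original search and is otherwise an assumption):
| where days_present > 2
| eval a=strftime(_time, "%F")
| eval b=strftime(relative_time(_time, "+7d"), "%F")
| eval alert_date=strftime(relative_time(latest_time, "+1d"), "%F")
| table a b oprt_user_name blng_dept_name oprt_user_acct days_present alert_date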
Using map is not the right solution.
If you share some of the data and mock up an example of what you're trying to end up with, it would help.
Thank you for your reply, and I apologise for my unclear description of the issue!
Our sample data is as follows:
"2024-12-12 00:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-12 08:30:14", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-12 11:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-13 15:30:55", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-13 00:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-14 19:30:30", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-14 22:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
The field headers are:
opr_time oprt_user_acct oprt_user_name blng_dept_name is_cont_sens_acct
Can you describe your intended output? It's challenging to reverse-engineer SPL to understand what you are trying to do. If you can show, from your data, what you would like to see as output, that would be helpful.
Did you try the SPL I posted and if so, did it give you a starting point to produce your results?
Can you summarise what you are trying to do? Your SPL contains some errors, e.g. using info_min_time in your mvrange() eval statement, which does not exist, and the fact that you have maxsearches set to half a million indicates you're going about this the wrong way.
Describe the problem you are trying to solve, your inputs and your expected outputs.
The SPL I provided does in fact run without problems in our production environment. I want to compute statistics on the data from 2019 to the present, where multiple visits by a user in one day count as a single visit, and calculate the number of consecutive days each user visited within every 7-day interval since 2019.
You're overcomplicating your search.
If you want to calculate how many days during a week your users connected to a service, there are probably several ways to go about it. The easiest and most straightforward would probably be to
| bin _time span=1d
to have all visits during the same day with the same timestamp (the alternative would be to use strftime)
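The strftime variant would be something like this (a sketch; day is an illustrative field name):
| eval day=strftime(_time, "%F")
which leaves _time untouched and instead tags each event with its calendar day.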
Now you need to calculate the number of distinct days in each week for each user:
| stats dc(_time) by user _time span=7d
The alternative is the timechart command.
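With timechart, a sketch along the same lines (user stands in for whichever account field you have, e.g. oprt_user_acct):
| eval day=strftime(_time, "%F")
| timechart span=7d dc(day) as days_present by user
This yields one column per user with the count of distinct visit days in each 7-day bucket.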
Thank you, but the client wants the results broken down by 7-day intervals, roughly 1,200 result sets in total. The output needs to include: start time, end time, username, department, number of days visited, a multivalue list of the query dates, and the alert date.
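For what it's worth, a map-free sketch that produces those columns using non-overlapping 7-day buckets (note the original map approach slides the window one day at a time, so this is only an approximation; the day derivation is an assumption):
index=edwapp sourcetype=ygttest is_cont_sens_acct="是"
| eval day=strftime(_time, "%F")
| bin _time span=7d
| stats dc(day) as day_count values(day) as qdate max(day) as alert_day by _time oprt_user_name blng_dept_name oprt_user_acct
| where day_count > 2
| eval a=strftime(_time, "%F")
| eval b=strftime(relative_time(_time, "+7d"), "%F")
| eval alert_date=strftime(relative_time(strptime(alert_day, "%F"), "+1d"), "%F")
| table a b oprt_user_name blng_dept_name oprt_user_acct day_count qdate alert_date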
This seems to be different from your previous description. Counting is one thing, listing sessions is another. Furthermore, we don't know your data.
Thank you for your reply, and I apologise for my unclear description of the issue!
Our sample data is as follows:
"2024-12-12 00:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-12 08:30:14", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-12 11:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-13 15:30:55", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-13 00:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-14 19:30:30", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
"2024-12-14 22:30:12", "0699075634", "Liu Zhiqiang", "Logistics Department", "Yes"
The field headers are:
opr_time oprt_user_acct oprt_user_name blng_dept_name is_cont_sens_acct
It doesn't seem to include any "start" or "end" time - it's just a single timestamp per event. So you need to define the logic behind your request properly.
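For illustration, window boundaries can be derived from that single timestamp instead of being looped over; a sketch (window_start and window_end are illustrative names):
| bin span=7d _time AS window_start
| eval window_end=relative_time(window_start, "+7d")
| eval start=strftime(window_start, "%F"), end=strftime(window_end, "%F")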
Thank you for your response. To achieve this, we will run the query in a loop over 7-day windows, from the end time back to the earliest start time of the data, and write the results to an intermediate index.
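For the write-out step, the collect command is the usual way to send results from a scheduled search to a summary index (a sketch; the index name summary_ygttest is an assumption and the index must already exist):
... | where day_count > 2
| collect index=summary_ygttest
Scheduled every 7 days, each run would then cover one window.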
Can you edit and format your SPL as a code block using the </> symbol in the Body menu - it makes long SPL far easier to digest.
Thank you - I post so rarely that I forgot about the code formatting.