About alvaromari83

alvaromari83 · ‎05-22-2019

Hello! I'm trying to generate some summarized data with the collect command in my SPL queries. By default, the collected event looks like this:

    05/20/2019 07:15:00 +0200, info_min_time=1.000, info_max_time=1558557310.000, info_search_time=1558557310.410, field1=value1 field2=value2

Some of these fields are actually JSON objects, like:

    05/20/2019 07:15:00 +0200, info_min_time=1.000, info_max_time=1558557310.000, info_search_time=1558557310.410, field1=value1 field2=value2 field3="{"subfield1":"subvalue1", "subfield2":"subvalue2",...}"

So, to avoid a mixture of formats in my collected events, I would like to index the results with the collect command, BUT in JSON format for the whole event, something like:

    {"time":"05/20/2019 07:15:00 +0200", "info_min_time":1.000, "info_max_time":1558557310.000, "info_search_time":1558557310.410, "results": {"field1":"value1", "field2":"value2", "field3": {"subfield1":"subvalue1", "subfield2":"subvalue2",...}, ...}}

Is such a thing feasible? Maybe with some props tweaking? Thank you in advance!
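To pin down the target shape described above, here is a minimal Python sketch that builds one such JSON event outside Splunk. It does not show that collect can emit this format; the helper name `to_json_event` and the choice to nest `field3` as an object (rather than keep it as a quoted string) are assumptions made for illustration.

```python
import json


def to_json_event(time_str, info, fields):
    """Build one JSON event string; field values that are JSON text get nested."""
    results = {}
    for name, value in fields.items():
        try:
            parsed = json.loads(value)
        except (TypeError, ValueError):
            parsed = value
        # Only nest real objects/arrays; plain values such as "value1" stay as they are.
        results[name] = parsed if isinstance(parsed, (dict, list)) else value
    event = {"time": time_str, **info, "results": results}
    return json.dumps(event)


raw_fields = {
    "field1": "value1",
    "field2": "value2",
    "field3": '{"subfield1":"subvalue1", "subfield2":"subvalue2"}',
}
info = {
    "info_min_time": 1.000,
    "info_max_time": 1558557310.000,
    "info_search_time": 1558557310.410,
}
print(to_json_event("05/20/2019 07:15:00 +0200", info, raw_fields))
```

The output is a single valid JSON object, so every field, including the nested one, can be parsed the same way.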

alvaromari83 · ‎06-20-2018

Hello! This would then be something like:

    index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ] | ... rest of the query

This approach would for sure save some time in the subsearch filter, but it still has the problem of using a subsearch and hitting the default 10,000 limit. So, the format command with a much higher limit would be required... and then the parsing of the format output in the outer search is much slower.

I've run some tests with a CSV of 10,000 accountIDs. This search:

    index=transactions [ | inputlookup dailyaccountIDs.csv | format maxresults=100000 ] | stats dc(accountID)

takes 55 minutes!!!, while this full scan approach:

    index=transactions | lookup dailyaccountIDs.csv accountID OUTPUT isdaily | search isdaily=1 | stats dc(accountID)

takes 75 seconds, which is the same time that a raw full scan takes:

    index=transactions | stats dc(accountID)

So, not a big improvement...
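One way to see why the outer search gets slow is to look at what the subsearch hands back. The sketch below mimics, in Python, the OR expression that `format` produces for a single field, assuming its default prefixes and separators. The function name `format_filter` and the generated account IDs are illustrative only.

```python
def format_filter(field, values):
    """Mimic the default shape of a `format` result for one field."""
    clauses = [f'( {field}="{value}" )' for value in values]
    return "( " + " OR ".join(clauses) + " )"


small = format_filter("accountID", ["1001", "1002"])
big = format_filter("accountID", [str(n) for n in range(10000)])

print(small)                 # the filter for two IDs
print(big.count(" OR ") + 1) # number of clauses the outer search must parse
print(len(big))              # size of the search string in characters
```

With 10,000 IDs the outer search receives 10,000 OR-ed clauses in one search string, which is consistent with the slow parsing reported above, although the sketch does not measure Splunk itself.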

alvaromari83 · ‎06-19-2018

Hello! We are using many saved searches to perform daily detection queries over huge datasets. Concretely, the anatomy of our queries is always the same. We have transactions and events related to many IDs (for example, bank account movements, where the ID would be the bank account). Our search logic wants to find "yesterday's bank accounts that have already done N or more transactions in the past". Therefore, this would be an example of our SPL query:

    index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID ] | ... rest of the query

As you can see, I'm using a subsearch filter in the first pipe to speed up the queries and only search the past for yesterday's IDs... This is nice until you have scenarios with more than 10,000 accountIDs with daily movements. In this case, I've tried to get around the subsearch limits by using the format command:

    index=transactions [ search index=transactions earliest=-1d@d latest=-0d@d | fields accountID | dedup accountID | table accountID | format maxresults=100000 ] | ... rest of the query

But this still seems to be overkill. The limits are not hit thanks to the format command, but the subsearch takes too long and just dies. Also, I'm not sure whether exceeding the default limits is good practice...

Then I started to think about options WITHOUT subsearches, like:

    index=transactions accountID | eval isFromYesterday = if(_time>relative_time(now(),"-1d@d") AND _time<relative_time(now(),"-0d@d"), 1, 0) | ... rest of the query

But this is still overkill, as it scans the whole index for all accountIDs when there is no need to...

Can you share your advice on this issue? Thank you in advance!! Regards, Álvaro
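The "yesterday" window used in both the subsearch (`earliest=-1d@d latest=-0d@d`) and the eval can be written out explicitly. This is a small Python sketch of that window, assuming naive local timestamps; the function names are made up for illustration. Note one deliberate difference: it treats the window as half-open (start included, end excluded), whereas the eval in the post uses strict comparisons on both ends.

```python
from datetime import datetime, timedelta


def yesterday_window(now):
    """Return (start, end) of the previous calendar day relative to `now`."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=1), today_midnight


def is_from_yesterday(event_time, now):
    """True when event_time falls in [yesterday 00:00, today 00:00)."""
    start, end = yesterday_window(now)
    return start <= event_time < end


now = datetime(2018, 6, 19, 10, 30)
print(yesterday_window(now))
print(is_from_yesterday(datetime(2018, 6, 18, 23, 59), now))
```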

alvaromari83 · ‎06-09-2018

I think I got the problem.

First: the accelerations were badly defined. This is incorrect, and the acceleration will not be created:

    acceleration.acceleration1={ name: 1}

This is correct:

    acceleration.acceleration1={ "name": 1}

So, the double quotes were missing.

Second: the way to request a descending sort is quite strange.

    ascending: https://localhost:8089/servicesNS/nobody/kvstoretest/storage/collections/data/kvstorecoll?sort=name&skip=10&limit=10
    descending: https://localhost:8089/servicesNS/nobody/kvstoretest/storage/collections/data/kvstorecoll?sort=name:-1&skip=10&limit=10

This sorting syntax is not documented anywhere in the KVStore API endpoint docs... so this was very confusing. Thank you!
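The quoting fix above is consistent with the acceleration value being read as strict JSON, where object keys must be double-quoted strings. That Splunk parses the value this way is an assumption here; the sketch only checks the two values from the post against Python's JSON parser.

```python
import json


def is_valid_json(text):
    """Return True when `text` parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json('{ name: 1}'))    # unquoted key
print(is_valid_json('{ "name": 1}'))  # quoted key
```

A quick check like this before restarting can catch the unquoted-key mistake early.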

alvaromari83 · ‎06-08-2018

Hello! We are using KVStore collections in our apps, through the Splunk REST API collection endpoint. In one case, we are fetching data from a KVStore collection with a huge number of records (more than 500,000) in an app report panel. To have a good UX, we wanted to use the Mongo collection pagination capabilities that the KVStore collection endpoint provides. For example, "give me page 2 with 10 results per page, sorted by name ascending" would be:

    curl -k -u admin:yourpassword \
      "https://localhost:8089/servicesNS/nobody/kvstoretest/storage/collections/data/kvstorecoll?sort=name&skip=10&limit=10"

But when testing this with sort, once the number of records to return is 50K or more (not so massive!), it throws a MongoDB code 17144 exception:

    "Overflow sort stage buffered data usage of 33554495 bytes exceeds internal limit of 33554432 bytes"

Googling for this, I found that the cause is that the sorted fields must be INDEXED in the Mongo collection; otherwise the sort is done in memory, with a 32 MB limit. So, I tried to create an INDEX for the sorted field ("name" in the example) with the following property in the collections.conf file:

    acceleration.acceleration1={ name: 1}

But... after restarting and repopulating the collection, everything seems the same, and the 32 MB error is still there! It is as if the acceleration were not related to the sort index Mongo needs... or the acceleration is not working at all.

Have you experienced something like this, or can you give advice about what could be happening? Thank you! Regards, Alvaro
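The page-to-offset arithmetic behind the curl call above (page 2 with 10 per page means `skip=10&limit=10`) can be sketched as a small URL builder. The host, app and collection names are the ones from the post; the function name `page_url` is hypothetical, and the `field:-1` form for descending order is taken from the author's own follow-up rather than from official documentation.

```python
from urllib.parse import urlencode

BASE = ("https://localhost:8089/servicesNS/nobody/kvstoretest"
        "/storage/collections/data/kvstorecoll")


def page_url(page, per_page, sort_field, descending=False):
    """Build the collection URL for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    # Descending syntax as reported in the follow-up post (an assumption here).
    sort = f"{sort_field}:-1" if descending else sort_field
    params = {"sort": sort, "skip": (page - 1) * per_page, "limit": per_page}
    # Keep ':' unescaped so the sort value matches the form shown in the post.
    return BASE + "?" + urlencode(params, safe=":")


print(page_url(2, 10, "name"))
print(page_url(2, 10, "name", descending=True))
```

The sketch only builds URLs. It does not avoid the 32 MB sort limit, which still depends on the sorted field being indexed on the server side.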

alvaromari83 · ‎10-05-2015

Hello all! I'm implementing a "task ticketing model" in my Splunk apps. Based on roles, many users may be assigned to a task, update it, and store new task statuses and other task fields. I've been doing this with summary indexing (collect commands), but now that many users can generate new tasks or operate on the same task concurrently, I'm having concurrency issues (duplicated IDs and other data integrity problems). As far as I know, apps like Enterprise Security use KVStore to handle this kind of multi-user workflow scenario, but I'm not sure whether it is reliable, free of concurrency failures, and capable of handling relatively large numbers of records (approx. 100,000 per year). Performance is also an important factor. My other option is to set up an external relational SQL DBMS and use it through DB Connect. Any light on this would be greatly appreciated! Thank you!

alvaromari83 · ‎06-28-2015

Hello everyone. I've been developing pretty complex apps to support fraud and incident investigations. The apps started as auxiliary tools to read, gain insight from, and detect anomalies in massive and varied data sources. That is, a typical Splunk use case. But soon, requirements regarding investigation cycle management arose. Now we need Splunk panels not only to view data, but also to generate and store new "investigation" events, something like a ticket system with a lifecycle.

The first (and worst) approach here was to implement Advanced XML panels and queries that generate new sourcetypes for investigation events and store them in a summary index with the collect command. This has serious problems:

Input validation. Really difficult to implement in Advanced XML. The solution, for me, is switching to the web framework and programming it with MVC models in Java, Python, Django, etc.

Query performance on transactional data models. The investigation data generated and indexed in the summary index usually consists of many events for the same investigation, where the newest event of each investigation is the current investigation state and thus the one actually used in the apps. Remember, there is no updating when using collect in Splunk: any change implies a new collected event. That is not bad for audit trail requirements, but it is a performance pain to retrieve every single event in the query only to pipe it straight into something like:

    | dedup investigation_id sortby - _time

I've been looking for some kind of parameter of the search command (the first one in the query) to take only ONE event (the most recent) by some criterion. Still no luck... but any idea or alternative would be appreciated.

Atomicity of collect queries. Splunk's collect is not intended to be an SQL INSERT query, so it does not provide ACID guarantees such as atomicity. The problem is that in really complex data generation, atomicity is needed. For example, imagine that an investigation update generates not only an event that updates the investigation itself, but also an event that updates another sourcetype, such as confirmed fraud files. I'm implementing this with two consecutive collects, but if the first collect succeeds and the second one fails (or the browser is closed, or whatever), the indexed data for that investigation will be corrupted. There is no rollback: the first collect cannot be undone.

This last issue got me thinking about better approaches than collects and summary indexes. The second approach, and the most obvious one too, is to use a secondary SQL database with DB Connect to store these investigation records. It would provide ACID guarantees, and probably faster responses than SPL queries against Splunk indexes. But before planning to implement and deploy this alternative, I would like to ask you all for opinions on better alternatives or other ways to do it that solve all the issues I stated above. What do you think? What would be your preferred approach? Is an auxiliary SQL database the way to go?

PS: in my opinion, CSV and KVStore are not good options for this because they may be unreliable, offer little security, are prone to config bundle size problems, etc.

Thank you!!! Alvaro
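The append-only model described above, where the newest event per investigation is its current state, can be written out as a short Python sketch. It mirrors what `dedup investigation_id sortby - _time` yields; the `status` field and the sample values are invented for illustration.

```python
def latest_state(events):
    """Return the newest event per investigation_id, keyed by that id."""
    latest = {}
    for event in events:
        key = event["investigation_id"]
        # Keep the event with the highest _time seen so far for this id.
        if key not in latest or event["_time"] > latest[key]["_time"]:
            latest[key] = event
    return latest


events = [
    {"investigation_id": "INV-1", "_time": 100, "status": "open"},
    {"investigation_id": "INV-2", "_time": 110, "status": "open"},
    {"investigation_id": "INV-1", "_time": 150, "status": "confirmed"},
]
current = latest_state(events)
print(current["INV-1"]["status"])
```

Even in this form every event has to be read once to find the newest one per ID, which is the cost the post complains about. A store that supports updates in place would keep only the current row per investigation.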

alvaromari83 · ‎06-28-2015

Hey, thanks (with many days of delay) for your detailed and useful answer (as always!). I will stick to custom JS in the panels that do not use the new web framework, but what I understand here is that dynamically generated fields are not good practice for dashboards and visualizations. Again, thank you!!

alvaromari83 · ‎05-31-2015

Hello all! I'm working with dynamic field names, i.e.

    eval {data_name} = data_value

will generate, for N different data_name entries, N fields, each named after a data_name value. Very useful when data fields are not static. But now I would like to apply custom CSS rendering to each column based on its value. I normally use the Sideview Table module with the "columns.<fieldname>.style" param, but my field names are dynamic. Is there any way to accomplish this? I was thinking about some kind of wildcard approach like:

    columns.dynamicfieldexample*.style

that would match all columns for fields starting with dynamicfieldexample. Thanks!!
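To make the requested behaviour concrete, here is a Python sketch of both halves: creating a field named after the value of `data_name`, and selecting columns with a `dynamicfieldexample*` wildcard. It illustrates the matching semantics only. It is not a Sideview Utils feature, and the function names and sample field names are hypothetical.

```python
from fnmatch import fnmatchcase


def add_dynamic_field(row):
    """Mimic `eval {data_name} = data_value` for a single result row."""
    out = dict(row)
    out[row["data_name"]] = row["data_value"]
    return out


def columns_matching(rows, pattern):
    """Return the sorted column names across all rows that match the wildcard."""
    names = {name for row in rows for name in row}
    return sorted(name for name in names if fnmatchcase(name, pattern))


rows = [
    add_dynamic_field({"data_name": "dynamicfieldexample_a", "data_value": 1}),
    add_dynamic_field({"data_name": "dynamicfieldexample_b", "data_value": 2}),
]
print(columns_matching(rows, "dynamicfieldexample*"))
```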

alvaromari83 · ‎09-26-2014

Nice! Many thanks! addinfo and loadjob cover my issue. However, I'm still wondering how to make both actions (tab clicks and the timechart drilldown click) render in the same panel (replacing the visualizations, not "getting added below") without using Gate modules... Thank you once again. Alvaro

alvaromari83 · ‎09-23-2014

Hello everyone, I'm having some issues with the following scenario. I have a panel where I want to display a table with the postprocessed results of a "search A". Search A generates a complete result set, and PostProcess searches A1, A2, A3 and A4 generate filtered results from search A.

A1, A2 and A3 will be displayed and navigated with a Tabs module, fast and easy, with the following typical structure:

    SEARCH A QUERY
    tab_selector with tab values A1, A2 and A3
    <module name="Switcher">
      <param name="selectedGroup">$tab_selector$</param>
      <module name="PostProcess" group="A1">
        <param name="search">POSTPROCESS A1 QUERY where(condition A1) | ... and display results</param>
      </module>
      <module name="PostProcess" group="A2">
        <param name="search">POSTPROCESS A2 QUERY where(condition A2) | ... and display results</param>
      </module>
      <module name="PostProcess" group="A3">
        <param name="search">POSTPROCESS A3 QUERY where(condition A3) | ... and display results</param>
      </module>
    </module>

Then there is a second search, search B, which displays a timechart with a JSChart and, when clicked, generates a clicked.value variable that I want to use to filter the last PostProcess (A4). Something like:

    where(conditionA4=$clicked.value$) | ... and display results

When the search B timechart is clicked, PostProcess A4 would ideally show its results in the panel, with the A4 tab automatically selected. This is, however, not feasible as far as I know. So, I just bypass the whole Tabs module and generate a different panel for A4.

The goal is: have a search A generate 4 PostProcess searches. 3 of them (A1, A2, A3) would be switchable and selectable through a Tabs module.
The 4th one, A4, would be selectable through a timechart drilldown, when clicking a value (the $clicked.value$ variable inside JSChart).

This is the pseudocode I've come up with:

    <module name="ValueSetter">
      <param name="name">is_chartB_clicked</param>
      <param name="value">False</param>
      <module name="Search">
        <param name="search">SEARCH A QUERY</param>
        <module name="Gate">
          <param name="id">teleporter</param>
          <module name="Switcher">
            <param name="selectedGroup">$is_chartB_clicked$</param>
            <module name="Tabs" group="False" layoutPanel="panel_row1_col1">
              <param name="name">tab_selector</param>
              <param name="staticValues">with tab values A1, A2 and A3</param>
              <module name="Switcher">
                <param name="selectedGroup">$tab_selector$</param>
                <module name="PostProcess" group="A1">
                  <param name="search">POSTPROCESS A1 QUERY where(condition A1) | ... and display results</param>
                </module>
                <module name="PostProcess" group="A2">
                  <param name="search">POSTPROCESS A2 QUERY where(condition A2) | ... and display results</param>
                </module>
                <module name="PostProcess" group="A3">
                  <param name="search">POSTPROCESS A3 QUERY where(condition A3) | ... and display results</param>
                </module>
              </module>
            </module>
            <module name="PostProcess" group="True" layoutPanel="panel_row1_col1">
              <param name="search">POSTPROCESS A4 QUERY where(condition A4=$clickedValueFromChartB$) | ... and display results</param>
            </module>
          </module>
        </module>
      </module>
    </module>
    ...
    <module name="Search" layoutPanel="panel_row2_col1">
      <param name="search">SEARCH B QUERY</param>
      <module name="JSChart">
        <module name="ValueSetter">
          <param name="name">is_chartB_clicked</param>
          <param name="value">True</param>
          <module name="ValueSetter">
            <param name="name">clickedValueFromChartB</param>
            <param name="value">$clicked.value$</param>
            <module name="Gate">
              <param name="to">teleporter</param>
            </module>
          </module>
        </module>
      </module>
    </module>

This is not working because PostProcess A4, when reached from search B, loses the search A results after the Gate teleport. However, I need search B in order to get the clicked.value from the timechart and do the PostProcess A4 filtering... Is there any way to keep the previous search results (the search A results) on standby, to use them later? Or any other idea for implementing my main goal (rendering filtered results from different sources, the Tabs and the JSChart click drilldown, in the same target panel) in a different way? Thank you a lot. Alvaro

alvaromari83 · ‎09-23-2014

Yep! Good to know, thank you for the answer and for your neat app!

alvaromari83 · ‎09-14-2014

Hello everyone! I'm using Advanced XML and Sideview Utils to implement some pretty panels. One of them renders a Table with some results, including a placeholder field (named, for example, "PLACEHOLDER") used to render Checkbox modules in the results table, one per row. As you know, you can do this with something like:

    <module name="Table">
      <module name="Checkbox" group="$row.field.PLACEHOLDER$">
        <param name="name">row_selection</param>
        <param name="onValue">$row.field.Data_Field$</param>
      </module>
    </module>

This will render a table with a checkbox in every row, and when you check the box in row i, the value of the "Data_Field" field in that row will go downstream. From there, you can use it, teleport it, etc.

My problem is that I want to make MULTIPLE selections. That is, I want to get the Data_Field values from all the rows where the checkbox is selected. For example, if Data_Field is A for row 1, B for row 2 and C for row 3, and I check the boxes in rows 1 and 3, I want to get BOTH A and C as the downstream data. Instead, it just sends downstream the data from the latest row checked.

The only workaround I've found is to keep a temporary variable in a CSV, holding an array or vector of values. When a checkbox is selected, it executes a search like:

    | inputcsv temp.csv | eval selection=selection."$row.field.Data_Field$," | outputcsv temp.csv

and when the box is unchecked:

    | inputcsv temp.csv | eval selection=replace(selection,"$row.field.Data_Field$,","") | outputcsv temp.csv

Pretty ugly, so I'm not using it at all. Is there any way to send data from multiple rows downstream from modules inside Table module rows? Thanks!!
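The CSV workaround above keeps the selection as one comma-terminated string. The Python sketch below reproduces that append/remove logic to show one concrete way it can go wrong, next to a set-based alternative. It models the logic only, not the Sideview modules. Python's `str.replace` is a literal replacement, whereas the eval `replace` takes a regular expression, but the substring problem shown applies to both. All function names are illustrative.

```python
def check(selection, value):
    """Append a value, as in: eval selection=selection."<value>," """
    return selection + value + ","


def uncheck(selection, value):
    """Remove a value by substring replacement, as in the workaround."""
    return selection.replace(value + ",", "")


def toggle(selected, value):
    """Set-based alternative: add the value if absent, remove it if present."""
    return selected ^ {value}


as_string = check(check("", "AA"), "A")
print(uncheck(as_string, "A"))           # also damages the "AA" entry

as_set = toggle(toggle(set(), "AA"), "A")
print(sorted(toggle(as_set, "A")))       # only "A" is removed
```

When one value is a suffix of another, the string version removes part of the wrong entry, while the set version treats each value as a whole.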

alvaromari83 · ‎01-14-2014

All right, you nailed it! Your explanation absolutely makes sense. I will place the pulldown downstream of a search. Thank you!

alvaromari83 · ‎01-14-2014

Hello all! I'm implementing a search panel with 2 Sideview pulldowns. The first one is just made of 3 static options, which serve as arguments to the nested second Pulldown module. The second one queries its values from a CSV lookup file, using a PostProcess module. The query for this inputlookup is:

    | inputlookup file.csv | where fuente="source1" | fields nombre valor

This query runs extremely fast in the search app, as expected, since the CSV itself is just a few rows with the following format:

    fuente  , nombre     , valor
    source1 , Matricula  , cot_carplate
    source1 , Nombre     , cot_nombre
    source1 , Documentos , cot_id
    source2 , Numero pol , pol_pol

The view XML is the following:

    [First Pulldown, markup lost: name "fuente", label "Fuente de datos:", float "left", static options Cotizaciones = source1, Polizas = source2, Siniestros = source3, Busqueda libre]

    <module name="Pulldown">
      <param name="float">left</param>
      <param name="name">valor</param>
      <param name="label">Parametro de busqueda:</param>
      <param name="postProcess">
        <![CDATA[ | inputlookup file.csv | where fuente="$fuente$" | fields nombre valor ]]>
      </param>
      <param name="staticOptions"/>
      <param name="template">$value$ =</param>
      <param name="valueField">valor</param>
      <param name="labelField">nombre</param>

When I load this view, populating the second pulldown through postProcess takes a lot of time (almost 7-8 seconds). I'm on the latest Sideview Utils version, on Splunk 5.0. Any idea what the issue could be? Thanks and regards!
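For reference, this is what the second pulldown should end up with for a given first selection. The Python sketch applies the same filter as the inputlookup query to the sample rows from the post, reading the last row's fuente as source2. It says nothing about where the 7-8 seconds go; the function name `options_for` is hypothetical.

```python
import csv
import io

# Sample rows from the post, without the padding spaces around the commas.
CSV_TEXT = """fuente,nombre,valor
source1,Matricula,cot_carplate
source1,Nombre,cot_nombre
source1,Documentos,cot_id
source2,Numero pol,pol_pol
"""


def options_for(fuente, text=CSV_TEXT):
    """Return (label, value) pairs, like: where fuente="<fuente>" | fields nombre valor."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["nombre"], row["valor"]) for row in reader if row["fuente"] == fuente]


print(options_for("source1"))
print(options_for("source2"))
```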

Posts	15
Solutions	1
Karma Given	0
Karma Received	9
Member Since	‎12-12-2013

Online Status	Offline
Date Last Visited	‎06-05-2020 02:03 AM

Collect command to index in JSON format

Re: Hit limits in the subsearch filters used to bo...

Hit limits in the subsearch filters used to boost ...

Re: Kvstore Collection endpoint sort limit

Kvstore Collection endpoint sort limit

Implementing custom task handling across many user...

How to use Splunk for writing records?

Re: Sideview Utils: How to get Sideview Table Modu...

Sideview Utils: How to get Sideview Table Module t...

Re: Use Gate to generate a temporary variable from...

Use Gate to generate a temporary variable from a s...

Re: Table Module "per row" checkbox module downstr...

Table Module "per row" checkbox module downstreams

Re: Sideview Utils PostProcess and local csv input...

Sideview Utils PostProcess and local csv inputlook...
