It's complicated.

Firstly, Splunk stores the contents of the raw data, compressed. The compression ratio is more or less known in a typical text-data scenario, so we could estimate raw data usage. But that's definitely not all that Splunk stores about its indexed data. It also splits the data on major and minor breakers and stores the resulting tokens along with "pointers" to the events that contain them (there are some minute details about minor breakers but we'll not be digging into them here).

So if you have an event

Jan 23 2025 localhost test[23418] sample event

Splunk will store the whole raw event in its raw event journal and will generate a "pointer" to that event within that journal (for the sake of this example we will assume the value of this "pointer" is 0xDEADBEEF; it doesn't matter what it actually looks like internally). Additionally, Splunk will split the data and add separate entries to its index, with pointers from each of the split tokens back to the original raw event. So the index will contain

Token      Pointers
2025       0xDEADBEEF
23         0xDEADBEEF
23418      0xDEADBEEF
event      0xDEADBEEF
jan        0xDEADBEEF
localhost  0xDEADBEEF
sample     0xDEADBEEF
test       0xDEADBEEF

Now if Splunk ingests another event

Jan 24 2025 localhost test[23418] another sample event

it will save it to the raw data journal and assign it another pointer - let's say it's 0x800A18A0. And it will update its index so that it now contains

Token      Pointers
2025       0xDEADBEEF,0x800A18A0
23         0xDEADBEEF
24         0x800A18A0
23418      0xDEADBEEF,0x800A18A0
another    0x800A18A0
event      0xDEADBEEF,0x800A18A0
jan        0xDEADBEEF,0x800A18A0
localhost  0xDEADBEEF,0x800A18A0
sample     0xDEADBEEF,0x800A18A0
test       0xDEADBEEF,0x800A18A0

So you can see that the actual index contents are highly dependent on the entropy of the data. If you have just one value or a small set of values which simply repeat throughout your whole data stream, the index will contain just a small set of unique values with a lot of "pointers" to the raw events. But if your events contain unique tokens, the index will grow in terms of indexed values and each of them will be pointing to just one raw event. So that's already complicated.

Additionally, if you create indexed fields, they are actually stored in the same index as the tokens parsed out of the raw event; they're just stored with the field name as a prefix. So if you created an indexed field called "mytestfield" with a value of "value1", it will be stored in the same index as the tokens, but saved as "mytestfield::value1". As a bit of interesting trivia - indexed fields are indistinguishable from key::value tokens parsed out of raw data. So indeed, if you're creating indexed fields, you cause the index to grow. There is no simple linear dependency though, since the growth depends on the cardinality of the field, the size of the field values (and of the field name itself), and the number of events to which each index entry has to point.

Additionally, Splunk stores some simple summarizing csv files (which are actually relatively negligible in size) as well as a bloomfilter, which is a kind of simplified index containing just the tokens, without the relevant pointers. It might seem a bit redundant but is actually pretty useful - Splunk can determine whether to look for a term in the bucket at all without needing to process the full index, which might be way bigger. So Splunk can simply skip searching through a particular bucket if it knows it won't find anything.
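As a side note to the indexed-field part above, here is a minimal sketch of how such a field is typically created at index time with a transforms.conf/fields.conf pair. The stanza names, the regex, and the sourcetype are hypothetical - only the "mytestfield::value1" form comes from the example above.

# props.conf (hypothetical sourcetype)
[my:sourcetype]
TRANSFORMS-add_mytestfield = add_mytestfield

# transforms.conf - writes "mytestfield::<captured value>" into the index
[add_mytestfield]
REGEX = mytestfield=(\S+)
FORMAT = mytestfield::$1
WRITE_META = true

# fields.conf - tells search time that the field is indexed
[mytestfield]
INDEXED = true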
So long story short - it's relatively complicated and there is no simple formula to give you a sure estimate of how your data will grow if you - for example - add a single indexed field. The rule of thumb is that the "core" indexed data (the raw events along with essential metadata fields) is about 15% of the original size of the raw data, and the indexes add another 35% of the original raw data size. But that's only a generalized estimate. There's no way to reliably calculate it beforehand since there are many factors that come into play.
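If you want to see how that rule of thumb plays out on your own data, a rough per-index check is possible with dbinspect. This is a minimal sketch, assuming an index named "main"; rawSize is the uncompressed raw data in bytes and sizeOnDiskMB is what the buckets actually occupy on disk:

| dbinspect index=main
| stats sum(rawSize) as raw_bytes, sum(sizeOnDiskMB) as disk_mb by index
| eval raw_mb = round(raw_bytes / 1024 / 1024, 2)
| eval disk_to_raw_ratio = round(disk_mb / raw_mb, 2)
| table index raw_mb disk_mb disk_to_raw_ratio

It won't break the usage down per host, but it does show the overall on-disk vs. raw ratio per index.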
Hi @Tafadzwa

Download the "Splunk App for Content Packs" and then, once it is installed, from the ITSI main menu click Configuration > Data Integrations. Select Add content packs or Add structure to your data, depending on your version of ITSI. From there you should see the AppDynamics Content Pack.

Did this answer help you? If so, please consider:
- Adding karma to show it was useful
- Marking it as the solution if it resolved your issue
- Commenting if you need any clarification
Your feedback encourages the volunteers in this community to continue contributing
Is the size of logs after being stored in buckets, compared to their raw size, a metric I should monitor? This question came to my mind, and the problem is I don't really know how to measure it, since from the deployment/admin view I can only see the size of a bucket. But a bucket can store logs from multiple hosts, and I don't know the size of the raw logs sent from each host for a single bucket. So is there any formula to calculate it? AFAIK, the TRANSFORMS- setting in props.conf is one of the main factors that increases the size of a log after parsing, since it creates index-time field extractions. Also, if there is no exact formula, has anyone measured the size of logs after parsing when using well-known apps such as those for WinEventLog or Linux?
Well, we can't say retroactively what happened for sure. Syslog, especially UDP-transmitted syslog, is sensitive to both network disruptions and the receiver's performance. If the receiving Splunk infrastructure listens for syslog directly with the splunkd process, without external syslog daemons, the receiver might have been "overwhelmed" by a burst of data from other hosts and might not have processed the incoming syslog data properly. Performance is one of the reasons why in a production environment you generally shouldn't listen for syslog data directly with the Splunk process. You should use an external syslog daemon. See https://docs.splunk.com/Documentation/SVA/current/Architectures/Syslog for possible syslog ingestion architectures.
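For illustration only, a minimal sketch of the usual pattern - an external rsyslog instance writing per-host files that a Universal Forwarder then monitors. The port, paths, and index name are assumptions, not anything from this thread:

# /etc/rsyslog.d/50-splunk.conf - receive UDP syslog and write one file per sending host
module(load="imudp")
input(type="imudp" port="514")
template(name="PerHostFile" type="string" string="/var/log/remote/%HOSTNAME%/syslog.log")
action(type="omfile" dynaFile="PerHostFile")

# inputs.conf on the Universal Forwarder - pick the files up and send them on
[monitor:///var/log/remote/*/syslog.log]
index = network
sourcetype = syslog
host_segment = 4

This decouples reception from Splunk's ingestion pipeline, so a slow indexer doesn't translate directly into dropped UDP packets.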
Hi All,   Trying to find content pack for AppDynamics on Splunk Base. Kindly help.   Thanks
Hi PickleRick. Thank you for replying to the post. Our devices are sending syslog to the Splunk server over the network (there have not been any network issues). Secondly, our supplier noticed that they were not receiving logs from one specific host. After some hours (approx. 5), our supplier started receiving logs from that specific host again. While the supplier was not receiving logs from this host, they received a lot of logs from other hosts on our network. It happened around 04:47 (am) local time; at that time there is no load on the network. Our supplier maintains the indexes, and the system works. About the ingestion process stopping - could that process stop for one host (one out of many) while the other hosts are not impacted? Brgds DD
Hi @ArunkumarKarmeg

Sorry, I missed the last reply, but I'm pleased you got it working!

Yeah - in terms of APIs it doesn't make a lot of sense to have to query all users. If you have lots of users then this would be a lot of API calls just to get the groups. You'd think you would be able to get the role from a user list, or the user list from a role!

Did this answer help you? If so, please consider:
- Adding karma to show it was useful
- Marking it as the solution if it resolved your issue
- Commenting if you need any clarification
Your feedback encourages the volunteers in this community to continue contributing
Hi @bill

If you're looking to see whether the user is a Workflows Administrator, then the following should work:

| eval isAdmin=IF(typeof(mvfind('target{}.displayName', "Workflows Administrator"))=="Number","Yes","No")

Did this answer help you? If so, please consider:
- Adding karma to show it was useful
- Marking it as the solution if it resolved your issue
- Commenting if you need any clarification
Your feedback encourages the volunteers in this community to continue contributing
Hi, is this still open? Did we get a solution to extract the complete server details monitored in AppDynamics?
@livehybrid, thanks a lot for your help on this.

I was finally able to get the complete data for the user list and their roles.

Still, as a suggestion: it is very frustrating that there is no out-of-the-box option to get such a simple report.

Thanks again.
What puzzles me here is why you are trying to split the data on the receiving end when you have control over the sending solution, and it would be way easier and more maintainable to simply split the array at the source and send each array member as a separate event. Also remember that when sending to the /event endpoint you're bypassing timestamp recognition (unless you append that parameter which I always forget to the URI), so you should send an explicit epoch-based timestamp along with your event. The upside is that you then don't have to worry about date parsing in Splunk at all.
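For illustration, a minimal sketch of what that might look like - one JSON object per array member, each with an explicit epoch "time" field, batched into a single HEC request. The URL, token, sourcetype, and payload fields are made up for the example:

POST https://splunk.example.com:8088/services/collector/event
Authorization: Splunk <hec-token>

{"time": 1747251054, "host": "my-lambda", "sourcetype": "my:json", "event": {"field1": "value1"}}
{"time": 1747251055, "host": "my-lambda", "sourcetype": "my:json", "event": {"field1": "value2"}}

Each object becomes its own event in Splunk, indexed at the time you supply.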
We have automation to insert via the /saved/searches endpoint and all is good. We also currently have quite a lot of custom Splunk Enterprise Security (ESS) event-based detections handcrafted via the GUI in Splunk Cloud (so we can't put them directly into savedsearches.conf). We have to automate these as they are not pure 'savedsearches'. We are following the ESCU standards and use contentctl validate. All good up to this stage. But how do we insert the ESCU detections into Splunk ESS? Which app should they go into (SplunkEnterpriseSecuritySuite, the DA-ESS-* type apps, or can they be inserted into our own custom app)? Any API-based automation into Splunk ESS is deeply appreciated. Thanks in advance.
Before anything, let me first say that when you post a JSON event sample, always use "Show raw text" before copying. This helps others help you. Secondly, as @bowesmana says, it is really unclear what you are asking. You already know the value "Workflows Administrator". Do you mean to search for this value and display other related key-value pairs? Or do you mean there are other possible values in the 3rd array element of target[] and you want to know how to reach the correct array element?

If the former, you need to specify which key-value pairs in that element are of interest. If the latter, there are many ways, including a method that does no "extracting" at all because Splunk has already done that for you by default. But before doing that, you need to use Splunk's flattened-structure notation, not invented names like targetUserDisplayName. (Splunk's notation is target{}.displayName for this one.)

Anyway, assuming the latter, @bowesmana already showed you several ways. Here I first present a formulaic approach to reach every JSON array node in SPL: spath + mvexpand. But before I show any code, you need to perform the most critical task: understand how that element differs from the other elements in the same array, all of which have a key displayName. To make this determination, you need to carefully study the data. The differentiating factor among those elements is the JSON key type in that array. So you would be looking for the element whose type is CUSTOM_ROLE.

index=okta "debugContext.debugData.privilegeGranted"="*"
| fields - target{}.*
| spath path=target{}
| mvexpand target{}
| spath input=target{}
| where type == "CUSTOM_ROLE"
| rename actor.displayName as "Actor", displayName as "Target Name", alternateId as "Target ID", description as "Action", debugContext.debugData.privilegeGranted as "Role(s)"
| table Time, Actor, Action, "Target Name", "Target ID", Action, "Role(s)"

With this approach, you can handle any JSON array. If you don't want to (re)extract everything in the array - there are occasions when mvexpand can be too expensive - here is a quirky method that can do the same thing: capture the values of target{}.displayName and target{}.alternateId corresponding to target{}.type of CUSTOM_ROLE.

index=okta "debugContext.debugData.privilegeGranted"="*"
| eval type_index = mvfind('target{}.type', "CUSTOM_ROLE")
| eval "Target Name" = mvindex('target{}.displayName', type_index)
| eval "Target ID" = mvindex('target{}.alternateId', type_index)
| rename actor.displayName as "Actor", description as "Action", debugContext.debugData.privilegeGranted as "Role(s)"
| table Time, Actor, Action, "Target Name", "Target ID", Action, "Role(s)"
Yes, this is right. There's no copy/pasting. We may need to update some parameters to support larger upload, but apart from that we can simply upload the ES package from the UI, perform the setup and deploy the Bundle.
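If the upload limit does need raising, the setting usually involved is max_upload_size in web.conf on the search head. A hedged sketch - the value below is just an example, sized to fit the package being uploaded:

# web.conf (e.g. $SPLUNK_HOME/etc/system/local/) - value is in MB
[settings]
max_upload_size = 1024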
Yes, I realized that. The data is processed by Lambda to extract only the relevant information and then sent over HEC. We did try with /raw, and it instead sent the log encapsulated in a root event field like in the screenshot below (some fields masked).

I tried the following based on the suggestion:

[source::http:lblogs]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n\s]+){
SEDCMD-remove-extras = [\s\n\r]*\][\s\n\r\]*}
NO_BINARY_CHECK = true
TIME_PREFIX = \"timestamp\":\s+\"
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N
TRUNCATE = 1000000
Not totally sure I understand, but if you're trying to get the 3rd array element of target, which corresponds to the Workflow admin, then this little snippet will get the JSON for that array element

| eval workflow_admin=spath(_raw, "target{}")
| eval workflow_admin=mvmap(workflow_admin, if(tostring(spath('workflow_admin', "displayName"))="Workflows Administrator", 'workflow_admin', null()))

There are probably a number of ways of getting at the JSON, but this works. Here's another way

| eval workflow_admin=json_array_to_mv(json_extract(_raw, "target{}"))
| eval workflow_admin=mvmap(workflow_admin, if(tostring(spath('workflow_admin', "displayName"))="Workflows Administrator", 'workflow_admin', null()))

Once you have workflow_admin, you can manipulate/extract the fields as needed
I am working on implementing this query, but I need to rename and standardize the output to the C_Label values so I can do a stats count on those. I need a count per source. I don't think I can cram my eval statements in with the OR statements. I'll try to incorporate this into my query.

Thanks
Hello, I am looking to add a particular value to an existing search of Okta data. The problem is I don't know how to extract the value, which is on the same level as other values. The value I am looking for is "Workflows Administrator".

The existing search is:

index=okta "debugContext.debugData.privilegeGranted"="*"
| rename actor.displayName as "Actor", targetUserDisplayName as "Target Name", targetUserAlternateId as "Target ID", description as "Action", debugContext.debugData.privilegeGranted as "Role(s)"
| eval Time = strftime(_time, "%Y-%d-%m %H:%M:%S")
| fields - _time
| table Time, Actor, Action, "Target Name", "Target ID", Action, "Role(s)"

and sample data is

{
  actor: { ... }
  authenticationContext: { ... }
  client: { ... }
  debugContext: {
    debugData: {
      privilegeGranted: Application administrator (all), User administrator (all), Help Desk administrator (all)
    }
  }
  device: null
  displayMessage: Grant user privilege
  eventType: user.account.privilege.grant
  legacyEventType: core.user.admin_privilege.granted
  outcome: {
    reason: null
    result: SUCCESS
  }
  published: 2025-05-08T19:30:54.612Z
  request: {
    ipChain: [ ... ]
  }
  securityContext: {
    asNumber: null
    asOrg: null
    domain: null
    isProxy: null
    isp: null
  }
  severity: INFO
  target: [
    {
      alternateId: jdoe@company.com
      detailEntry: null
      displayName: John Doe
      id: 00umfyv9jwzVvafI71t7
      type: User
    }
    {
      alternateId: unknown
      detailEntry: null
      displayName: Custom role binding added
      id: CUSTOM_ROLE_BINDING_ADDED
      type: CUSTOM_ROLE_BINDING_ADDED
    }
    {
      alternateId: /api/v1/iam/roles/WORKFLOWS_ADMIN
      detailEntry: null
      displayName: Workflows Administrator
      id: WORKFLOWS_ADMIN
      type: CUSTOM_ROLE
    }
    {
      alternateId: /api/v1/iam/resource-sets/WORKFLOWS_IAM_POLICY
      detailEntry: null
      displayName: Workflows Resource Set
      id: WORKFLOWS_IAM_POLICY
      type: RESOURCE_SET
    }
  ]
  transaction: { ... }
  uuid: 2c42-11f0-a9fe
  version: 0
}

Any help is appreciated. Thank you!
Understood.  To test this, I'll actually need to down an interface for at least 60 seconds to see the Down result.  I'll need to get a network engineer involved to test.  I will get back ASAP.
Try this query that does not use appends or transpose.

(index=fortinet dlpextra IN (WatermarkBlock1,Log_WatermarkBlock2,Log_WatermarkBlock3,Log_WatermarkBlock4)) OR (index=035 "Common.DeviceName"="p151.d.com" OR Common.DeviceName="p1p71.c.com" "SensitiveInfoTypeData{}.SensitiveInfoTypeName"=*) OR (index=iron AutomaticClassification) OR (index=testing sourcetype="net:alert" dlp_rule="AZ C*")
| eval type = case(index=fortinet, "Proxy", index=iron, "Email", index=035, "SFTP", index=testing, "Netskope", 1==1, "Unknown")
| stats count by type