Getting Data In

Several questions about indexes

indeed_2000
Motivator

Hi, I have an index called "myindex" and have several questions about it:

1- How can I remove a specific date range from a specific index and force it to reindex? (CLI or web?)

2- How can I view the completion percentage of the current indexing job? (CLI or web?)

3- How can I force a reindex of a specific directory? (CLI or web?)

4- I have 2 separate indexes (1- daily, 2- ondemand).

The first one indexes the path /opt/daily, the second one indexes the path /opt/ondemand.

Every night a script syncs the daily path, and it is indexed correctly. The issue is that when I put today's log in the ondemand path it is indexed correctly, but the next day when the daily script runs, the daily index does not update correctly and Splunk only shows the log entries from after that time!

 

e.g.

1- I've updated the ondemand path and it contains today's log from 00:00 to 11:00.

2- The next day, after the script runs and the daily path updates, Splunk only shows 11:00 to 23:59.

 

any idea?

Thanks,


PickleRick
SplunkTrust

1. As @gcusello already pointed out, as a general rule what goes into an index stays in the index. You can (if you have the proper capability) use the delete command to mark some part of the data as unavailable for search, but in general that's not how you work with Splunk. Yes, during the initial phases of a deployment, especially when deploying inputs for new kinds of sources, you sometimes point the data at a temporary index first and then, after initial testing, run the delete command on that index just to keep it "tidy", but that's a very specific use case. In normal production use you don't want to use the delete command at all.

2. Depends on what you mean by "percentage of status", and whether you're referring to the indexing task (the indexers writing the data to the actual indexes, or parsing the input queue) or to the ingestion part on the source side, especially reading from source files in the case of monitor inputs.

3. Depends on the source. If it's a "push" kind of source (like syslog or HEC), you simply have to push the data from the source again. If it's a file monitor input, you have to either make the forwarder recognize the file as a new one (change the name if you're using crcSalt=<SOURCE>) or reset the fishbucket for this particular file.
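For the fishbucket route, a hedged sketch (the monitored path is a made-up example; btprobe must be run on the instance that owns the monitor input, with Splunk stopped):

```shell
# Stop Splunk on the forwarder first
$SPLUNK_HOME/bin/splunk stop

# Reset the fishbucket record for one monitored file
# (/opt/daily/app.log is a hypothetical path)
$SPLUNK_HOME/bin/splunk cmd btprobe \
  -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db \
  --file /opt/daily/app.log --reset

# Start Splunk again; the file will be re-read from the beginning
$SPLUNK_HOME/bin/splunk start
```

Note this makes Splunk index the file again in full, which counts against the license a second time.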

4. Honestly, I have no idea what you're trying to achieve. You can't move data from one index to another; at least not easily, and not without incurring additional license usage.


gcusello
SplunkTrust

Hi @indeed_2000,

sorry if I start my answer with a bit of philosophy:

in Splunk you usually define indexes based on two rules:

  • different access rules require different indexes (this means that the same access rules can be served by the same index);
  • different retention requirements require different indexes (and obviously the same requirements = the same index).

I say this because it's preferable to have as few indexes as possible, for many reasons (management, bucket rolling, etc.).

Anyway, answering your questions:

1)

You can logically remove events in a time period using a search and the delete command (you must have the "can_delete" role to perform the deletion, and remember to remove this role from your user afterwards; it's a dangerous role!).

But in this way you delete events logically, not physically: events are only marked as "deleted" and remain in the indexes until the bucket exceeds the retention period.

You cannot physically delete events in a time range.

You can only delete events in buckets where the newest event exceeds the retention period.

To reindex your logs, you can do it manually (using the web UI) or copy the files to reindex into a new folder and add a new temporary input that contains the option "crcSalt = <SOURCE>".

Otherwise Splunk will not index a log twice.

2)

see the Indexing section in the Monitoring Console.

3)

To reindex, you can do it manually (using the web UI) or copy the files to reindex into a new folder and add a new temporary input that contains the option "crcSalt = <SOURCE>".

Otherwise Splunk will not index a log twice.

4) 

first, consider the hints in my introductory philosophy.

About the issue with your logs: I think you should redesign your inputs, having (if possible) only one source indexed once, instead of two different sources, probably with the same events, indexed twice in two indexes; also because this way you pay for your logs twice.

Ciao.

Giuseppe


indeed_2000
Motivator

Hi @gcusello 
Thank you for the answer; I totally agree with you about segregating indexes.

1- Would you please tell me how? With an example, I mean.
2- That section doesn't show the percentage per directory or per indexing job.
3- How can I reindex a specific path through the web?
4- As I mentioned, I have two separate indexes, NOT one! 1- ondemand, 2- daily

 

Thanks,


gcusello
SplunkTrust

Hi @indeed_2000,

I'll try to answer your questions in order:

1)

you can logically delete events from an index in this way:

  • add the role "can_delete" to your user,
  • run a search to identify the events to delete (e.g. index=your_index sourcetype=your_sourcetype earliest=-10d@d latest=-9d@d),
  • be sure that the results you get are the events to delete and not others!
  • run the same search adding "| delete" at the end,
  • remove the role "can_delete" from your user.

it's difficult to give you an example of deletion!
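Still, the steps above can be sketched as two searches (index, sourcetype, and time range are placeholders; always inspect the results of the first search before appending the delete):

```
(verify first)  index=your_index sourcetype=your_sourcetype earliest=-10d@d latest=-9d@d
(then delete)   index=your_index sourcetype=your_sourcetype earliest=-10d@d latest=-9d@d | delete
```

Remember that the second search only marks the events as deleted; the disk space is reclaimed when the bucket ages out.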

2) 

In [Monitoring Console -- Indexing -- Performance] you have all the queue statuses; this way you have the status of indexing.

It isn't possible to get the momentary indexing status of a folder or an index, but why do you want to know this?

3)

to reindex a specific path, you have to use the [Settings -- Add Data] function, but it's very slow because you have to index every file one by one.

For this reason the easiest approach is to copy the files to reindex into a different folder and create an additional input (via conf file) adding the option "crcSalt = <SOURCE>".

4)

is the data in your two indexes different, or the same?

if it's the same, why index it twice?

Anyway, it isn't a good approach to sync log files into two different folders to index; as I said, redesign your input approach, e.g. you could put the events to index in one folder and let Splunk index them when they arrive, without syncing.

Ciao.

Giuseppe


indeed_2000
Motivator

@gcusello 
1- OK, now I got it.
2- This is a simple thing I need to know: which files have been indexed, the percentage for each file, and how much remains to index in each file.
3- By reindex I mean it has already been indexed, and I do something that clears the data already in the index and reindexes it; not adding an input source from scratch. (It seems Splunk doesn't have this feature, so we can suggest it.)
4- Same data in different indexes, because the ondemand logs are copied on users' demand to the ondemand path, and at the end of the day the whole data is copied to the daily path.

 


PickleRick
SplunkTrust

2. You can verify the status of inputs by invoking "splunk list inputstatus" (if I remember correctly). The command must be run on the component responsible for ingesting the particular file (so if you run it on an indexer you won't get the status of files ingested on forwarders). You do understand the Splunk architecture, don't you?
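A hedged sketch of checking monitor progress this way (run on the forwarder that owns the input; the exact output fields may vary by version):

```shell
$SPLUNK_HOME/bin/splunk list inputstatus
```

For monitor inputs the output lists each tracked file with how far Splunk has read into it (file position vs. file size), which is the closest thing to a per-file percentage.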

4. As I said before, there is no way to "move" data between indexes. You could "copy" the events using the "collect" command and then use delete on the source index, but:

1) it's very "unsplunky"

2) it can be very hard to do reliably (making sure that all events are copied, that no events are left "dangling", that all copied events are deleted, and that you don't delete too much data)

3) the data you write with collect (unless the events have the "stash" sourcetype) counts against your license again, so this way you effectively and needlessly double your license consumption

4) there are much easier ways to limit access to only recent data (just restrict your users to a short time range and you're done). And if it's not a permission-level requirement but simply a "convenience feature", just learn to use time range specifiers in searches.
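For completeness, a hedged sketch of the copy-then-delete approach warned against above (index names and time range are placeholders, and all four caveats apply; the delete step also needs the can_delete role):

```
(1) copy:    index=ondemand earliest=-1d@d latest=@d | collect index=daily
(2) verify:  compare event counts in both indexes for that time range
(3) delete:  index=ondemand earliest=-1d@d latest=@d | delete
```

Note that collect rewrites metadata (by default it writes the "stash" sourcetype), so the copied events will not look identical to the originals.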


gcusello
SplunkTrust

Hi @indeed_2000,

2)

the indexing phase is very quick, so it's difficult to follow the indexing status of each file; for this reason I don't understand why you would do this.

3)

if you use the web UI, you don't create any input.

Using the other approach (copying files into a new folder) you have to create a new temporary input that you can delete at the end of the job.

4)

As I said, I suggest redesigning your input, because you pay twice for these logs, and it gives you an additional complication that is probably not needed.

Ciao.

Giuseppe


indeed_2000
Motivator

@gcusello 

2- In case you have lots of huge zip files that are copied daily to the path and need to be extracted and indexed, it takes time to index them. That's why I need to know that.

3- It's not a clean way for me and the other users; I'm looking for a simple way that enables us to do it through the web UI, because users don't have direct access to the Splunk server's files or the storage where the logs are located.

4- Actually it was my fault, and after increasing the transaction limit this issue was fixed. Solved: same query different result - Splunk Community


gcusello
SplunkTrust

Hi @indeed_2000,

2) 

in my opinion this analysis isn't so relevant, but surely it's relevant for you; this is just for discussion.

Anyway, to my knowledge, there isn't any function to measure what percentage of a file has been indexed.

3)

as I said, the easiest way is:

  • copy the files to reindex into a different folder,
  • create a temporary input (using the inputs.conf file) with the crcSalt = <SOURCE> option,
  • restart the Splunk server,
  • when finished, delete the temporary input,
  • delete the temporary folder and files.
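A hedged sketch of the temporary stanza (the folder, index, and sourcetype names are placeholders):

```
# inputs.conf -- temporary monitor input for reindexing
[monitor:///opt/reindex_tmp]
index = daily
sourcetype = your_sourcetype
# Add the full source path to the CRC so files in the new folder
# are treated as unseen and indexed again
crcSalt = <SOURCE>
```

Delete this stanza (and restart or reload the instance) once the files have been indexed, otherwise anything dropped into the folder later will keep being ingested.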

You could also use another Splunk machine (even one created temporarily for this purpose) configured to forward logs to the indexers; this way you don't cause problems for users with restarts.

Ciao.

Giuseppe
