How to resend lost data between two splunk servers?

Path Finder

Hi all,
consider the following scenario: there are two Splunk infrastructures. The first (A) collects data from several forwarders and forwards a subset of this data to a second indexer (B). When B receives the data forwarded from A, it performs several index-time transforms (metadata changes such as index, source, sourcetype, host) based on data from the received flow (the original host, source, etc.).
A lost connectivity to B for several days because of network issues, and B now has a gap in the forwarded data. Is there a way to fill this gap? Consider that A is an indexer too, so it has all the data stored. Unfortunately, none of the methods I tried (dump, exporttool, moving indexes) lets B reprocess the data with the same index-time rules, because the source data IS different (different source, different host, etc.).
I don't mind if the imported data counts against the daily indexing volume, as I can split it into chunks and import a few each day.
If anyone has already faced this issue or has a suggestion, I would really appreciate it.
Thank you

Mario


SplunkTrust

Ciao Mario,

best wishes from your Brother Matteo. 🙂

Actually, I have something similar in place, where certain summary searches are routed to another cluster. If that second cluster is down, I need to backfill, of course.

Since the gap data resides only on A and should be ingested into B, you can search for the data you want to backfill and use the collect command to set additional metadata attributes or overwrite existing ones. collect writes into the destination index you name, which does not necessarily have to reside on A. Using props.conf, transforms.conf and outputs.conf you can then route the events to B. I assume this search would run from a Search Head or Search Head Cluster, so the configuration would be independent of the Indexer A configuration and should not conflict with it.

As an example:

search: index=i1 source=s1 sourcetype=st1 host=h1 | fields _raw | collect index=i2 source=s2 sourcetype=st2 host=h2
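
To backfill only the outage window, and in daily chunks, a search like the one above can be bounded with the earliest and latest time modifiers. A minimal sketch follows; the dates are placeholders for one day of the gap (not taken from the thread), and i1/i2, s1/s2, st1/st2 and h1/h2 are the same stand-in names as in the example above.

```spl
index=i1 source=s1 sourcetype=st1 host=h1 earliest="01/15/2016:00:00:00" latest="01/16/2016:00:00:00"
| fields _raw
| collect index=i2 source=s2 sourcetype=st2 host=h2
```

Running it once per day of the gap, shifting both bounds each time, keeps every chunk small. Adding testmode=true to collect should let you preview the results without writing anything.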

Additionally you would need to have the correct configurations in place to forward the data.

props.conf
[st2]
TRANSFORMS-routing = route_backfill_to_b

transforms.conf
[route_backfill_to_b]
SOURCE_KEY = MetaData:Sourcetype
REGEX = st2
DEST_KEY = _TCP_ROUTING
FORMAT = cluster_b

outputs.conf
[tcpout:cluster_b]
autoLB = true
disabled = false
server = 1.1.1.1:9997, 2.2.2.2:9997

Of course, you should be able to apply any additional transformations to the data if required, either by overwriting _raw or through additional transforms.

The above should just point you in a direction; I have not tested it for this specific case.

I hope that helps.

BR
Oliver

Path Finder

Thanks a lot. A really clean and creative approach, and I like it. I still see some issues in feeding the data seamlessly to the target infrastructure (I mean, as you say, I will have to mangle the data a little on arrival), but this looks a lot easier than going through the nightmare of backfilling from CSV or JSON data. I will work on it and let you know how it goes.

Esteemed Legend

On Server A, use outputcsv to export the data, then FTP it to Server B and use oneshot, which lets you specify all the things (e.g. host) that are normally set automatically:

http://docs.splunk.com/Documentation/Splunk/5.0/Data/MonitorfilesanddirectoriesusingtheCLI
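
As a rough sketch of the export step, the search below writes the raw events to a CSV file on Server A. The filename backfill_h1 and the index, source, sourcetype and host values are placeholders, not names from the thread.

```spl
index=i1 source=s1 sourcetype=st1 host=h1
| fields _raw
| outputcsv backfill_h1
```

outputcsv writes under $SPLUNK_HOME/var/run/splunk/csv on the instance that runs the search. Once the file has been copied to Server B, it could be loaded with something like `splunk add oneshot /path/to/backfill_h1.csv -index i2 -sourcetype st2 -host h2` (path and values again hypothetical). Note that the file has a header row and CSV quoting, so the sourcetype on B would need to account for that.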


SplunkTrust

Two possibilities:

Export the data from Server A to CSV and then have Server B use a monitor stanza to read it from disk, applying the right sourcetype and so on.

Also, it's possible (though I'm not positive) that you can use Eventgen for this. I think you'll still basically be exporting/importing, but I thought it worth mentioning.
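
For the first possibility, a monitor input on Server B might look like the stanza below. This is only a sketch: the directory /opt/backfill/from_a and the index, sourcetype and host values are assumptions for illustration.

```ini
# inputs.conf on Server B (hypothetical path and values)
[monitor:///opt/backfill/from_a]
index = i2
sourcetype = st2
host = h2
disabled = false
```

Every file under the monitored path gets the same index, sourcetype and host, so exports covering several original hosts would need one directory and stanza per host, or a host_segment setting that takes the host from a directory name in the path.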
