Getting Data In

Some of my data does not have the correct sourcetype. Can I change it?

Jaci
Splunk Employee

Is there a way to export the data that isn't correct then re-import it using the correct sourcetype? If not, is there another way to change the sourcetype after the data has been indexed?

1 Solution

jrodman
Splunk Employee

The easiest method is to wipe the data and reindex.

Wiping the data can be global (splunk clean eventdata -index myindex) or more focused (splunk search "some data | delete"). The full wrinkles of these methods are discussed elsewhere.
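As a rough sketch of the global wipe-and-reindex route, the helper below only prints the command sequence rather than running it, so it can be reviewed first. The function name plan_reindex and the index, sourcetype, and file path are placeholders invented for illustration; the sketch assumes the original data file is still available and that splunkd has to be stopped for the clean step.

```shell
#!/bin/sh
# Sketch only: prints a command plan instead of executing anything.
# plan_reindex and its arguments are hypothetical names for this example.
plan_reindex() {
    index=$1
    sourcetype=$2
    data_file=$3
    # clean eventdata is run while splunkd is stopped
    echo "splunk stop"
    echo "splunk clean eventdata -index $index"
    echo "splunk start"
    # read the original file once more, forcing the intended sourcetype
    echo "splunk add oneshot $data_file -index $index -sourcetype $sourcetype"
}

plan_reindex myindex right_sourcetype /var/log/example.log
```

Printing the plan instead of running it is deliberate here: clean eventdata removes everything in the index, so it is worth reading the commands before pasting them into a shell.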


Another option is sourcetype renaming. If you want to alias an entire sourcetype to another one, you can do so in props.conf, e.g.:

[wrong_sourcetype]
rename = right_sourcetype

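This clearly doesn't work if [wrong_sourcetype] is also a valid sourcetype in its own right, since the rename applies to every event carrying that sourcetype, not just the mislabeled ones.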


It's also possible to dump a bucket to a csv format, manipulate that, and then generate a new bucket from the modified or filtered csv data. This is sort of, 'for wizards'.
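For the sourcetype case in the question, the "manipulate" step might look something like the sketch below. It rests on an assumption that is not confirmed in this thread: that the exported csv carries the sourcetype as a literal sourcetype::<name> token. Check a few lines of your own export before relying on it. The function name rewrite_sourcetype is made up for this example.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: each exported row contains the sourcetype as a
# literal "sourcetype::<name>" token; verify against your own export.
# rewrite_sourcetype is a hypothetical helper, not a Splunk command.
rewrite_sourcetype() {
    old=$1
    new=$2
    in_csv=$3
    out_csv=$4
    # Only rewrite the token when it is followed by a quote or a comma,
    # so a longer name such as wrong_sourcetype_v2 is left alone.
    sed "s/sourcetype::${old}\([\",]\)/sourcetype::${new}\1/g" "$in_csv" > "$out_csv"
}

# Usage (file names are placeholders):
#   rewrite_sourcetype wrong_sourcetype right_sourcetype filename.csv fixed.csv
```

Note that a plain sed pass also rewrites the token if it happens to appear inside the raw event text, and that the old name is treated as a regular expression, so a name containing dots will match more loosely than intended.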

The command to emit a bucket to csv is:

splunk cmd exporttool bucketname filename.csv -csv

To generate a new bucket from the csv, you can use:

splunk cmd importtool new_bucket_dir filename.csv

You will have to assign the correct Splunk name to the bucket_dir, either manually, for example by naming it the same as the original, or by using some kind of script to name it. I used the following shell fragment, where $bucket was the old bucket:

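# $bucket: old bucket name, $NEW_BUCKET: dir written by importtool,
# $bucket_dir: index directory that should receive the renamed bucket
# keep only the id after the last underscore of the old bucket name
bucket_id=$(echo "$bucket" | sed 's/.*_//')
# strip the trailing -<n>.tsidx, then split what is left into "high low"
(cd "$NEW_BUCKET" && ls *.tsidx | sed 's/-[0-9]\+\.tsidx$//' | sed 's/-/ /') | {
    global_low=0
    global_high=0
    while read high low; do
        if [ "$global_high" -eq 0 ] || [ "$high" -gt "$global_high" ]; then
            global_high=$high
        fi
        if [ "$global_low" -eq 0 ] || [ "$low" -lt "$global_low" ]; then
            global_low=$low
        fi
    done
    # still inside the braces, so the values set in the loop are visible here
    REAL_BUCKET_NAME=db_${global_high}_${global_low}_${bucket_id}
    mv "$NEW_BUCKET" "$bucket_dir/$REAL_BUCKET_NAME"
}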
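The same naming logic can also be written as a self-contained function, which is easier to try out on a scratch directory before touching a real index. This is a sketch, not the script from the wiki: the function name bucket_name_for and its arguments are made up for illustration, and it assumes, as the fragment above implies, that each tsidx file name starts with two dash-separated numbers, the higher bound first.

```shell
#!/bin/sh
# Sketch only: prints the db_<high>_<low>_<id> name for a rebuilt bucket.
# bucket_name_for is a hypothetical helper.
#   $1: directory holding the new .tsidx files
#   $2: name of the old bucket, whose trailing id is reused
bucket_name_for() {
    new_dir=$1
    old_bucket=$2
    bucket_id=${old_bucket##*_}      # text after the last underscore
    high=0
    low=0
    for f in "$new_dir"/*.tsidx; do
        [ -e "$f" ] || continue      # no tsidx files: glob stays literal
        base=${f##*/}                # file name without the directory
        h=${base%%-*}                # first number in the file name
        rest=${base#*-}
        l=${rest%%-*}                # second number in the file name
        if [ "$high" -eq 0 ] || [ "$h" -gt "$high" ]; then high=$h; fi
        if [ "$low" -eq 0 ] || [ "$l" -lt "$low" ]; then low=$l; fi
    done
    echo "db_${high}_${low}_${bucket_id}"
}

# Usage (paths are placeholders):
#   mv "$NEW_BUCKET" "$bucket_dir/$(bucket_name_for "$NEW_BUCKET" "$bucket")"
```

Using parameter expansion instead of a sed pipeline keeps everything in one shell process, so there is no pipeline subshell to lose the computed values in.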

Once you have a newly constructed, duplicated bucket, you can remove the old one from your index and insert the new one.

The main problem with exporttool/importtool is that they're not all that optimized, so they consume a lot of RAM and CPU for a long time. We'll be making them faster, but for now you should make sure you have enough headroom on the box where you're running them.

If you want to go down that path, the full script (treat as example) is stuck in the wiki over here: http://www.splunk.com/wiki/Community:Modifying_indexed_data_via_export_and_import


Mick
Splunk Employee

No and no. Once data has been indexed, that's the state it's going to stay in. An export/import capability has been requested on a number of occasions, but it's not built yet. If you want to change the 'sourcetype' value, all you can really do is re-index the data.

If that's not possible, then the next best solution is to just use tags - http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Defineandusetags
