Getting Data In

Getting duplicate records after uploading JSON (even dedup not working)

sawgata12345
Path Finder

Hi,
I have uploaded a JSON file to Splunk and am using the spath command to extract the output, but the output shows two rows for a single record.
The json file sample is:
[
  {
    "id": 707860,
    "name": "Hurzuf",
    "country": "UA",
    "timestamp": "2017-09-02T06:44:14,799 MDT",
    "coord": {
      "lon": 34.283333,
      "lat": 44.549999
    },
    "ports": [
      {
        "port": 1,
        "utilization": 140,
        "error": {
          "tx": 1000.00,
          "rx": 500
        }
      },
      {
        "port": 2,
        "utilization": 110,
        "error": {
          "tx": 1002.00,
          "rx": 420
        }
      }
    ]
  },
  {
    "id": 519188,
    "name": "Novinki",
    "country": "RU",
    "timestamp": "2017-09-03T06:50:14,799 MDT",
    "coord": {
      "lon": 37.666668,
      "lat": 55.683334
    },
    "ports": [
      {
        "port": 1,
        "utilization": 120,
        "error": {
          "tx": 1020.00,
          "rx": 400
        }
      },
      {
        "port": 2,
        "utilization": 120,
        "error": {
          "tx": 1002.00,
          "rx": 400
        }
      }
    ]
  }
]



kamlesh_vaghela
SplunkTrust

Hi @sawgata12345,

I think each of your events contains multiple records, and each record has its own timestamp, country, name, and id fields. Moreover, the ports field also contains multiple values (port: 1 and port: 2). So you have to write a search that maintains the relationship between each port field and its corresponding id.

Your Sample Event:

[{
    "id": 707860,
    "name": "Hurzuf",
    "country": "UA",
    "timestamp": "2017-09-02T06:44:14,799 MDT",
    "coord": {
        "lon": 34.283333,
        "lat": 44.549999
    },
    "ports": [{
        "port": 1,
        "utilization": 140,
        "error": {
            "tx": 1000.00,
            "rx": 500
        }
    },
    {
        "port": 2,
        "utilization": 110,
        "error": {
            "tx": 1002.00,
            "rx": 420
        }
    }]
},
{
    "id": 519188,
    "name": "Novinki",
    "country": "RU",
    "timestamp": "2017-09-03T06:50:14,799 MDT",
    "coord": {
        "lon": 37.666668,
        "lat": 55.683334
    },
    "ports": [{
        "port": 1,
        "utilization": 120,
        "error": {
            "tx": 1020.00,
            "rx": 400
        }
    },
    {
        "port": 2,
        "utilization": 120,
        "error": {
            "tx": 1002.00,
            "rx": 400
        }
    }]
}]

Can you please try the search below:

index="newjson1" sourcetype="_json"
| spath {} output=even 
| stats count by even 
| eval _raw=even 
| spath 
| rename coord.lat as coord_lat coord.lon as coord_lon ports{}.error.rx as ports_error_rx ports{}.error.tx as ports_error_tx ports{}.port as ports_port ports{}.utilization as ports_utilization 
| eval tempField=mvzip(mvzip(mvzip(ports_error_tx,ports_error_rx),ports_port),ports_utilization) 
| stats count by tempField timestamp country name id 
| eval ports_error_tx=mvindex(split(tempField,","),0), ports_error_rx=mvindex(split(tempField,","),1),ports_port=mvindex(split(tempField,","),2), ports_utilization=mvindex(split(tempField,","),3) | sort timestamp id
| table timestamp country name id ports_error_tx ports_error_rx ports_port ports_utilization

My Sample Search:

| makeresults 
| eval _raw="[ { \"id\": 707860, \"name\": \"Hurzuf\", \"country\": \"UA\", \"timestamp\":\"2017-09-02T06:44:14,799 MDT\", \"coord\": { \"lon\": 34.283333, \"lat\": 44.549999 }, \"ports\":[ { \"port\": 1, \"utilization\": 140, \"error\": { \"tx\": 1000.00, \"rx\": 500 } }, { \"port\": 2, \"utilization\": 110, \"error\": { \"tx\": 1002.00, \"rx\": 420 } } ] }, { \"id\": 519188, \"name\": \"Novinki\", \"country\": \"RU\", \"timestamp\":\"2017-09-03T06:50:14,799 MDT\", \"coord\": { \"lon\": 37.666668, \"lat\": 55.683334 }, \"ports\":[ { \"port\": 1, \"utilization\": 120, \"error\": { \"tx\": 1020.00, \"rx\": 400 } }, { \"port\": 2, \"utilization\": 120, \"error\": { \"tx\": 1002.00, \"rx\": 400 } } ] }]" 
| spath {} output=even 
| stats count by even 
| eval _raw=even 
| spath 
| rename coord.lat as coord_lat coord.lon as coord_lon ports{}.error.rx as ports_error_rx ports{}.error.tx as ports_error_tx ports{}.port as ports_port ports{}.utilization as ports_utilization 
| eval tempField=mvzip(mvzip(mvzip(ports_error_tx,ports_error_rx),ports_port),ports_utilization) 
| stats count by tempField timestamp country name id 
| eval ports_error_tx=mvindex(split(tempField,","),0), ports_error_rx=mvindex(split(tempField,","),1),ports_port=mvindex(split(tempField,","),2), ports_utilization=mvindex(split(tempField,","),3) | sort timestamp id
| table timestamp country name id ports_error_tx ports_error_rx ports_port ports_utilization
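The mvzip/split/mvindex trick in the search above keeps the parallel multivalue fields aligned: each port's tx, rx, port number, and utilization are zipped into one comma-joined string per port, so splitting that string later recovers the values that belong together. A rough Python sketch of the same idea (the field names and sample values are just the ones from this thread):

```python
# Multivalue fields as spath would extract them for one record
# (ports{}.error.tx, ports{}.error.rx, ports{}.port, ports{}.utilization).
ports_error_tx = ["1000.00", "1002.00"]
ports_error_rx = ["500", "420"]
ports_port = ["1", "2"]
ports_utilization = ["140", "110"]

# Analogue of mvzip(mvzip(mvzip(tx, rx), port), utilization):
# one "tx,rx,port,utilization" string per port.
temp_field = [
    ",".join(vals)
    for vals in zip(ports_error_tx, ports_error_rx, ports_port, ports_utilization)
]

# Later, per row (after "stats count by tempField"), split and mvindex
# pull the aligned values back apart.
for row in temp_field:
    tx, rx, port, utilization = row.split(",")
    print(port, tx, rx, utilization)
```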


Happy Splunking

Thanks
Kamlesh



sawgata12345
Path Finder

@kamlesh

The query you suggested gives me "No results" after executing.
What is _raw=even? (According to your query, you have stored the whole JSON content in _raw, but in my case the data is already uploaded and indexed.)

index="newjson1" sourcetype="_json"
| spath {} output=even
| stats count by even
| eval _raw=even
| spath
| rename coord.lat as coord_lat coord.lon as coord_lon ports{}.error.rx as ports_error_rx ports{}.error.tx as ports_error_tx ports{}.port as ports_port ports{}.utilization as ports_utilization
| eval tempField=mvzip(mvzip(mvzip(ports_error_tx,ports_error_rx),ports_port),ports_utilization)
| stats count by tempField timestamp country name id
| eval ports_error_tx=mvindex(split(tempField,","),0), ports_error_rx=mvindex(split(tempField,","),1),ports_port=mvindex(split(tempField,","),2), ports_utilization=mvindex(split(tempField,","),3) | sort timestamp id
| table timestamp country name id ports_error_tx ports_error_rx ports_port ports_utilization


kamlesh_vaghela
SplunkTrust

Hi @sawgata12345,

 | eval _raw=even 
 | spath

I'm overwriting _raw with the newly generated event from the even field, so the next spath statement will execute as expected. This is needed because your original event contains multiple records.

Have you tried executing this search in parts?

Like:
1)

index="newjson1" sourcetype="_json"
| spath {} output=even

2)

index="newjson1" sourcetype="_json"
| spath {} output=even
| stats count by even

3)

index="newjson1" sourcetype="_json"
| spath {} output=even
| stats count by even
| eval _raw=even
| spath
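Conceptually, spath {} output=even extracts each element of the top-level JSON array into a multivalue field, stats count by even fans those values out into one row per record, and eval _raw=even makes each record its own event so the second spath can parse it. A rough Python analogue of that fan-out (using a shortened version of the sample data from this thread):

```python
import json

# One indexed event whose _raw is a JSON array of two records.
raw = '[{"id": 707860, "name": "Hurzuf"}, {"id": 519188, "name": "Novinki"}]'

# Analogue of "spath {} output=even": one value per top-level array element.
even = [json.dumps(rec) for rec in json.loads(raw)]

# Analogue of "stats count by even | eval _raw=even | spath":
# each array element becomes its own row and is re-parsed as an event.
rows = [json.loads(e) for e in even]
for row in rows:
    print(row["id"], row["name"])
```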

Can you please execute the above searches and share the output?

Thanks


nikita_p
Contributor

Hi @sawgata12345,
If you have indexed the file twice and it is possible to clean the index, then clean it, and when indexing the new file add crcSalt in inputs.conf so that Splunk won't index the duplicate file.
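For reference, a crcSalt entry in inputs.conf looks something like this (the monitor path here is a placeholder; crcSalt = <SOURCE> adds the file's full source path to the CRC calculation, so Splunk distinguishes files by path as well as by their initial content):

```ini
# Illustrative inputs.conf stanza (path is a placeholder)
[monitor:///opt/data/cities.json]
sourcetype = _json
crcSalt = <SOURCE>
```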


sawgata12345
Path Finder

@nikita_p
I cleared the index, uploaded fresh data, and added crcSalt in inputs.conf. Even then the rows appear twice. This could be because of the multiple port details per JSON event, but in that case, if there are 50 port details, the row gets repeated 50 times. That's not good, right?


harsmarvania57
SplunkTrust

Have you uploaded the file twice? Maybe previously as well? Have you tried this query: index="newjson1" sourcetype="_json" | dedup _raw | spath ... ?


sawgata12345
Path Finder

@harsmarvania57
I cleaned the index and uploaded the file fresh once, but it's still the same. Even dedup _raw before the | spath gives the same result. Is it because of the multiple port details in each single JSON event?


harsmarvania57
SplunkTrust

Found the issue: remove | spath from your query and it will display only a single value, not duplicates.
