Re: Dedup / merge multiple events relating to one ...

alexandermunce · ‎07-09-2017

Hi,

Our system logs events in a bizarre way in which multiple lines of data will all relate to a single transaction, however each line will have a different attribute.

Current Relationship: Multiple events TO One transaction_id
Ideal relationship: One event TO One transaction_id

This also means that must data is duplicated - I want to merge these events all into one relating to the one unique transaction_id.

Is it possible that we upload this raw data each month and then format it correctly into single events automatically.

Example as follows:

transaction_id          status     date        type       attribute    value
10001                   complete   10/10/17    request    name         Jenny
10001                   complete   10/10/17    request    company      Ford
10001                   complete   10/10/17    request    reference    564682
10001                   complete   10/10/17    request    amount       $12,345

I would instead like it formatted as follows:

transaction_id          status     date        type       name   company   reference   amount
10001                   complete   10/10/17    request    Jenny  Ford      564682      $12,345

Now all events have the same number of related attributes.

Thanks in advance!

DalJeanis · ‎07-17-2017

Okay, so you have key-value pairs where these three together make the key - transaction_id status date type.

This code will create fields named after the attributes, set their values, and then roll the data together into a single event

| eval {attribute} = value
| fields - attribute value
| stats values(*) as * by transaction_id status date type 
| eval _time = strptime(date,"%m/%d/%y")

You might want to consider putting the csv into a temporary index each month, then using collect after the above lines to write the combined transactions to the permanent index.

HiroshiSatoh · ‎07-09-2017

There may be better search sentences･･･

(your search) | dedup transaction_id,status,date,type
| join transaction_id [search (your search) | chart first(value) by transaction_id,attribut]
| table transaction_id,status,date,type,name,company,reference,amount

alexandermunce · ‎07-09-2017

Could I perhaps use:

... | transaction transaction_id

Or is the transaction command only for combining different fields of the same value (in this case it is the same field)

niketn · ‎07-09-2017

@alexandermunce, you can try the following, this will not create separate new fields for specific attributes and values but will create single row for each transaction id.

| eval key=attribute."=\"".value."\""
| stats values(key) as key by transaction_id status date type

If multiple status, date and/or type are possible for each transaction_id then you would need to use list() or values() for each of such fields instead of using these in by clause.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Dedup / merge multiple events relating to one transaction

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!