Hi Community,
I have a problem with data correlation; here are the details.
The source file is a test result summary named summary.xml, and it is not time sensitive. Splunk parses the file into several events (event 1, 2, 3, etc.). The test info is in event 1 and the results are in events 2, 3, and 4. My goal is to count the results of all tests under the same test info, but I don't know how to link the two.
What kind of SPL search could I use?
For example:
Summary1.xml:
event1 | test info: | alpha |
event2 | Pass | |
event3 | Fail | |
event4 | Fail | |
Summary2.xml:
event1 | test info: | beta |
event2 | Pass | |
event3 | Pass | |
event4 | Pass | |
The results I expect:
Test info | Results |
alpha | pass: 1, failed: 2 |
beta | pass: 3, failed: 0 |
Hi @cecilia_cheng1,
you should ingest the data so that every file is one event; that way all the information is in the same event.
In any case: is the file name always different from the previous ones, or could there be several files with the same name? In other words, can we use the source as a unique key?
Also, could you share a sample of your data?
Ciao.
Giuseppe
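If source (or host plus source) does turn out to identify exactly one file, a rough sketch of the idea is to copy the test info onto every event of that file with eventstats and then count. This is only an illustration: the first five lines build fake events with makeresults that mimic the example in the question, the field names loadname and result1 are taken from later in this thread, and test_info is a made-up helper field. It will not work as-is if the same source holds more than one test info.

```
| makeresults count=8
| streamstats count as n
| eval source=if(n<=4, "Summary1.xml", "Summary2.xml")
| eval loadname=case(n=1, "alpha", n=5, "beta")
| eval result1=case(n=2, "Pass", n=3, "Fail", n=4, "Fail", n>5, "Pass")
| eventstats values(loadname) as test_info by source
| where isnotnull(result1)
| stats count by test_info result1
```

On real data, the makeresults and eval lines would be replaced by the base search, and the by clause would need whatever fields really make a file unique.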
Hi @gcusello ,
Sorry for the vague description. I have posted detailed screenshots below.
The summaries are gathered from different hosts and stored in different directories with the same name.
The source name is not a unique key. A test file is the result of one host testing a load multiple times.
I have extracted the fields I need, which are loadname and result1. But these two fields are not shown on the same line, so it is impossible to count result1 by loadname, as shown in pic 3.
Pic1:
pic2:
pic3:
Sorry, but your data quality is really bad.
You have fields with NULL values, and you are using those fields in "by" clauses.
Agreed... That is the original data. The info I need does not appear at the same time; the fields are not on the same line. That's why there are so many NULLs... 😭
Well, then you need to provide us with a mapping of the columns and their values and how they should be mapped. At the moment this is not possible: nowhere in your screenshots can we see anything about alpha and beta.
If you are familiar with a bit of coding, write some pseudocode here.
I don't quite understand what info I should provide...
Something like the following, but with the source column replaced by loadname.
If this requirement is impossible, should I write a script to preprocess the data, or is there some other solution?
You need to provide more information on the data.
Your screenshots include the same query. How can you differentiate between event A, which provides the grouping data for the tests, and the test results themselves?
Hi, you are missing some data in the columns with alpha and beta.
Event | Result | Test Type |
event2 | Pass | alpha |
event3 | Fail | alpha |
event4 | Fail | alpha |
event2 | Pass | beta |
event3 | Pass | beta |
event4 | Pass | beta |
That is what you should aim for, because as far as I can see from your post your column headers are misleading.
With that you can do:
| stats count by "Test Type" Result
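To see what this gives on the flattened table above, here is a small run-anywhere sketch. The first four lines only fabricate the six rows with makeresults; they are not real data. The last line is a variation on the stats above that uses count(eval(...)) so that a zero count (beta has no Fail) still shows up as 0 instead of a missing row. The output names pass and failed are just chosen to match the expected table in the question.

```
| makeresults count=6
| streamstats count as n
| eval "Test Type"=if(n<=3, "alpha", "beta")
| eval Result=if(n=2 OR n=3, "Fail", "Pass")
| stats count(eval(Result="Pass")) as pass count(eval(Result="Fail")) as failed by "Test Type"
```

With these six rows this should return alpha with pass=1, failed=2 and beta with pass=3, failed=0.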
Hi @Software-Simian ,
Thank you for your quick reply.
That's the tricky part, because the test type and the test results are not in the same event... So I was wondering if there's a command or something that can link this info together.
BR.
Cecilia
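One command that can carry a value from one event down to the following ones is filldown. A minimal sketch, under a big assumption: in the search results, each test info event comes directly before its own result events. Splunk normally returns the newest events first, so on real data a reverse or an explicit sort would probably be needed before filldown, and since the file is not time sensitive the order may not be reliable at all. The sample events are faked with makeresults; loadname and result1 are the field names mentioned earlier in the thread.

```
| makeresults count=8
| streamstats count as n
| eval loadname=case(n=1, "alpha", n=5, "beta")
| eval result1=case(n=2, "Pass", n=3, "Fail", n=4, "Fail", n>5, "Pass")
| filldown loadname
| where isnotnull(result1)
| stats count by loadname result1
```

filldown replaces each empty loadname with the last non-empty value above it, so every result row ends up with its test info and a normal stats by loadname works.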
Anyhow, you need a set of fields that acts as a unique identifier. event2 is not unique, and the event line only states event2 and Pass, not whether it belongs to alpha or beta.
You need real headers. Your tables start with content, which makes it hard to answer.
Are all summaries at least in the same index? Or are those events volatile and not persistent?