There are 2 searches from 2 different sources that are fetching file name details in column A and B respectively.
We need to compare the data of column A and B in a way that values of B already in A should be removed and we get the output from A for files that are not in B till now.
I am using below search but column B data is also coming along:
index=esbsrv_app host=*AM* integrationId=I216 code=JBOSS012 transactionId=* earliest=-90d | eval received_uri=case(code="JBOSS012",uri) | rex field=received_uri ".*/((?.*))" | stats min(_time) as time list(received_file) as Received_SFTP by transactionId | where like(Received_SFTP,"%OH%") | fields - transactionId | appendcols [search index=onlftsprod sourcetype=transaction source="*/data/integration/jboss/oh/in/dailytransaction/Archive*" earliest=-90d | eval mytime=strftime(_time,"%Y-%m-%d %H:%M:%S.%Q") | stats min(mytime) as time by FILENAME ] | eval output=toString(Received_SFTP)+";"+toString(FILENAME) | makemv delim=";" output | mvexpand output | fields - time Received_SFTP | eval abc=if(FILENAME=output,1,0) | search abc="0" | stats count by output | search count=1
Can someone please help in comparing 2 columns and getting results in A which are not in B
Here are several answers of mine that compare multi-value fields and show the differences:
First Query :index=esbsrv_app host=AM integrationId=I216 code=JBOSS012 transactionId=* earliest=-90d
| eval received_uri=case(code="JBOSS012",uri)
| rex field=received_uri "./((?.))"
| stats min(_time) as time list(received_file) as Received_SFTP by transactionId
| where like(Received_SFTP,"%OH%")
| fields - transactionId
Output : Events (45,294)
20 Per Page
Second Query : index=onlftsprod sourcetype=transaction source="/data/integration/jboss/oh/in/dailytransaction/Archive" earliest=-90d
| eval mytime=strftime(_time,"%Y-%m-%d %H:%M:%S.%Q")
| stats min(mytime) as time by FILENAME
Output: Events (200)
20 Per Page
I006_OH01_94021027_000_20190526045001.xml 2019-05-26 07:00:00.000
I006_OH01_94021921_000_20190526053501.xml 2019-05-26 07:00:00.000
I006_OH01_94022085_000_20190526055001.xml 2019-05-26 07:00:00.000
I006_OH01_94022426_000_20190526060501.xml 2019-05-26 07:00:00.000
I006_OH01_94022608_000_20190526062001.xml 2019-05-26 07:00:00.000
I006_OH01_94022954_000_20190526063501.xml 2019-05-26 07:00:00.000
I006_OH01_94023127_000_20190526065001.xml 2019-05-26 07:00:00.000
I006_OH01_94023504_000_20190526070500.xml 2019-05-26 07:10:00.000
I006_OH01_94023672_000_20190526072000.xml 2019-05-26 07:20:00.000
I006_OH01_94024159_000_20190526073500.xml 2019-05-26 07:40:00.000
I006_OH01_94024303_000_20190526075000.xml 2019-05-26 07:50:00.00
I have to create a third column where all received SFTP filenames should get printed which are not in second query output - FILENAME. Clearly the files not yet in FILENAME table should get copied to new column.
Unfortunately, I think the formatting monster may have eaten some of your code. Could you please edit the above comment with all the code in it, and repaste in the searches/SPL only this time using the button in the toolbar for code (labeled 101010)? As it stands, there's ambiguity that may or may not be important in the rex and possibly other places.
Once we have that corrected, the only thing missing is a couple of source events?
A particular question - you have in the first search this line:
| where like(Received_SFTP,"%OH%")
And the files returned have a formatting including as the third section of them, things like
_US_. What happens or should happen when a file becomes localized to the state of Ohio? It would have that third section be
_OH_ and would thus match your where condition...
This sort of underlies why we like to see a couple of the source events, the searches you have, and a mock up or good description of what it is you want it to end up like. Frankly, it also helps sometimes to find out - in English - what it is you are after. I think in your case we know what you are after, and we can see the searches (except for the formatting problems from not using the code button), so a couple of source events would be helpful.
Sorry I know we sound like we're being too particular, but we often make life better in the end. We just need certain information to make sure we're not putting lipstick on a pig (which is a reference to trying to prettify an ugly search that could really use a complete rewriting.) Another analogy: you wouldn't put a $5000 paint job on a $1000 car, would you?
Also, I think @woodcock's reference to text was really a "just don't send us screenshots, but something we can copy/paste into Splunk to use as test data if necessary". Your doing fine for that, so thank you. Screenshots are useful, but not in this context.
I have a feeling you are doing way more work than is actually required.
Do you have a sample of each event?
The usual way to handle this (IMO) would be to throw all those events into one big pile, use some conditional evals to tag each "side", stats them together with a count, then find where you have a count > 1.
But I think seeing those original events would help a lot in trying to tie them together.
(also, your rex got broken in line 3?)