Two searches with different source and producing r...

guptap2 · ‎07-15-2019

There are 2 searches from 2 different sources that are fetching file name details in column A and B respectively.
We need to compare the data of column A and B in a way that values of B already in A should be removed and we get the output from A for files that are not in B till now.
I am using below search but column B data is also coming along:

index=esbsrv_app  host=*AM* integrationId=I216 code=JBOSS012  transactionId=* earliest=-90d
| eval received_uri=case(code="JBOSS012",uri)
| rex field=received_uri ".*/((?.*))"
| stats min(_time) as time list(received_file) as Received_SFTP by transactionId
| where like(Received_SFTP,"%OH%")
| fields - transactionId
| appendcols [search index=onlftsprod sourcetype=transaction source="*/data/integration/jboss/oh/in/dailytransaction/Archive*" earliest=-90d
| eval mytime=strftime(_time,"%Y-%m-%d %H:%M:%S.%Q")
| stats min(mytime) as time by FILENAME ]
| eval output=toString(Received_SFTP)+";"+toString(FILENAME) | makemv delim=";" output
| mvexpand output
| fields - time Received_SFTP
| eval abc=if(FILENAME=output,1,0)
| search abc="0"
| stats count by output
| search count=1

Can someone please help in comparing 2 columns and getting results in A which are not in B

woodcock · ‎07-28-2019

Here are several answers of mine that compare multi-value fields and show the differences:
https://answers.splunk.com/answers/567851/how-can-i-compare-mvfields-and-get-a-diff.html
https://answers.splunk.com/answers/734599/how-to-compare-the-same-search-from-the-previous-d.html
https://answers.splunk.com/answers/319663/how-to-search-the-difference-between-the-values-of.html

guptap2 · ‎07-15-2019

First Query :index=esbsrv_app host=AM integrationId=I216 code=JBOSS012 transactionId=* earliest=-90d
| eval received_uri=case(code="JBOSS012",uri)
| rex field=received_uri "./((?.))"
| stats min(_time) as time list(received_file) as Received_SFTP by transactionId
| where like(Received_SFTP,"%OH%")
| fields - transactionId
Output : Events (45,294)
Patterns
Statistics (38,129)
Visualization
20 Per Page
Format
Preview
Prev12345678...Next
time Received_SFTP
1562508623.153 I216_OH01_CA_98016413_000_20190707090010.xml
1560342840.230 I216_OH01_US_95490755_000_20190612070502.xml
1562616033.656 I216_OH01_US_98144575_000_20190708150001.xml
1562328016.795 I216_OH01_US_97799891_000_20190705061503.xml
1562992241.185 I216_OH01_US_98661758_000_20190712233006.xml
1561411913.315 I216_OH01_US_96603298_000_20190624163015.xml
1562919670.666 I216_OH01_CA_98570110_000_20190712031606.xml
1561912223.606 I216_OH01_US_97155174_000_20190630113001.xml
1562716885.927 I216_OH01_US_98313633_000_20190709190005.xml
1560100230.619 I216_OH01_US_95272914_000_20190609120501.xml
1561183254.865 I216_OH01_CA_96344206_000_20190622005607.xml

Second Query : index=onlftsprod sourcetype=transaction source="/data/integration/jboss/oh/in/dailytransaction/Archive" earliest=-90d
| eval mytime=strftime(_time,"%Y-%m-%d %H:%M:%S.%Q")
| stats min(mytime) as time by FILENAME
Output: Events (200)
Patterns
Statistics (200)
Visualization
20 Per Page
Format
Preview
Prev12345678...Next
FILENAME time
I006_OH01_94021027_000_20190526045001.xml 2019-05-26 07:00:00.000
I006_OH01_94021921_000_20190526053501.xml 2019-05-26 07:00:00.000
I006_OH01_94022085_000_20190526055001.xml 2019-05-26 07:00:00.000
I006_OH01_94022426_000_20190526060501.xml 2019-05-26 07:00:00.000
I006_OH01_94022608_000_20190526062001.xml 2019-05-26 07:00:00.000
I006_OH01_94022954_000_20190526063501.xml 2019-05-26 07:00:00.000
I006_OH01_94023127_000_20190526065001.xml 2019-05-26 07:00:00.000
I006_OH01_94023504_000_20190526070500.xml 2019-05-26 07:10:00.000
I006_OH01_94023672_000_20190526072000.xml 2019-05-26 07:20:00.000
I006_OH01_94024159_000_20190526073500.xml 2019-05-26 07:40:00.000
I006_OH01_94024303_000_20190526075000.xml 2019-05-26 07:50:00.00

I have to create a third column where all received SFTP filenames should get printed which are not in second query output - FILENAME. Clearly the files not yet in FILENAME table should get copied to new column.

Richfez · ‎07-16-2019

Thanks!

Unfortunately, I think the formatting monster may have eaten some of your code. Could you please edit the above comment with all the code in it, and repaste in the searches/SPL only this time using the button in the toolbar for code (labeled 101010)? As it stands, there's ambiguity that may or may not be important in the rex and possibly other places.

Once we have that corrected, the only thing missing is a couple of source events?

A particular question - you have in the first search this line:

| where like(Received_SFTP,"%OH%")

And the files returned have a formatting including as the third section of them, things like _CA_ and _US_. What happens or should happen when a file becomes localized to the state of Ohio? It would have that third section be _OH_ and would thus match your where condition...

This sort of underlies why we like to see a couple of the source events, the searches you have, and a mock up or good description of what it is you want it to end up like. Frankly, it also helps sometimes to find out - in English - what it is you are after. I think in your case we know what you are after, and we can see the searches (except for the formatting problems from not using the code button), so a couple of source events would be helpful.

Sorry I know we sound like we're being too particular, but we often make life better in the end. We just need certain information to make sure we're not putting lipstick on a pig (which is a reference to trying to prettify an ugly search that could really use a complete rewriting.) Another analogy: you wouldn't put a $5000 paint job on a $1000 car, would you?

Also, I think @woodcock's reference to text was really a "just don't send us screenshots, but something we can copy/paste into Splunk to use as test data if necessary". Your doing fine for that, so thank you. Screenshots are useful, but not in this context.

guptap2 · ‎07-15-2019

column A is Received_SFTP and column B is FILENAME

woodcock · ‎07-15-2019

I agree with @rich7177; post a few sample events from each index/sourcetype and then a mockup of your desired final output AND DO IT IN ASCII TEXT, NOT IN PICTURES.

Richfez · ‎07-15-2019

I have a feeling you are doing way more work than is actually required.

Do you have a sample of each event?

The usual way to handle this (IMO) would be to throw all those events into one big pile, use some conditional evals to tag each "side", stats them together with a count, then find where you have a count > 1.

But I think seeing those original events would help a lot in trying to tie them together.

(also, your rex got broken in line 3?)

Two searches with different source and producing results in form of filename in column A and B respectively.

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

Two searches with different source and producing results in form of filename in column A and B respectively.

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits