Splunk Search

Join with subsearch doesn't gives me expected results

Engager

Query1-Results:
ProxiesProcessed,Status

Query2-Results:
ProxiesProcessed,Audio_Tracks,year_mm_dd

Join Query:

index=index1 
host=node1 
source="results.log" 
"item: " AND "valid_audio_path: " |eval ProxiesProcessed=trim(substr(_raw,101,11)) | dedup ProxiesProcessed |rename ProxiesProcessed as P_ProxiesProcessed 
|eval Audio_Tracks = trim(substr(_raw,130,len(_raw)-129)) 
|eval year_mm_dd = trim(substr(Audio_Tracks,36,07)) 
|fields P_ProxiesProcessed,Audio_Tracks,year_mm_dd
|join type=left max=0 P_ProxiesProcessed 
[search index=index1 host=node1 source="results.log" 
"Item Processed :: " 
|eval ProxiesProcessed=trim(substr(_raw,112,10)) 
|eval Status=trim(substr(_raw,144,len(_raw)-143)) 
|dedup ProxiesProcessed 
|where Status="already_transcoded" 
|fields ProxiesProcessed,Status 
|rename ProxiesProcessed  as P_ProxiesProcessed 
|rename Status  as S_Status ]
|table P_ProxiesProcessed,Audio_Tracks,year_mm_dd,S_Status

I'm trying to join with the ProxiesProcessed field to get matching results with Query1 & Query2

If I execute the Query1 (used in subsearch) separately, gives me 300 events matched.

If I execute the Query2 (used in outer search) separately, gives me 20k events matched.

Any help is really appreciated.

Thanks, Arul

0 Karma
1 Solution

Esteemed Legend

Try this:

index=index1 host=node1 source="results.log" (("item: " AND "valid_audio_path: ") OR ( "Item Processed :: "))
| eval type=if(searchmatch("item: "), "Left", "Right")
| eval ProxiesProcessed=if((type="Left"), trim(substr(_raw,101,11)), trim(substr(_raw,112,10)))
| eval Status=if((type="Right"), trim(substr(_raw,144,len(_raw)-143), Status) 
| dedup ProxiesProcessed type
| search type="Left" OR Status="already_transcoded"
| rename ProxiesProcessed AS P_ProxiesProcessed 
| eval Audio_Tracks =if((type="Right"), trim(substr(_raw,130,len(_raw)-129)), Audio_Tracks)
| eval year_mm_dd =if((type="Right"), trim(substr(Audio_Tracks,36,07)), year_mm_dd)
| fields P_ProxiesProcessed Status Audio_Tracks year_mm_dd
| stats dc(type) AS numTypes values(*) AS * BY P_ProxiesProcessed
| rename Status AS S_Status
| table P_ProxiesProcessed Audio_Tracks year_mm_dd S_Status numTypes

This gives the fully merged set (full join). For each function, tack on the appropriate remaining search string:

For Left Join:

| search type="Left"

For right join:

| search type="Right"

For XOR (outer join):

| search numTypes= 1

For inner join:

| search numTypes>1

View solution in original post

Esteemed Legend

Try this:

index=index1 host=node1 source="results.log" (("item: " AND "valid_audio_path: ") OR ( "Item Processed :: "))
| eval type=if(searchmatch("item: "), "Left", "Right")
| eval ProxiesProcessed=if((type="Left"), trim(substr(_raw,101,11)), trim(substr(_raw,112,10)))
| eval Status=if((type="Right"), trim(substr(_raw,144,len(_raw)-143), Status) 
| dedup ProxiesProcessed type
| search type="Left" OR Status="already_transcoded"
| rename ProxiesProcessed AS P_ProxiesProcessed 
| eval Audio_Tracks =if((type="Right"), trim(substr(_raw,130,len(_raw)-129)), Audio_Tracks)
| eval year_mm_dd =if((type="Right"), trim(substr(Audio_Tracks,36,07)), year_mm_dd)
| fields P_ProxiesProcessed Status Audio_Tracks year_mm_dd
| stats dc(type) AS numTypes values(*) AS * BY P_ProxiesProcessed
| rename Status AS S_Status
| table P_ProxiesProcessed Audio_Tracks year_mm_dd S_Status numTypes

This gives the fully merged set (full join). For each function, tack on the appropriate remaining search string:

For Left Join:

| search type="Left"

For right join:

| search type="Right"

For XOR (outer join):

| search numTypes= 1

For inner join:

| search numTypes>1

View solution in original post

Engager

Mr. Woodcock,
It works to me.
I appreciate your help on this part, Thank you so much.

Thanks, Arul

0 Karma

Engager

Mr. Woodcock,
One more help from you.

Audit Logs:
2016-06-12 00:48:29,834 INFO [MainThread][PID:3143] item: AR001SJFBS valid_audio_path: /PROXY_AUDIO/2011/05/31/AR001SJFBS_3.mp2||/stornext/MAM_LOWRES_5/PROXY_AUDIO/2011/05/31/AR001SJFBS_4.mp2
2016-06-12 00:48:29,834 INFO [MainThread][PID:3143] Item Submitted :: AR001SJFBS
2016-06-12 00:48:40,730 INFO [MainThread][PID:3143] Item Processed :: AR001SJFBS, Transcode Status :: error

2016-06-27 08:30:20,169 INFO [MainThread][PID:29112] item: AR001SJFBS valid_audio_path: /PROXY_AUDIO/2011/05/31/AR001SJFBS_3.mp2||/stornext/MAM_LOWRES_5/PROXY_AUDIO/2011/05/31/AR001SJFBS_4.mp2
2016-06-27 08:30:20,169 INFO [MainThread][PID:29112] Item Submitted :: AR001SJFBS
2016-06-27 08:51:56,680 INFO [MainThread][PID:29112] update metadat:: file system check True, new access path L:\PROXY\2011\05\31\AR001SJFBS.mp4
2016-06-27 08:52:13,879 INFO [MainThread][PID:29112] Existing proxy backup at P:\MPEG_Backup\PROXY\2011\05\31\AR001SJFBS.mpeg ::
2016-06-27 08:52:13,879 INFO [MainThread][PID:29112] Item Processed :: AR001SJFBS, Transcode Status :: completed

Question, from the above log
I'm re-submitting jobs those are failed with error/aborted/failed status. So possibilities of repeated proxies present in the list.
I've to treat the transcode status of "AR001SJFBS" as "completed" instead "error" since completed is the latest status.

Thanks, Arul

0 Karma

Engager
0 Karma

Esteemed Legend

I do not understand. It would probably be best to start over and ask a new question but be sure to be as specific as possible.

0 Karma