Solved: How can I use a subsearch with rex extracted field to search over an extracted field

mwolfe · ‎11-01-2024

I am trying to take the results of one search, extract a field from those results (named "id") and take all of those values (deduped) and use them to get results from another search. Unfortunately the second search doesn't have this field name directly in the sourcetype either so it has to be extracted with rex.

I've been having issues with this though. From what I've read, I need to use the subsearch to extract the ids for the outer search. It's not working though. Each search is from a completely different data set, and the two have very little in common.

index=index1 source="/somefile.log"  uri="/path/with/id/some_id/"
| rex field=uri "/path/with/id/(?<some_id>[^/]+)/*"
[ search index=index2  source="/another.log" "condition-i-want-to-find"
  | rex field=_raw "some_id:(?<some_id>[^,]+),*"
  | dedup some_id
  | fields some_id
]

I've tried a bunch of variations of this with no luck, including renaming the field some_id to "search", as some have said that would help. I don't necessarily need the original uri="/path/with/id/some_id" in the outer search, but it would be nice to have it to limit those results.

yuanliu · ‎11-01-2024

The syntax problem that @PickleRick pointed out can be rectified by adding a pipe and the search command, like this:

index=index1 source="/somefile.log"  uri="/path/with/id/some_id/"
| rex field=uri "/path/with/id/(?<some_id>[^/]+)/*"
| search
  [ search index=index2  source="/another.log" "condition-i-want-to-find"
  | rex field=_raw "some_id:(?<some_id>[^,]+),*"
  | dedup some_id
  | fields some_id
  ]

However, this method reduces the advantage of using a subsearch on your dataset.
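
To see why, it helps to look at what the subsearch turns into. This is a sketch, assuming the subsearch returned two hypothetical values, abc123 and def456 (made up for illustration, not from your data). Because the subsearch ends with the single field some_id, the implicit format step should turn the results into field=value conditions, so the search would effectively run as

index=index1 source="/somefile.log"  uri="/path/with/id/some_id/"
| rex field=uri "/path/with/id/(?<some_id>[^/]+)/*"
| search ( ( some_id="abc123" ) OR ( some_id="def456" ) )

The some_id conditions are evaluated only after rex has run on every event that the first line returns, so the subsearch does nothing to narrow the initial retrieval from the index.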

To improve efficiency, renaming the field some_id to "search", as some have suggested, actually will help. (In part because / is a hard separator in Splunk.) You just need to add a format command:

index=index1 source="/somefile.log"  uri="/path/with/id/some_id/"
    [ search index=index2  source="/another.log" "condition-i-want-to-find"
    | rex field=_raw "some_id:(?<search>[^,]+),*"
    | dedup search
    | fields search
    | format
    ]
| rex field=uri "/path/with/id/(?<some_id>[^/]+)/*"
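
One way to check what actually gets substituted into the outer search is to run the subsearch on its own, keeping format as the last command. A sketch, using the same placeholder index, source, and condition as above:

index=index2 source="/another.log" "condition-i-want-to-find"
| rex field=_raw "some_id:(?<search>[^,]+),*"
| dedup search
| fields search
| format

This should return a single row whose search field holds the combined search string. If that string is not something you could paste after the first line of the outer search and get hits, the full search will not match either.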

Here is an emulation. Play with it and compare with your data.

index = _internal log/splunk
``` the above emulates
index=index1 source="/somefile.log"  uri="/path/with/id/some_id/"
```
    [makeresults format=csv data="search
    supervisor.log
    splunkd_ui_access.log"
``` the above emulates
        [ search index=index2  source="/another.log" "condition-i-want-to-find"
    | rex field=_raw "some_id:(?<search>[^,]+),*"
    | dedup search
    | fields search
    | format
    ]
```
    | format]
| rex field=series "log/splunk/(?<some_id>[^\"]+)" ``` emulates | rex field=uri "/path/with/id/(?<some_id>[^/]+)/*" ```
| stats count by some_id

On my laptop, it gives

some_id	count
splunkd_ui_access.log	59
supervisor.log	1045

As you can see, among all the logs, the output is limited to the two values in the subsearch.

PickleRick · ‎11-01-2024

A subsearch gets executed first. If it completes successfully (which might not happen: subsearches have limitations, and throwing heavy raw-data based searches into them is not a good idea), it returns a set of conditions or a search string that gets substituted into the main search.

So your search as written makes no sense syntactically, because the rex command doesn't take any more arguments.

If anything you'd need to do

<something>
| search [ your subsearch here ]
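
On the limitations mentioned above: a subsearch is capped in both result count and run time (maxout and maxtime in the [subsearch] stanza of limits.conf; the defaults are commonly 10,000 results and 60 seconds, but check your own environment). A rough way to see whether the list of ids fits is to run the inner search on its own and count the distinct values. This is a sketch that reuses the placeholder index, source, and condition from the question; distinct_ids is just an illustrative field name:

index=index2 source="/another.log" "condition-i-want-to-find"
| rex field=_raw "some_id:(?<some_id>[^,]+),*"
| stats dc(some_id) AS distinct_ids

If distinct_ids is anywhere near the cap, the subsearch results may get truncated, and the outer search would then miss events.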
