Solved: How to join using a wildcard?

kdimaria · ‎05-23-2018

I have a lookup that contains host names with wildcards. I am trying to do a inputlookup to grab those host names with the wildcards and then join those host names to find all other hosts that have a similar name. For example, one host in my lookup for application test is spx*. application test has a bunch of hosts that start with spx and I want to join on all of those possibilities and find them in the linux index.

Example Code:

| inputlookup application_hosts.csv | fields application,host| where application="test" AND host="spx*" | join host type=left [search index=linux sourcetype=df latest=now earliest=-15m | dedup host | table host mount]

When I join the hosts with the wildcard * they dont pick up the hosts they should. Is it possible to join using a wildcard?

DalJeanis · ‎05-23-2018

Not using precisely the method you are using. However, you can do anything if you lay enough pipe.

Let's see.

First, you should reverse the sides of the join.

Second, your code seems to expect only a single mount per host. is this a correct assumption? I'm going to assume no, but it doesn't matter because this gets you what you need either way...

 index=linux sourcetype=df earliest=-15m latest=now 
| fields index host mount
| stats values(mount) as mount by index host 

| rename COMMENT as "now we roll in the lookup records and mark them" 
| inputlookup append=t application_hosts.csv where application="test" AND host="spx*" 
| eval index=coalesce(index,"lookup")
| fields index host mount application
| dedup index host application keepempty=true

| rename COMMENT as "save the original hosts in different places based on record type."
| eval hostLU=case(index="lookup",host)
| eval hostDF=case(index="linux",host)

| rename COMMENT as "create a field to match on. In this phase this field will be empty if there is no wild card."
| eval MatchVHost = hostLU
| rex field=MatchVHost mode=sed "s/^[^*]+$//g s/[*].*$//g"

| rename COMMENT as "create a summary record that contains the lengths of all the wildcarded MatchVHost fields, plus a zero"
| rename COMMENT as "Roll them onto the linux records then kill the summary record"
| appendpipe [| fields MatchVHost 
    | eval MatchLen = len(MatchVHost) 
    | append [| makeresults | eval MatchLen=0 ] 
    | stats values(MatchLen) as MatchLen 
    | eval index="linux", killme="killme"
    ]
| eventstats values(MatchLen) as MatchLen by index
| where isnull(killme)

| rename COMMENT as "set MatchLen on all the lookup records as zero for the expand - zero means not to modify."
| eval MatchLen = coalesce(MatchLen,0)
| mvexpand MatchLen

| rename COMMENT as "Now we have one record for each lookup record, ..."
| rename COMMENT as "and one record for each host for each potential length of match, plus a zero record for do-not-modify"
| rename COMMENT as "Build the match values for the hostDF records, omitting any where the wildcard stem is not shorter than the actual value."
| eval MatchVHost = case(MatchLen=0,coalesce(MatchVHost,hostDF,hostLU),
    MatchLen<len(hostDF),substr(hostDF,1,MatchLen))
| where isnotnull(MatchVHost)

| rename COMMENT as "The prep work is now done.  There is one of each lookup record, and ..."
| rename COMMENT as "... one record for each DF record for each potential length of match"
| rename COMMENT as "Now roll the info from the lookup records to the DF records by MatchVHost"
| eventstats values(index) as index values(mount) as mount  values(hostLU) as hostLU by MatchVHost

| rename COMMENT as "now we only need the linux records, since we've copied the matches"
| where index="linux" AND isnotnull(hostLU)
| sort 0 hostDF MatchLen

Since the above is all aircode, you'll need to look at the output at this point and see if there's anything interesting about it.

View solution in original post

DalJeanis · ‎05-23-2018

Not using precisely the method you are using. However, you can do anything if you lay enough pipe.

Let's see.

First, you should reverse the sides of the join.

Second, your code seems to expect only a single mount per host. is this a correct assumption? I'm going to assume no, but it doesn't matter because this gets you what you need either way...

 index=linux sourcetype=df earliest=-15m latest=now 
| fields index host mount
| stats values(mount) as mount by index host 

| rename COMMENT as "now we roll in the lookup records and mark them" 
| inputlookup append=t application_hosts.csv where application="test" AND host="spx*" 
| eval index=coalesce(index,"lookup")
| fields index host mount application
| dedup index host application keepempty=true

| rename COMMENT as "save the original hosts in different places based on record type."
| eval hostLU=case(index="lookup",host)
| eval hostDF=case(index="linux",host)

| rename COMMENT as "create a field to match on. In this phase this field will be empty if there is no wild card."
| eval MatchVHost = hostLU
| rex field=MatchVHost mode=sed "s/^[^*]+$//g s/[*].*$//g"

| rename COMMENT as "create a summary record that contains the lengths of all the wildcarded MatchVHost fields, plus a zero"
| rename COMMENT as "Roll them onto the linux records then kill the summary record"
| appendpipe [| fields MatchVHost 
    | eval MatchLen = len(MatchVHost) 
    | append [| makeresults | eval MatchLen=0 ] 
    | stats values(MatchLen) as MatchLen 
    | eval index="linux", killme="killme"
    ]
| eventstats values(MatchLen) as MatchLen by index
| where isnull(killme)

| rename COMMENT as "set MatchLen on all the lookup records as zero for the expand - zero means not to modify."
| eval MatchLen = coalesce(MatchLen,0)
| mvexpand MatchLen

| rename COMMENT as "Now we have one record for each lookup record, ..."
| rename COMMENT as "and one record for each host for each potential length of match, plus a zero record for do-not-modify"
| rename COMMENT as "Build the match values for the hostDF records, omitting any where the wildcard stem is not shorter than the actual value."
| eval MatchVHost = case(MatchLen=0,coalesce(MatchVHost,hostDF,hostLU),
    MatchLen<len(hostDF),substr(hostDF,1,MatchLen))
| where isnotnull(MatchVHost)

| rename COMMENT as "The prep work is now done.  There is one of each lookup record, and ..."
| rename COMMENT as "... one record for each DF record for each potential length of match"
| rename COMMENT as "Now roll the info from the lookup records to the DF records by MatchVHost"
| eventstats values(index) as index values(mount) as mount  values(hostLU) as hostLU by MatchVHost

| rename COMMENT as "now we only need the linux records, since we've copied the matches"
| where index="linux" AND isnotnull(hostLU)
| sort 0 hostDF MatchLen

Since the above is all aircode, you'll need to look at the output at this point and see if there's anything interesting about it.

DalJeanis · ‎05-23-2018

If you cut it off at about line 33, and if there was only one line in the lookup where application="test" AND host="spx*", then you should see:

1) One lookup record, with "spx" in MatchVHost, and "spx*" in hostLU.
2) Two records for each host, one with the full original host name in MatchVHost, and one with the first three characters in MatchVHost. Both of those will have the full original host in hostDF.

Anything other than the above means my aircode is bad.

kdimaria · ‎05-23-2018

Thank you so much for your time and effort! This works!

HiroshiSatoh · ‎05-23-2018

You can use wild card for LOOKUP.However,You need to set props.conf.

https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html

kdimaria · ‎05-23-2018

Ignore the lookup, I just want to do | eval host="spx*" | join host [search index=linux blah blah ]
but that doesn't work.

Unless there is someway in the props.conf to help me JOIN using wildcards, I would appreciate if you moved your answer to a comment so this question looks unanswered and other people can chime in. thanks!

kdimaria · ‎05-23-2018

Sorry but that is not what I'm looking for. I am trying to JOIN with a wildcard. That question is doing a lookup and outputing a field from that same lookup that matches the wildcards. not the same.

How to join using a wildcard?

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

Join the Conversation