Getting Data In

Before-Indexing Filter Based on a JSON Field Value Match

kagamalai
Explorer

We want to filter events before indexing based on a field value match.

For example, below is a single event. If the following condition matches, we need to index the whole event; otherwise, drop the whole event.

WAFAction = unknown
WAFFlags = 0

Please advise how to achieve this.

Sample event (JSON format, with timestamp):

{
BotScore: 98
BotScoreSrc: Machine Learning
CacheCacheStatus: unknown
CacheResponseBytes: 1877
CacheResponseStatus: 200
CacheTieredFill: false
ClientASN: 701
ClientCountry: us
ClientDeviceType: desktop
ClientIP: 196.142.18.94
ClientIPClass: noRecord
ClientMTLSAuthCertFingerprint:
ClientMTLSAuthStatus: unknown
ClientRequestBytes: 3912
ClientRequestMethod: POST
ClientRequestPath: /common/endpoint/
ClientRequestProtocol: HTTP/2
ClientRequestScheme: https
ClientRequestSource: eyeball
ClientRequestURI: /common/endpoint/
ClientRequestUserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36
ClientSSLCipher: ECDHE-ECDSA-AES128-GCM-SHA256
ClientSSLProtocol: TLSv1.2
ClientSrcPort: 50738
ClientTCPRTTMs: 14
ClientXRequestedWith: XMLHttpRequest
EdgeCFConnectingO2O: false
EdgeColoCode: EWR
EdgeColoID: 11
EdgeEndTimestamp: 2021-06-24T01:33:21Z
EdgePathingOp: wl
EdgePathingSrc: macro
EdgePathingStatus: nr
EdgeRateLimitAction:
EdgeRateLimitID: 0
EdgeRequestHost: api.xyz.com
EdgeResponseBodyBytes: 71
EdgeResponseBytes: 814
EdgeResponseCompressionRatio: 0
EdgeResponseContentType: application/json
EdgeResponseStatus: 200
EdgeServerIP: 62.15.62.15
EdgeStartTimestamp: 2021-06-24T01:33:21Z
EdgeTimeToFirstByteMs: 160
FirewallMatchesActions: [
]
FirewallMatchesRuleIDs: [
]
FirewallMatchesSources: [
]
OriginDNSResponseTimeMs: 0
OriginIP: 44.12.238.17
OriginRequestHeaderSendDurationMs: 0
OriginResponseBytes: 0
OriginResponseDurationMs: 148
OriginResponseHTTPExpires:
OriginResponseHTTPLastModified:
OriginResponseHeaderReceiveDurationMs: 90
OriginResponseStatus: 200
OriginResponseTime: 148000000
OriginSSLProtocol: TLSv1.2
OriginTCPHandshakeDurationMs: 18
OriginTLSHandshakeDurationMs: 40
ParentRayID: 00
RayID: 6642351fccb80ca5
SecurityLevel: med
SmartRouteColoID: 0
UpperTierColoID: 0
WAFAction: unknown
WAFFlags: 0
WAFMatchedVar:
WAFProfile: unknown
WAFRuleID:
WAFRuleMessage:
WorkerCPUTime: 0
WorkerStatus: unknown
WorkerSubrequest: false
WorkerSubrequestCount: 0
ZoneID: 134451718
ZoneName: yy.xxxxx.com
}
host = idx1.server001.net.net
source = s3://cloudflare/logs/20210624/20210624T013257Z_20210624T013327Z_0c91c265.log.gz
sourcetype = _json


mdorobek
Path Finder

Hello kagamalai, 

you can route events to a specific queue before indexing, based on a regex match. To index only some events, send all events to the nullQueue, then define a regex that sends just the ones you want to keep to the indexQueue.

Since the transforms are executed from left to right, "keepUnknown" overwrites the nullQueue with the indexQueue if its regex matches.

Here's an example:

props.conf

 

 

[yourSourcetype]
TRANSFORMS-filter_events = sendNull, keepUnknown

 

 

 

transforms.conf

 

 

[sendNull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepUnknown]
REGEX = WAFAction\":\"unknown\",\"WAFFlags\":\"0\"
DEST_KEY = queue
FORMAT = indexQueue
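To make the "left to right, last match wins" behaviour concrete, here is a small Python model of the two stanzas above. It is a sketch under stated assumptions, not Splunk code: `route()` and the `TRANSFORMS` list are hypothetical names, and Python's `re` stands in for Splunk's PCRE (the patterns here are simple enough to behave the same in both).

```python
import re

# Hypothetical model of the transforms.conf stanzas above:
# (stanza name, REGEX, queue written to DEST_KEY when the regex matches).
TRANSFORMS = [
    ("sendNull", re.compile(r"."), "nullQueue"),
    ("keepUnknown", re.compile(r'WAFAction\":\"unknown\",\"WAFFlags\":\"0\"'), "indexQueue"),
]


def route(raw, transforms=TRANSFORMS, default="indexQueue"):
    """Return the queue an event ends up in (hypothetical helper).

    Transforms run in the order listed in props.conf; every matching
    transform overwrites the queue, so the last match wins.
    """
    queue = default
    for _name, regex, dest in transforms:
        if regex.search(raw):
            queue = dest
    return queue


keep = '{"WAFAction":"unknown","WAFFlags":"0","ZoneID":1}'
drop = '{"WAFAction":"block","WAFFlags":"1","ZoneID":1}'
print(route(keep))  # indexQueue
print(route(drop))  # nullQueue
```

Reversing the list (keepUnknown first, then sendNull) would send every event to the nullQueue, which is why the order in `TRANSFORMS-filter_events` matters.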

 

 

 

If this helps you, you are welcome to accept the answer.

 

Kind regards,

mdorobek


kagamalai
Explorer

Thank you for your reply. I have a question about this: if the condition matches, will the entire event be indexed, or only the matched field and value? Please clarify.


mdorobek
Path Finder

Hello kagamalai,

when data gets indexed, it proceeds through a pipeline where event processing occurs. This pipeline consists of several shorter pipelines that are strung together. You can see the pipeline in the picture. If you want further information, you can read the following wiki entry: https://wiki.splunk.com/Community:HowIndexingWorks

 

[Image: diagram of the Splunk indexing pipeline]

 

Event breaking and merging happen in the parsing and merging pipelines. Index-time transforms are executed in the typing pipeline. This means a transform is applied to each complete event, not to each line, and Splunk keeps the whole event whose text matches the regex, not just the matching field and value. Of course, this assumes that event breaking has been configured correctly.
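The two stages described above can be sketched in Python: first the raw chunk is broken into events, then each complete event is routed as a unit. This is an illustration only. It assumes one JSON object per line (the usual layout of Cloudflare log files, but check your own data), and `break_events()` / `index_time_filter()` are hypothetical helpers, not Splunk functions. The sample events are trimmed, made-up lines.

```python
import re

# keepUnknown's REGEX from the answer, verbatim.
KEEP = re.compile(r'WAFAction\":\"unknown\",\"WAFFlags\":\"0\"')


def break_events(chunk):
    """Stand-in for event breaking: one JSON object per line (assumption)."""
    return [line for line in chunk.splitlines() if line.strip()]


def index_time_filter(chunk):
    """Return the events that would reach the index, unchanged."""
    kept = []
    for raw in break_events(chunk):   # parsing/merging: whole events first
        queue = "nullQueue"           # typing: sendNull matches every event
        if KEEP.search(raw):
            queue = "indexQueue"      # typing: keepUnknown overrides on a match
        if queue == "indexQueue":
            kept.append(raw)          # the entire event string, not just the match
    return kept


chunk = (
    '{"BotScore":99,"WAFAction":"unknown","WAFFlags":"0","ZoneName":"ty.xyz.com"}\n'
    '{"BotScore":12,"WAFAction":"block","WAFFlags":"1","ZoneName":"ty.xyz.com"}\n'
)
print(index_time_filter(chunk))
```

The kept event still contains `BotScore` and `ZoneName`: the regex only decides which queue the event goes to; it never trims the event.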

Does this clarify your question?

Kind regards,

mdorobek

 


kagamalai
Explorer

Thank you for the detailed explanation and your time.

I am using the apps below to download the Cloudflare logs. In which path do I have to create or edit the props.conf and transforms.conf files? All of these apps have their props/transforms files in the default directory.

1. Splunk Add-On for AWS (downloads the logs from S3)

2. Cloudflare Technology Add-On for Splunk

3. Cloudflare App for Splunk

I am running Splunk on Linux:

/opt/splunk/etc/apps

Please advise which path to use in this scenario.

Thanks in advance 


mdorobek
Path Finder

Hello kagamalai,

this depends on the sourcetype you're using. I don't know how familiar you are with config files in Splunk. Splunk merges the settings from all copies of a file using a location-based prioritization scheme. When different copies have conflicting attribute values (that is, when they set the same attribute to different values), it uses the value from the file with the highest priority. More information here: https://docs.splunk.com/Documentation/Splunk/8.2.0/Admin/Wheretofindtheconfigurationfiles

It's best practice for prebuilt apps to define their configs in /opt/splunk/etc/apps/<appname>/default/<config>. User-specific changes should be placed in /opt/splunk/etc/apps/<appname>/local/<config>.

This has two advantages:

1. The configuration in the local folder has priority over the default one

2. If you update the app in the future, the local folder won't be changed
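The first advantage (local overrides default) can be illustrated with a tiny model of how one app's `default/` and `local/` copies combine. This is a simplified sketch: `merge_app_layers()` is a hypothetical helper, both files are assumed to be already parsed into `{stanza: {attribute: value}}` dicts, and Splunk's real precedence has more layers (system/local, other apps, and so on) that this ignores. The attribute values are illustrative, not the add-on's shipped defaults.

```python
def merge_app_layers(default, local):
    """Return the effective settings; local wins attribute by attribute."""
    # Copy the default stanzas so the input dicts are not modified.
    merged = {stanza: dict(attrs) for stanza, attrs in default.items()}
    for stanza, attrs in local.items():
        # Only attributes set in local are overridden or added;
        # everything else in the default stanza is kept.
        merged.setdefault(stanza, {}).update(attrs)
    return merged


# Illustrative values only.
default_props = {"aws:cloudtrail": {"SHOULD_LINEMERGE": "false", "TRUNCATE": "10000"}}
local_props = {
    "aws:cloudtrail": {
        "TRUNCATE": "20000",
        "TRANSFORMS-filter_events": "sendNull, keepUnknown",
    }
}

effective = merge_app_layers(default_props, local_props)
print(effective["aws:cloudtrail"])
```

On a real installation, `$SPLUNK_HOME/bin/splunk btool props list aws:cloudtrail --debug` shows the effective settings and which file each one comes from, which is a quick way to confirm that a local/props.conf entry is being picked up.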

 

Here's an example. If you use the aws:cloudtrail sourcetype of the AWS app, you can place the props.conf under /opt/splunk/etc/apps/Splunk_TA_aws/local/props.conf with the following entry:

 

[aws:cloudtrail]
TRANSFORMS-filter_events = sendNull, keepUnknown

 

 

The corresponding transforms.conf goes under /opt/splunk/etc/apps/Splunk_TA_aws/local/transforms.conf:

[sendNull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepUnknown]
REGEX = WAFAction\":\"unknown\",\"WAFFlags\":\"0\"
DEST_KEY = queue
FORMAT = indexQueue

 

Note that the REGEX has to match your specific condition. Also, your transforms need unique names; otherwise, they could be overwritten if a transform with the same name is defined somewhere else.

 

Kind regards,

mdorobek


kagamalai
Explorer

Hi,

How do I match an OR condition here? I want an event to be indexed if any one of the following matches:

"WAFFlags":"1"

or

"FirewallMatchesActions":["log"]

or

"FirewallMatchesActions":["log","log"]

or

"FirewallMatchesSources":["firewallRules","waf"]


mdorobek
Path Finder

Hello kagamalai,

you can write a regex with an OR condition (alternatives separated by |). Here's an example:

(\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]|\"FirewallMatchesActions\":\[\"log\",\"log\"\]|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\])

https://regex101.com/r/sjwTa8/1 
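Besides regex101, the same pattern can be checked locally. The Python snippet below uses the regex from the answer verbatim against a few hand-written fragments; Python's `re` is used as a stand-in for Splunk's PCRE, which behaves the same for this pattern. The sample strings are made up for illustration.

```python
import re

# The alternation pattern from the answer above, verbatim (split over lines).
PATTERN = re.compile(
    r'(\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]'
    r'|\"FirewallMatchesActions\":\[\"log\",\"log\"\]'
    r'|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\])'
)

# fragment -> should it be kept?
samples = {
    '"WAFFlags":"1"': True,
    '"FirewallMatchesActions":["log"]': True,
    '"FirewallMatchesActions":["log","log"]': True,
    '"FirewallMatchesSources":["firewallRules","waf"]': True,
    '"WAFFlags":"0","FirewallMatchesActions":[]': False,
    # Same values, different order: not matched by this pattern.
    '"FirewallMatchesSources":["waf","firewallRules"]': False,
}

for text, expected in samples.items():
    print(bool(PATTERN.search(text)) == expected, text)
```

Note that each alternative matches the exact array contents and order as written, so an array such as `["waf","firewallRules"]` would need its own alternative.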

 

Kind regards,

mdorobek


kagamalai
Explorer

Hi

Do I have to write it in the format below in transforms.conf?

 

regex = (\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]|\"FirewallMatchesActions\":\[\"log\",\"log\"\]|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\])

Thanks.


mdorobek
Path Finder

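Yes, but write the attribute name in uppercase (REGEX =); attribute names in Splunk .conf files are case-sensitive.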


kagamalai
Explorer

Thanks. How many OR operators can I add? Could you please verify that the format below is correct?

REGEX = (\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]|\"FirewallMatchesActions\":\[\"log\",\"log\"\]|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\]|\"FirewallMatchesActions\":\[\"block\"\]|\"FirewallMatchesActions\":\[\"challengeSolved\"\]|\"FirewallMatchesActions\":\[\"challenge\"\])

I have a doubt about this; it's not filtering.


kagamalai
Explorer

@mdorobek ,

Please advise: it's not working when I add the OR operator "|". Is there any other method to filter the same events?

 


mdorobek
Path Finder

Hello kagamalai,

I already gave you a link to check your regular expression: https://regex101.com/r/sjwTa8/1 

Did you restart Splunk after the changes?

 

If it doesn't work (which it should), there is another way: just run several transforms in a row. It works exactly like before. This method is much less efficient, because each event has to be checked against four separate regexes instead of one. However, here's the example for the four conditions you want to keep.

props.conf

 

[aws:cloudtrail]
TRANSFORMS-filter_events = sendNull, keepWAFFlags, keepFirewallMatchesActionslog, keepFirewalMatchesActionloglog, keepFirewallMatchesSources

 

 

transforms.conf

[sendNull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepWAFFlags]
REGEX = \"WAFFlags\":\"1\"
DEST_KEY = queue
FORMAT = indexQueue

[keepFirewallMatchesActionslog]
REGEX = \"FirewallMatchesActions\":\[\"log\"\]
DEST_KEY = queue
FORMAT = indexQueue

[keepFirewalMatchesActionloglog]
REGEX = \"FirewallMatchesActions\":\[\"log\",\"log\"\]
DEST_KEY = queue
FORMAT = indexQueue

[keepFirewallMatchesSources]
REGEX = \"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\]
DEST_KEY = queue
FORMAT = indexQueue
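The chained transforms and the single alternation should route every event the same way; only the number of regex evaluations differs. The Python sketch below models both approaches to show that equivalence. It is illustrative only: `route_chain()` and `route_single()` are hypothetical helpers, the regexes are copied from the stanzas above, and the test events are hand-written fragments.

```python
import re

# REGEX values of the four keep transforms above, in props.conf order.
KEEP_REGEXES = [
    r'\"WAFFlags\":\"1\"',
    r'\"FirewallMatchesActions\":\[\"log\"\]',
    r'\"FirewallMatchesActions\":\[\"log\",\"log\"\]',
    r'\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\]',
]


def route_chain(raw):
    """Several transforms in a row: every regex is evaluated for every event."""
    queue = "nullQueue"               # sendNull: REGEX = .
    for pattern in KEEP_REGEXES:      # one evaluation per keep transform
        if re.search(pattern, raw):
            queue = "indexQueue"
    return queue


def route_single(raw):
    """One transform whose REGEX joins the alternatives with |."""
    queue = "nullQueue"
    if re.search("|".join(KEEP_REGEXES), raw):
        queue = "indexQueue"
    return queue


events = [
    '{"WAFFlags":"1","FirewallMatchesActions":[]}',
    '{"WAFFlags":"0","FirewallMatchesActions":["log","log"]}',
    '{"WAFFlags":"0","FirewallMatchesActions":[],"FirewallMatchesSources":[]}',
]
for raw in events:
    print(route_chain(raw), route_single(raw), raw)
```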

 

 


kagamalai
Explorer

Thank you, it's working fine.


kagamalai
Explorer

Thank you for your reply. Yes, I tested the pattern on the regex101 site before using it. The pattern is below.

Please have a look: is there any mistake?

REGEX = (\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]|\"FirewallMatchesActions\":\[\"log\",\"log\"\]|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\]|\"FirewallMatchesActions\":\[\"block\"\])

If the above method works, it would be very useful.

 


kagamalai
Explorer

Thank you, it's working.


venkatasri
SplunkTrust

Hi @kagamalai 

Can you share the _raw event? What you have shared is the UI rendering of the JSON. Switch from List to Raw in the Search app UI.
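This matters because index-time REGEX is matched against the raw event text, not against the List-view rendering or extracted fields. The short Python check below (using `re` as a stand-in for Splunk's PCRE) runs keepUnknown's regex against the List-view text from the question, a raw-style fragment, and a hypothetical raw fragment with the two keys swapped.

```python
import re

# keepUnknown's REGEX from the answers, verbatim.
KEEP = re.compile(r'WAFAction\":\"unknown\",\"WAFFlags\":\"0\"')

# How the List view displays the two fields (as posted in the question).
list_view = "WAFAction: unknown\nWAFFlags: 0"
# The same fields as they appear in raw JSON text.
raw_text = '"WAFAction":"unknown","WAFFlags":"0"'
# Hypothetical raw text with the keys in the opposite order.
reordered = '"WAFFlags":"0","WAFAction":"unknown"'

for name, text in [("list_view", list_view), ("raw_text", raw_text), ("reordered", reordered)]:
    print(name, bool(KEEP.search(text)))
```

Only the raw-style fragment matches: the regex depends on the exact raw layout, including the quotes around `"0"` and the key order, so it has to be written against `_raw`.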

 


kagamalai
Explorer

Hi,

Thanks for your response. Please see the raw format below.


{"BotScore":99,"BotScoreSrc":"Machine Learning","CacheCacheStatus":"unknown","CacheResponseBytes":2254,"CacheResponseStatus":200,"CacheTieredFill":false,"ClientASN":6805,"ClientCountry":"de","ClientDeviceType":"mobile","ClientIP":"79.14.16.54","ClientIPClass":"noRecord","ClientMTLSAuthCertFingerprint":"","ClientMTLSAuthStatus":"unknown","ClientRequestBytes":4059,"ClientRequestHost":"ty.xyz.com","ClientRequestMethod":"POST","ClientRequestPath":"/common/endpoint/","ClientRequestProtocol":"HTTP/2","ClientRequestReferer":"https://ty.xyz.com/games/Popular/","ClientRequestScheme":"https","ClientRequestSource":"eyeball","ClientRequestURI":"/common/endpoint/","ClientRequestUserAgent":"Mozilla/5.0 (Linux; Android 10; MAR-LX1A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Mobile Safari/537.36","ClientSSLCipher":"ECDHE-ECDSA-AES128-GCM-SHA256","ClientSSLProtocol":"TLSv1.2","ClientSrcPort":46452,"ClientTCPRTTMs":33,"ClientXRequestedWith":"XMLHttpRequest","EdgeCFConnectingO2O":false,"EdgeColoCode":"DUS","EdgeColoID":48,"EdgeEndTimestamp":"2021-06-24T01:33:16Z","EdgePathingOp":"wl","EdgePathingSrc":"macro","EdgePathingStatus":"nr","EdgeRateLimitAction":"","EdgeRateLimitID":0,"EdgeRequestHost":"ty.xyz.com","EdgeResponseBodyBytes":300,"EdgeResponseBytes":1192,"EdgeResponseCompressionRatio":0,"EdgeResponseContentType":"application/json","EdgeResponseStatus":200,"EdgeServerIP":"18.41.42.13","EdgeStartTimestamp":"2021-06-24T01:33:16Z","EdgeTimeToFirstByteMs":224,"FirewallMatchesActions":[],"FirewallMatchesRuleIDs":[],"FirewallMatchesSources":[],"OriginDNSResponseTimeMs":0,"OriginIP":"14.11.28.17","OriginRequestHeaderSendDurationMs":0,"OriginResponseBytes":0,"OriginResponseDurationMs":215,"OriginResponseHTTPExpires":"","OriginResponseHTTPLastModified":"","OriginResponseHeaderReceiveDurationMs":215,"OriginResponseStatus":200,"OriginResponseTime":215000000,"OriginSSLProtocol":"TLSv1.2","OriginTCPHandshakeDurationMs":0,"OriginTLSHandshakeDurationMs":0,"ParentRayID":"00","RayID":"664235027de321b1","SecurityLevel":"med","SmartRouteColoID":0,"UpperTierColoID":0,"WAFAction":"unknown","WAFFlags":"0","WAFMatchedVar":"","WAFProfile":"unknown","WAFRuleID":"","WAFRuleMessage":"","WorkerCPUTime":0,"WorkerStatus":"unknown","WorkerSubrequest":false,"WorkerSubrequestCount":0,"ZoneID":134451718,"ZoneName":"ty.xyz.com"}
host = svr003.xyz.net
source = s3://prod-cloudflare/logs/ty.xyz.com/20210624/20210624T013257Z_20210624T013327Z_0c91c265.log.gz
sourcetype = _json
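As a cross-check, the two REGEX values discussed in this thread can be run against this raw event. The sketch below uses an excerpt of the event (the fields between the Firewall arrays and the WAF fields are omitted) and a hypothetical `route()` helper modelling "sendNull, then one keep transform"; Python's `re` stands in for Splunk's PCRE.

```python
import re

# keepUnknown's REGEX and the OR pattern from this thread, verbatim.
KEEP_UNKNOWN = re.compile(r'WAFAction\":\"unknown\",\"WAFFlags\":\"0\"')
KEEP_ANY = re.compile(
    r'(\"WAFFlags\":\"1\"|\"FirewallMatchesActions\":\[\"log\"\]'
    r'|\"FirewallMatchesActions\":\[\"log\",\"log\"\]'
    r'|\"FirewallMatchesSources\":\[\"firewallRules\",\"waf\"\])'
)

# Excerpt of the raw event above; the fields between the two parts are omitted.
RAW_EXCERPT = (
    '"FirewallMatchesActions":[],"FirewallMatchesRuleIDs":[],"FirewallMatchesSources":[],'
    '"WAFAction":"unknown","WAFFlags":"0","WAFMatchedVar":"","WAFProfile":"unknown"'
)


def route(raw, keep_regex):
    """sendNull followed by a single keep transform (hypothetical model)."""
    queue = "nullQueue"          # sendNull: every event starts here
    if keep_regex.search(raw):
        queue = "indexQueue"     # the keep transform overrides on a match
    return queue


print(route(RAW_EXCERPT, KEEP_UNKNOWN))  # indexQueue
print(route(RAW_EXCERPT, KEEP_ANY))      # nullQueue
```

So this particular event would be indexed under the original keepUnknown rule, but dropped under the OR rule, because its Firewall arrays are empty and `WAFFlags` is `"0"`.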
