I have a set of events, a few of which start with a three-digit number (for example, 200 23 45 dgdgdgd dhdhddh).
These are the corrupt data/events and I don't want to include them in my search.
How do I write a condition, a regex, or something else that excludes an event if it starts with a three-digit number?
I agree with @niketnilay on dropping the bad data at the forwarder, but for the data you've already indexed you'll want a way to exclude it from your search.
It looks like your "good" data begins with an IP address. If that is so, then this will do the job:
... | regex _raw="^(\d{1,3}\.){3}\d{1,3}"
Where '...' is your base search.
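To sanity-check the pattern outside Splunk, here is a small Python sketch that applies the same regex to abbreviated copies of the sample events posted in this thread. It assumes Python's `re` module behaves like Splunk's PCRE-based `regex` command for this simple pattern. The `keep` helper is hypothetical and only mimics the "keep events that match" behavior of `regex _raw=...`.

```python
import re

# Pattern copied from the `regex _raw=...` search above: an IPv4-like prefix.
IP_PREFIX = re.compile(r"^(\d{1,3}\.){3}\d{1,3}")

# Abbreviated versions of the sample events from this thread.
good = '40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/... HTTP/1.1" 200 49 2'
bad = '200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 ..."'

def keep(raw: str) -> bool:
    """Hypothetical stand-in for `| regex _raw="..."`: keep the event only if it matches."""
    return IP_PREFIX.search(raw) is not None

kept = [e for e in (good, bad) if keep(e)]
print(len(kept))  # 1
```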
 
hey @zacksoft
Based on your comments and the logic you described, you can capture the leading three digits in a field and then exclude the events that have that field.
<your_search> | rex field=_raw "^(?<CorruptData>[0-9]{3})" | search NOT CorruptData=*
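The rex-then-exclude logic can be modeled in a few lines of Python. This is only a sketch: `rex` and `search_not_corrupt` are hypothetical stand-ins for the Splunk commands, not Splunk APIs. Note that Python spells a named group `(?P<name>...)`, whereas the `(?<name>...)` form used above is what PCRE, and therefore rex, accepts.

```python
import re

# Same pattern as the rex above, in Python's named-group syntax.
REX = re.compile(r"^(?P<CorruptData>[0-9]{3})")

def rex(raw: str) -> dict:
    """Hypothetical stand-in for `rex field=_raw`: fields extracted from one event."""
    m = REX.search(raw)
    return m.groupdict() if m else {}

def search_not_corrupt(events: list) -> list:
    """Hypothetical stand-in for `search NOT CorruptData=*`: keep events without the field."""
    return [e for e in events if "CorruptData" not in rex(e)]

events = [
    '40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/... HTTP/1.1" 200 49 2',
    '200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 ..."',
]
print(rex(events[1]))                   # {'CorruptData': '200'}
print(len(search_not_corrupt(events)))  # 1
```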
let me know if this helps!
I never thought it would be possible to do this without stopping the bad data from being indexed.
But your query seems like magic, and it eliminates the need to wrestle with the Splunk admins to configure the indexer not to index the bad data. I'll try this and let you know. Thank you.
 
The difference is whether you want to pay the filtering cost once, at index time, or on every search, at search time, provided the unwanted data is of no use.
 
let me know and do not forget to accept/upvote if it works for you 😜
 
Personally, I would use rex to extract a field from the events that start with the bad format, and then amend my search to include only the events that don't have that field. However, if you don't want to index the bad data at all, you should use @niketnilay's approach.
Something like:
<your search>|rex field=_raw "^(?<corruptData>\d\d\d)"|search NOT corruptData=*
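One caveat with matching any three leading digits: a legitimate event whose client IP has a three-digit first octet (for example 192.168.0.1, a hypothetical address not taken from this thread) also starts with three digits. The Python sketch below compares the pattern above with a tighter variant that also requires whitespace after the digits. That variant is an assumption based on the corrupt sample beginning with "200 49 2", not something confirmed for all of your data.

```python
import re

# Pattern from the rex above: any event whose first three characters are digits.
LOOSE = re.compile(r"^(?P<corruptData>\d\d\d)")
# Hypothetical tighter variant (an assumption): three digits followed by whitespace.
TIGHT = re.compile(r"^(?P<corruptData>\d{3})\s")

bad = '200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536"'
# Hypothetical good event whose client IP has a three-digit first octet.
good_192 = '192.168.0.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET / HTTP/1.1" 200 49 2'

def is_corrupt(pattern: re.Pattern, raw: str) -> bool:
    """True when the pattern would populate corruptData, i.e. the event gets excluded."""
    return pattern.search(raw) is not None

for name, pattern in (("loose", LOOSE), ("tight", TIGHT)):
    print(name, is_corrupt(pattern, bad), is_corrupt(pattern, good_192))
# loose True True
# tight True False
```

If the tighter form fits your data, the same idea carries over to the rex by appending `\s` after the digits; check it against your own events first.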
Updated to match your sample.
The ones that should be accepted look like this:
40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/jphutenxporter/1.0/outputformatconfig/outputformatselected?_=1513833400783 HTTP/1.1" 200 49 2 "htssphuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"
The ones that shouldn't be accepted look like this:
200 49 2 "htssphuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"
I think if we can identify the events that start with three-digit numbers (like 200, 201, 401, etc.) and exclude them, that may work.
 
@zacksoft, it seems like you are looking to keep the unwanted events from being indexed at all. For this, you would need to pass your data through a Heavy Forwarder that has a stanza routing the unwanted events to the nullQueue.
Edit props.conf and add the following:
[YourSourceType]
TRANSFORMS-dropSourceTypeEvents=setnull
Then edit transforms.conf and add the following:
[setnull]
REGEX=^\d{3}
DEST_KEY=queue
FORMAT=nullQueue
Refer to documentation for details: http://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad#Filter_event_data_...
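At index time the transform acts as a routing decision rather than a search filter. The following Python sketch is a simplified model of that decision, not Splunk's actual pipeline: in this model, events whose raw text matches `REGEX` are sent to `nullQueue` and discarded, and everything else continues to `indexQueue`. The `route` and `index_events` helpers are hypothetical.

```python
import re

# Simplified model of the [setnull] transform:
# REGEX=^\d{3}, DEST_KEY=queue, FORMAT=nullQueue.
SETNULL_REGEX = re.compile(r"^\d{3}")

def route(raw: str) -> str:
    """Return the queue this model sends an event to."""
    return "nullQueue" if SETNULL_REGEX.search(raw) else "indexQueue"

def index_events(events: list) -> list:
    """Keep only events routed to indexQueue; nullQueue events are never written."""
    return [e for e in events if route(e) == "indexQueue"]

good = '40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/... HTTP/1.1" 200 49 2'
bad = '200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 ..."'

print(route(bad))                           # nullQueue
print(index_events([good, bad]) == [good])  # True
```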
Does it have to pass through a heavy forwarder? Won't a universal forwarder do?
And we can set up conditions in the stanza that will keep the bad data from being indexed.
Is my understanding correct?
 
@zacksoft - you could do this on your indexer if you don't have a HF.
Thanks @nickhillscpl. I'll work with the Splunk admins to configure the setup that @niketnilay suggested in the transforms.conf file.
 
This is a good idea, if you don't need/want to index the corrupt data.
 
Hi @zacksoft,
Can you please share some sample events that need to be included and some that don't?
It won't allow me to post any sample events.
So let me simplify the events and type them out.
Following is an example of an event that should be included.
40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/jphutenxporter/1.0/outputformatconfig/outputformatselected?_=1513833400783 HTTP/1.1" 200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"
Following is an example of an event that should NOT be included:
200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"
The logic I can think of is: if we see a three-digit number at the beginning of an event (such as 200, 201, 402, etc.), we ought to exclude that event, because it is corrupt data.
