Unable to get firewall data into Splunk from the syslog server: how to troubleshoot this issue?

Hemnaath
Motivator

Hi All, we recently got a request to ingest data from a newly configured Palo Alto device into Splunk. We configured syslog-ng.conf to route the device data to the syslog servers, and we can see that data from the Palo Alto firewall is reaching the syslog servers under the path /opt/syslogs/paloalto/device3pano.xxx..com/paloalto.log. The same server is used as a heavy forwarder, which reads the syslog data directly from the path /opt/syslogs/paloalto/.../paloalto.log* into Splunk.

Architecture details:
We currently have 5 individual indexer instances, 5 individual heavy forwarder instances, 3 clustered search heads, one deployment server instance and one deployer instance, all running Splunk 6.6.1.

Syslog data flow:
All five heavy forwarder instances act as syslog servers. Data from network devices, firewalls, ESX, etc. is read directly on the five heavy forwarder instances and forwarded to the indexer instances.

Problem details: We can see data from another Palo Alto firewall, device2, being ingested into Splunk from the same location on the HF instances when we run this search:
index=firewall sourcetype="paloalto:network:traffic" source="/opt/syslogs/paloalto/device2pano.xxx..com/paloalto.log"

But when we run the same search with a different sourcetype, sourcetype="paloalto:network:log", we do not see any data in Splunk from any Palo Alto device. sourcetype="paloalto:network:traffic" is defined in props.conf, whereas sourcetype="paloalto:network:log" is defined in inputs.conf. Both configurations are placed on the heavy forwarder instances; data reaches the HF instances first and is then forwarded to the indexer instances.

Kindly guide me on how to start troubleshooting this issue.

woodcock
Esteemed Legend

You need to deploy the Palo-Alto TA to the Heavy Forwarders (the first full instance of Splunk to handle the events) because that is where the "cooking" of the sourcetype-renaming happens.

rphillips_splk
Splunk Employee

If you are seeing the data intermittently, I would look at the following:

1.] Are the tcpout queues blocking on the HF?
index=_internal host="" source=*metrics.log* group=queue
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| timechart minspan=30s perc95(fill_perc) by name useother=false limit=15

The above search is easier to read with:
Visualization > Area chart
Format > Multi-series mode > Yes

If you see the tcpout_* queues flatlined near 100%, your events will be delayed, and it is best to look at either the network or the downstream indexers with the same search (replacing host with the indexer host name). If you see the same issue on the indexers, check which queues are full or blocking. If it is the index queue, you likely have an I/O bottleneck, so you may need to improve I/O or add more indexers. A full index queue backs up all of the queues upstream, including the tcpout queue on any upstream forwarders/HFs. https://wiki.splunk.com/Community:HowIndexingWorks
Peaks and valleys are normal, but queues flatlined near 100% indicate a problem.

2.] Is data distributed evenly from the forwarders across all indexers?
index=_internal host= "" group=tcpin_connections | eval sourceHost=if(isnull(hostname),sourceHost,hostname)
| stats sum(kb) AS KB dc(sourceHost) AS "forwarder count" min(_time) AS mintime max(_time) AS maxtime by host
| convert ctime(mintime)
| convert ctime(maxtime)
| eval GB = round(KB / 1024 / 1024 ,2)

With the above search you should see roughly the same number of forwarders sending to each indexer and roughly the same amount of data distributed over each indexer, say over a 1-hour period. If you see any large disparities in forwarder count or data volume, check the outputs.conf settings on the forwarders and make sure the forwarders are sending to all indexers.

3.] If any UFs send high volume to an indexer, they may be "sticking" to one indexer longer than they should and not load balancing properly across all indexers. This can happen with very high-volume data sources where Splunk cannot determine EOF. For that case there are parameters in props.conf for Universal Forwarders (EVENT_BREAKER_ENABLE + EVENT_BREAKER): http://docs.splunk.com/Documentation/Splunk/7.0.0/Admin/Propsconf

index=_internal host="<forwarder>" source=*metrics.log* tcp_avg_thruput=* | timechart span=10s max(tcp_avg_thruput) by destIp limit=0

The above search is easiest to read by narrowing the time range to, say, 1 hour and then clicking the Visualization tab:
Column chart
Format > Stacked mode > Stacked
Multi-series mode > No

The destIp values are your indexers, and you should see the x-axis values of the column chart switching between indexers every ~30 seconds or so. If not, you may want to look at setting EVENT_BREAKER_ENABLE + EVENT_BREAKER in props.conf on the UF.
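
The fill-percentage arithmetic used in the first search (step 1) can be sketched in Python as follows. This is only an illustration of the calculation: the field names come from the search, while the function name `fill_perc` and the sample values are hypothetical and are not real metrics.log output.

```python
def fill_perc(metric):
    """Mirror the eval steps of the search: prefer the *_kb fields, else the plain ones."""
    max_size = metric.get("max_size_kb", metric.get("max_size"))
    curr = metric.get("current_size_kb", metric.get("current_size"))
    return round((curr / max_size) * 100, 2)


# Hypothetical queue samples, invented for illustration only.
samples = [
    {"name": "tcpout_example", "max_size": 512000, "current_size": 506880},
    {"name": "indexqueue", "max_size_kb": 500, "current_size_kb": 125},
]

for sample in samples:
    print(sample["name"], fill_perc(sample))
```

In this made-up data the first queue sits at 99% and the second at 25%; a queue that stays near the first value over time is the "flatlined" pattern described above.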

Hemnaath
Motivator

Thanks rphillips, let me run the above searches and I will post the results.

spinnamshetty
New Member

@Hemnaath, did the above solution work for you? I am having a similar issue here.

rphillips_splk
Splunk Employee

One solution is to set up a common sourcetype (e.g. paloalto) in inputs.conf for the monitor stanza and then set the different sourcetypes via props/transforms by matching a regex against the host name. For example, name your firewall hosts something like firewallhost1, firewallhost2 and your network hosts networkhost1, networkhost2, or whatever naming convention makes sense for your organization.

Generally you want to assign separate sourcetypes to data with different timestamp formats. If the syslog data from both firewall and network devices has the same timestamp format, you may want to re-evaluate why you are assigning different sourcetypes. The example below assumes we are setting different sourcetypes based on a regex match in the hostname.

Configured on the HF:
inputs.conf

[monitor:///opt/syslogs/paloalto/*/paloalto.log*]
sourcetype = paloalto
disabled = 0
host_segment = 4
# example host names which REGEX in transforms.conf will match:
# firewallhost1
# networkhost1

props.conf

[paloalto]
TRANSFORMS-set_sourcetype = set_sourcetype_traffic,set_sourcetype_log

transforms.conf

# With DEST_KEY = MetaData:Sourcetype, the FORMAT value needs the sourcetype:: prefix
[set_sourcetype_traffic]
FORMAT = sourcetype::paloalto:network:traffic
REGEX = firewallhost
SOURCE_KEY = MetaData:Host
DEST_KEY = MetaData:Sourcetype

[set_sourcetype_log]
FORMAT = sourcetype::paloalto:network:log
REGEX = networkhost
SOURCE_KEY = MetaData:Host
DEST_KEY = MetaData:Sourcetype

With the above configuration you will get events with sourcetype=paloalto:network:traffic for hosts named firewallhost* and sourcetype=paloalto:network:log for hosts named networkhost*.
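
To see how the host-based routing above behaves for a few host names, here is a minimal Python sketch. It is not how Splunk itself is implemented: it assumes Python's `re` treats these simple patterns the same way Splunk's regex engine does, and it assumes the MetaData:Host key is presented with a `host::` prefix. The function name `route` is hypothetical.

```python
import re

# Rules copied from the transforms.conf example: (stanza, REGEX, resulting sourcetype).
RULES = [
    ("set_sourcetype_traffic", re.compile(r"firewallhost"), "paloalto:network:traffic"),
    ("set_sourcetype_log", re.compile(r"networkhost"), "paloalto:network:log"),
]


def route(host, sourcetype="paloalto"):
    """Apply every rule in order; a rule that matches overwrites the sourcetype."""
    key = "host::" + host  # assumption: the host key carries a host:: prefix
    for _stanza, regex, new_sourcetype in RULES:
        if regex.search(key):
            sourcetype = new_sourcetype
    return sourcetype


for h in ["firewallhost1", "networkhost2", "device2pano"]:
    print(h, "->", route(h))
```

In this sketch a host that matches neither pattern (such as device2pano) keeps the common sourcetype paloalto, so a search on either of the two specific sourcetypes would not return its events.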

Alternatively, you could create separate directories on your syslog-ng/HF host to receive the syslog data and set everything as desired in inputs.conf.

On the HF:
inputs.conf

[monitor:///opt/syslogs/paloalto/firewall/*/paloalto.log*] 
sourcetype = paloalto:network:traffic
disabled = 0
index = firewall
host_segment = 5

[monitor:///opt/syslogs/paloalto/network/*/paloalto.log*] 
sourcetype = paloalto:network:log
disabled = 0
index = network
host_segment = 5
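
The host_segment values in the two inputs.conf variants differ because the extra directory level shifts the host name one position to the right. A minimal Python sketch of that counting, assuming host_segment counts "/"-separated path segments starting at 1 (which is how the examples in this thread use it); the function name `host_from_path` and the host directory `fw01` are hypothetical:

```python
def host_from_path(path, host_segment):
    """Return the Nth '/'-separated segment of an absolute path, counting from 1."""
    segments = [s for s in path.split("/") if s]
    return segments[host_segment - 1]


# Layout used by the original poster: the host directory is the 4th segment.
print(host_from_path("/opt/syslogs/paloalto/device2pano.xxx..com/paloalto.log", 4))

# Layout with an extra firewall/ or network/ level: the host moves to the 5th segment.
print(host_from_path("/opt/syslogs/paloalto/firewall/fw01/paloalto.log", 5))
```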

Hemnaath
Motivator

Hi rphillips, thanks for your effort on this. We are actually already using a common sourcetype configured in the monitor stanza in inputs.conf and then setting the different sourcetypes via props/transforms by matching a regex. Our props.conf sets TZ = UTC for all the Palo Alto devices that reach the syslog servers.

Details:
inputs.conf

[monitor:///opt/syslogs/paloalto/.../paloalto.log*] 
index=firewall 
sourcetype=paloalto:network:log 
host_segment = 4 

props.conf (partial details, not the entire file)

 [paloalto:network:log]
    category = Network & Security
    description = Output produced by the Palo Alto Networks Next-generation Firewall and Traps Endpoint Security Manager
    pulldown_type = true
    # This first line adjusts PAN-OS 6.1.0 threat logs to revised 6.1.1+ format where the reportid field is at the end.
    SEDCMD-6_1_0 = s/^((?:[^,]+,){3}THREAT,(?:[^,]*,){27}".*",[^,]*,)(\d+),((?:[^,]*,){3})(\d+,0x\d+,(?:[^,]*,){14})$/\1\3\4,\2/
    SHOULD_LINEMERGE = false
    MAX_TIMESTAMP_LOOKAHEAD = 44
    TRANSFORMS-sourcetype = pan_threat, pan_traffic, pan_system, pan_endpoint

We also have other props.conf settings for each sourcetype in the same props.conf file (Field Aliases, Report-search, etc.), which I have not posted in this comment.

transforms.conf details: sourcetype routing

    [pan_threat]
    DEST_KEY = MetaData:Sourcetype
    REGEX = ^[^,]+,[^,]+,[^,]+,THREAT,
    FORMAT = sourcetype::paloalto:network:threat

    [pan_traffic]
    DEST_KEY = MetaData:Sourcetype
    REGEX = ^[^,]+,[^,]+,[^,]+,TRAFFIC,
    FORMAT = sourcetype::paloalto:network:traffic

    [pan_system]
    DEST_KEY = MetaData:Sourcetype
    REGEX = ^[^,]+,[^,]+,[^,]+,SYSTEM,
    FORMAT = sourcetype::paloalto:network:system

Apart from the above transforms, we also have field extractions, endpoint extractions and lookups configured in transforms.conf, which I have not posted here.

I can now see that data from the newly configured Palo Alto device is arriving with sourcetype=paloalto:network:system, but the data is intermittent.

index=firewall host="test01pano.xxxx.com" source="/opt/syslogs/paloalto/test01pano.xxx.com/paloalto.log" sourcetype="paloalto:network:system"
Time frame: last 24 hours

12/12/17
11:04:11.000 PM
Dec 13 04:04:11 test01pano.xxxx.com 1,2017/12/13 04:04:11,000702580900,SYSTEM,general,0,2017/12/13 04:04:11,,general,,0,0,general,informational,"Connection to Update server closed: updates.paloaltonetworks.com, source: 10.x.x.x",557,0x0,0,0,0,0,,test01pano
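
As a quick offline check of the sourcetype routing posted above, the three regexes can be run against the start of this sample event. This is a minimal Python sketch and not what Splunk runs internally: it assumes Python's `re` behaves like Splunk's regex engine for these simple patterns, the function name `final_sourcetype` is hypothetical, and pan_endpoint is left out because its stanza was not posted.

```python
import re

# Regexes and target sourcetypes copied from the transforms.conf stanzas above.
TRANSFORMS = {
    "pan_threat": (r"^[^,]+,[^,]+,[^,]+,THREAT,", "paloalto:network:threat"),
    "pan_traffic": (r"^[^,]+,[^,]+,[^,]+,TRAFFIC,", "paloalto:network:traffic"),
    "pan_system": (r"^[^,]+,[^,]+,[^,]+,SYSTEM,", "paloalto:network:system"),
}


def final_sourcetype(raw, initial="paloalto:network:log"):
    """Return the sourcetype after applying each transform whose regex matches."""
    sourcetype = initial
    for pattern, target in TRANSFORMS.values():
        if re.search(pattern, raw):
            sourcetype = target
    return sourcetype


# Start of the sample event shown above (truncated for brevity).
event = ("Dec 13 04:04:11 test01pano.xxxx.com 1,2017/12/13 04:04:11,"
         "000702580900,SYSTEM,general,0,2017/12/13 04:04:11,,general")

print(final_sourcetype(event))
print(final_sourcetype("a line that matches none of the patterns"))
```

If the heavy forwarder behaves like this sketch, every event that matches one of the patterns leaves the forwarder with a rewritten sourcetype, and only lines matching none of them keep paloalto:network:log. That may be why a search on sourcetype="paloalto:network:log" returns little or nothing.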

We can now see the data, but it is intermittent. Kindly guide me on this.

Thanks in advance.

adonio
Ultra Champion

It looks like the source path is off:
Paloalto firewall are reaching the syslogs servers under the path/opt/syslogs/paloalto/device3pano.xxx..com/paloalto.log the same server is used as Heavy forwarder server to read the syslog data directly from the path /opt/syslogs/mguard/.../paloalto.log* in to splunk.

Hemnaath
Motivator

Hi adonio, thanks for your effort. It was a typing error that I made while posting the question; I have now corrected the question.

Data from the Palo Alto firewall is reaching the syslog servers under the path /opt/syslogs/paloalto/device3pano.xxx..com/paloalto.log. The same server is used as a heavy forwarder, which reads the syslog data directly from the path /opt/syslogs/paloalto/.../paloalto.log* into Splunk.

inputs.conf details:

[monitor:///opt/syslogs/paloalto/.../paloalto.log*]
index=firewall
sourcetype=paloalto:network:log
host_segment = 4

Kindly guide me on how to troubleshoot this issue.

rphillips_splk
Splunk Employee

@hemnaath it sounds like you have two input monitor stanzas with the same monitor path but two different sourcetypes. Is that accurate?

[monitor:///opt/syslogs/paloalto/.../paloalto.log*]
index=firewall
sourcetype=paloalto:network:traffic
host_segment = 4

[monitor:///opt/syslogs/paloalto/.../paloalto.log*]
index=firewall
sourcetype=paloalto:network:log
host_segment = 4
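
One way to check for this situation offline is to scan a copy of inputs.conf for stanza headers that occur more than once. The following is a minimal Python sketch, not a Splunk tool; the function name `duplicate_stanzas` is hypothetical, and the sketch only flags duplicates without saying which of the conflicting sourcetype values Splunk would end up using.

```python
import re
from collections import Counter


def duplicate_stanzas(conf_text):
    """Return stanza names that appear more than once in a .conf-style text."""
    names = re.findall(r"^[ \t]*\[(.+)\][ \t]*$", conf_text, flags=re.MULTILINE)
    return [name for name, count in Counter(names).items() if count > 1]


# The two stanzas from the question above, pasted as one file for the check.
inputs_conf = """
[monitor:///opt/syslogs/paloalto/.../paloalto.log*]
index=firewall
sourcetype=paloalto:network:traffic
host_segment = 4

[monitor:///opt/syslogs/paloalto/.../paloalto.log*]
index=firewall
sourcetype=paloalto:network:log
host_segment = 4
"""

print(duplicate_stanzas(inputs_conf))
```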
