Getting Data In

Unable to pull backlog data through TA-ms-loganalytics.

jaihingorani
Path Finder

I have set up TA-ms-loganalytics on my Splunk Enterprise instance and configured the inputs. I set start_date to 08/04/2020 00:00:00 in the inputs configuration. Current data (13/07/2020) is coming in fine, but the event count is very low or zero for dates in past months. I have validated that the events are present in Azure for those dates. Below is my inputs.conf:

[log_analytics://SourceLogs1_Backlog]
application_id = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
application_key = **************************
event_delay_lag_time = 15
index = myindex
sourcetype = mysourcetype
interval = 300
log_analytics_query = AuditLogs | where ResourceGroup != ""
resource_group = AAAA-BBB-CC
start_date = 08/04/2020 00:00:00
subscription_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
tenant_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
workspace_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
disabled = 0

[log_analytics://SourceLogs2_Backlog]
application_id = XXXXXXXXXXXXXXXXXXXXXXXXXXX
application_key = ***************************************
event_delay_lag_time = 15
index = myindex
sourcetype = mysourcetype
interval = 300
log_analytics_query = AzureDiagnostics | where ResourceGroup != ""
resource_group = AAAA-BBB-CC
start_date = 08/04/2020 00:00:00
subscription_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
tenant_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
workspace_id = XXXXXXX-XXXXXX-XXXXX-XXXX-XXXXX
disabled = 0

1 Solution

jhantuSplunk1
Path Finder

Hi @jkat54,

I used the following and was able to pull the data in JSON format with a direct API call:

 

https://api.loganalytics.io/v1/workspaces/{{workspace_id}}/query?query=AzureDiagnostics | where ResourceGroup != "" | where TimeGenerated between(datetime("2020-04-08 00:00:00") .. datetime("2020-04-12 23:59:59"))
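For reference, a request like the one above can be assembled in a few lines of Python. This is a minimal sketch, not part of the add-on: the function names `build_query` and `build_url` are hypothetical, `WORKSPACE_ID` is a placeholder, and the time window is written with unquoted KQL `datetime(...)` literals. Actually sending the request also needs an Azure AD bearer token, which is left out here.

```python
from urllib.parse import quote

API_BASE = "https://api.loganalytics.io/v1/workspaces"


def build_query(table, start, end):
    """Return a KQL query limited to the window [start, end] on TimeGenerated."""
    return (
        f'{table} | where ResourceGroup != "" '
        f"| where TimeGenerated between (datetime({start}) .. datetime({end}))"
    )


def build_url(workspace_id, kql):
    """Return the GET URL for the query endpoint with the KQL percent-encoded."""
    return f"{API_BASE}/{workspace_id}/query?query={quote(kql, safe='')}"


kql = build_query("AzureDiagnostics", "2020-04-08 00:00:00", "2020-04-12 23:59:59")
url = build_url("WORKSPACE_ID", kql)  # placeholder workspace ID (assumption)
print(url)
```

Percent-encoding the query avoids problems with the spaces, pipes and quotes that KQL contains when the URL is pasted into a client.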


jaihingorani
Path Finder

@jkat54, I would appreciate it if you could help here.


jkat54
SplunkTrust

You need to increase the lag time, because Azure does not guarantee the timeliness of data delivery to the underlying storage that holds the OMS / Log Analytics data.

The default lag is supposed to be 15 minutes. It looks like you're configured for 15 seconds.

Make your lag time a function of your interval. If you collect data every 5 minutes (the default), your lag should be at least twice the interval, but in most cases people set the lag to 900-1800.


jaihingorani
Path Finder

Hi

I changed it to 900 seconds (15 minutes), and the data flow has now stopped. I am not receiving any events, and I am not seeing any errors either.


jaihingorani
Path Finder

OK, I can see the logs being indexed now. I will monitor for some time to see whether it starts capturing logs from past months.

jhantuSplunk1
Path Finder

Hi @jkat54,

Here is my scenario: my OMS/Log Analytics workspace holds data for the past 150 days. With the lag set to 900 seconds (15 minutes) and the interval set to 300 seconds (5 minutes), data is being pulled, but with a huge delay. It is also pulling only the past month of data, not the past 5 months.

So I changed the lag to 360 seconds (6 minutes) and the interval to 180 seconds (3 minutes). The pull is a little faster, but it still covers only the past 30 days and is still in progress. We have data available for the past 5 months, and I set the start date to exactly 5 months ago.

I don't know whether it will pull the 5 months of data or not. As far as I can see, it is only pulling data from the past month up to the current time, plus new data as it arrives.

Could you please guide me on how to pull the data for the past 5 months?

 

 


jhantuSplunk1
Path Finder

Hi @jkat54,

Could you please advise on my question above? We can see some data (only a little, from the past 30 days), not 100% of the backlog from the past 150 days.

The result is the same with the lag set to 900 seconds and the interval set to 60 or 300 seconds: in every case it pulls only a few days of data, even though the underlying OMS/Log Analytics workspace holds backlog data for the past 150 days. Why is it not pulling 100% of the backlog when it pulls new data fine?

My requirement is to recover all of the backlog data from the past 150 days.

I would appreciate your help. Is there any code that needs to be updated?

Please see the attached graphs: the first is yesterday's count and the second is today's count. The backlog count does not seem to increase even though the workspace has data for the past 150 days.

 

 

 

 


jkat54
SplunkTrust

Once the input has run one time, it is updated with new dates and never reaches back in time to collect older events.

The solution should be to create a new input with the same config but with the start date set to 0. You can't just change an existing input to start date 0, because the code only uses that setting on the very first run.

Another way to "reset" an existing input to "first run" is to wipe out the KV store data associated with the input. It just seems easier to create a new input.
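The behavior described above can be pictured with a small sketch. This is not the add-on's actual code: `window_start` and the `checkpoints` dictionary are hypothetical stand-ins for the state the add-on keeps in the KV store, and the date format is assumed to be day/month/year, matching the start_date values posted in this thread.

```python
import datetime

DATE_FORMAT = "%d/%m/%Y %H:%M:%S"  # assumed format of start_date


def window_start(checkpoints, input_name, start_date):
    """Use the saved checkpoint if one exists, else fall back to start_date."""
    saved = checkpoints.get(input_name)
    if saved is not None:
        return saved  # start_date is ignored from the second run onward
    return datetime.datetime.strptime(start_date, DATE_FORMAT)


checkpoints = {}  # stand-in for the per-input checkpoint data (assumption)

# First run: no checkpoint yet, so start_date decides where collection begins.
first = window_start(checkpoints, "SourceLogs1_Backlog", "08/04/2020 00:00:00")

# After a run, the checkpoint moves forward to the last collected time.
checkpoints["SourceLogs1_Backlog"] = datetime.datetime(2020, 7, 13, 12, 0, 0)

# Editing start_date on the same input now has no effect.
later = window_start(checkpoints, "SourceLogs1_Backlog", "01/01/2020 00:00:00")
print(first, later)
```

Under this model, an input with a new name has no saved checkpoint, which is why creating a new input is the simpler reset.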

 


jhantuSplunk1
Path Finder

Thanks,

I am now getting the following error: ValueError: time data '0' does not match format '%d/%m/%Y %H:%M:%S'

 

2020-07-14 19:25:29.052 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" ERRORtime data '0' does not match format '%d/%m/%Y %H:%M:%S' HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" ValueError: time data '0' does not match format '%d/%m/%Y %H:%M:%S' HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" (data_string, format)) HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" File "/opt/prd_sec/splunk/lib/python2.7/_strptime.py", line 332, in _strptime HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" start_datetime = datetime.datetime.strptime(str(input_item['start_date']),'%d/%m/%Y %H:%M:%S') HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" File "/opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/input_module_log_analytics.py", line 39, in collect_events HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" input_module.collect_events(self, ew) HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" File "/opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py", line 96, in collect_events HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" self.collect_events(ew) HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" File "/opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/ta_ms_loganalytics/modinput_wrapper/base_modinput.py", line 127, in stream_events HFR01
2020-07-14 19:25:29.021 message from "python /opt/prd_sec/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py" Traceback (most recent call last):

 


jaihingorani
Path Finder

Hi @jkat54, we made the change by creating new inputs and did not change any parameter in the existing ones. Ideally it should work with the new inputs having start_date = <4 month older date>, but it does not. Also, start_date = 0 throws the error shown in @jhantuSplunk1's comment above.


jhantuSplunk1
Path Finder

Hi @jkat54,

To clarify my requirement: I want to pull my backlog data from April 9th 2020 to April 14th 2020, but the add-on is not able to get the backlog data.

Could you please help me pull these 7 days of old data? We can see that it pulls some older data once, on the very first run of the app, but after that it pulls only current data, not older data.

 

 


jhantuSplunk1
Path Finder

2020-07-15 09:08:43,477 INFO pid=124798 tid=MainThread file=connectionpool.py:_new_conn:758 | Starting new HTTPS connection (1): 127.0.0.1
2020-07-15 09:08:44,494 INFO pid=124798 tid=MainThread file=setup_util.py:log_info:114 | Log level is not set, use default INFO
2020-07-15 09:08:44,494 INFO pid=124798 tid=MainThread file=splunk_rest_client.py:_request_handler:100 | Use HTTP connection pooling
2020-07-15 09:08:44,495 INFO pid=124798 tid=MainThread file=connectionpool.py:_new_conn:758 | Starting new HTTPS connection (1): 127.0.0.1
2020-07-15 09:08:44,504 ERROR pid=124798 tid=MainThread file=base_modinput.py:log_error:307 | Get error when collecting events.
Traceback (most recent call last):
  File "/opt/splunk_temp/splunk/etc/apps/TA-ms-loganalytics/bin/ta_ms_loganalytics/modinput_wrapper/base_modinput.py", line 127, in stream_events
    self.collect_events(ew)
  File "/opt/splunk_temp/splunk/etc/apps/TA-ms-loganalytics/bin/log_analytics.py", line 96, in collect_events
    input_module.collect_events(self, ew)
  File "/opt/splunk_temp/splunk/etc/apps/TA-ms-loganalytics/bin/input_module_log_analytics.py", line 39, in collect_events
    start_datetime = datetime.datetime.strptime(str(input_item['start_date']),'%d/%m/%Y %H:%M:%S')
  File "/opt/splunk_temp/splunk/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '0' does not match format '%d/%m/%Y %H:%M:%S'
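The error can be reproduced outside Splunk, which shows that it comes from the date parsing itself rather than from Azure. The sketch below reuses the format string shown in the traceback; `parse_start_date` is a hypothetical helper name, not a function from the add-on.

```python
import datetime

DATE_FORMAT = "%d/%m/%Y %H:%M:%S"  # format string copied from the traceback


def parse_start_date(value):
    """Parse a start_date value with the same strptime call as the traceback."""
    return datetime.datetime.strptime(str(value), DATE_FORMAT)


# '0' is not a date in day/month/year form, so strptime rejects it.
try:
    parse_start_date(0)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# A value in the expected form parses as 8 April 2020, not 4 August.
parsed = parse_start_date("08/04/2020 00:00:00")
print(parsed)
```

So with this version of the code, start_date has to be a full date and time in day/month/year order.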


jhantuSplunk1
Path Finder

Hi @jkat54,

Do you see any problem here? When I use start_date = 0, it gives ValueError: time data '0' does not match format '%d/%m/%Y %H:%M:%S'. If I give an actual date, it works for current data (from about the last 15 minutes) but not for the past 4 months of data.

props.conf

[azure:loganalytics:ad*]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = XYZ
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
disabled = false
TZ = Europe/Stockholm
TIME_PREFIX =
category = Structured
KV_MODE =
SHOULD_LINEMERGE = true

 

inputs.conf

[log_analytics://GFDG_HSE_AD_YYYYYYY]
application_id = TYU
application_key = XXXXXXXXXXXXX
event_delay_lag_time = 900
index = jh_test
sourcetype = azure:loganalytics:ad:AuditLogs
interval = 300
log_analytics_query = ContainerLog | where _ResourceId != ""
resource_group = njjj-gff-oug-gh
start_date = 0
subscription_id = MNO
tenant_id = KPL
workspace_id = HFR
disabled = 0

 


jkat54
SplunkTrust

Did you create a NEW input with a different name than before?

 

Sorry that 0 didn't work.


jaihingorani
Path Finder

Yes, we created NEW inputs (with NEW stanza names, different from before).


jaihingorani
Path Finder

The requirement is to get the missing 7 days of data (9th April - 14th April). When I give a start_date even older than my LA retention, it fetches data for some random days and then continues to fetch only current data (it never goes back to pull the old data).


jkat54
SplunkTrust

Have you tried using the API itself to pull the data, to see whether for some reason the API is limiting the time frame?
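One way to run that check is to query one day at a time, so that any limit on the time frame shows up as a day whose count is lower than what the workspace reports. A minimal sketch, assuming the missing range is 9 to 14 April 2020 as stated earlier in the thread; `daily_windows` and `window_filter` are hypothetical helpers, and the printed filters are meant to be appended to your own query.

```python
import datetime

KQL_TIME = "%Y-%m-%d %H:%M:%S"


def daily_windows(first_day, last_day):
    """Return (start, end) datetime pairs, one per calendar day, inclusive."""
    windows = []
    day = first_day
    while day <= last_day:
        start = datetime.datetime.combine(day, datetime.time.min)
        windows.append((start, start + datetime.timedelta(days=1)))
        day += datetime.timedelta(days=1)
    return windows


def window_filter(start, end):
    """Return a KQL filter for the half-open window [start, end)."""
    return (
        f"where TimeGenerated >= datetime({start.strftime(KQL_TIME)}) "
        f"and TimeGenerated < datetime({end.strftime(KQL_TIME)})"
    )


windows = daily_windows(datetime.date(2020, 4, 9), datetime.date(2020, 4, 14))
filters = [window_filter(start, end) for start, end in windows]
for line in filters:
    print(line)
```

Half-open windows mean an event that falls exactly on midnight is counted in one day only.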

 

 


jaihingorani
Path Finder

Yes, we just tried that. The count we get through the API is the same as the count in the LA workspace.


jaihingorani
Path Finder

We were able to fetch this count, 698,744, in a small time frame. Screenshot attached.

jkat54
SplunkTrust

So why not dump the data you pulled from the API to file and ingest?
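A minimal sketch of that approach, assuming the API response has already been parsed into a Python dictionary with the API's `tables` / `columns` / `rows` layout. The helper names and the sample values are made up for illustration, and an in-memory buffer stands in for the output file.

```python
import io
import json


def rows_to_events(response):
    """Yield one dict per row, pairing each value with its column name."""
    for table in response.get("tables", []):
        names = [column["name"] for column in table["columns"]]
        for row in table["rows"]:
            yield dict(zip(names, row))


def write_ndjson(response, handle):
    """Write each row as one JSON object per line; return the row count."""
    count = 0
    for event in rows_to_events(response):
        handle.write(json.dumps(event) + "\n")
        count += 1
    return count


# Hand-made sample in the tables/columns/rows layout (illustrative values).
sample = {
    "tables": [{
        "name": "PrimaryResult",
        "columns": [{"name": "TimeGenerated", "type": "datetime"},
                    {"name": "ResourceGroup", "type": "string"}],
        "rows": [["2020-04-09T00:00:01Z", "AAAA-BBB-CC"],
                 ["2020-04-09T00:00:02Z", "AAAA-BBB-CC"]],
    }]
}

buffer = io.StringIO()  # replace with open("backlog.json", "w") for a real file
written = write_ndjson(sample, buffer)
print(written, "rows written")
```

Writing one JSON object per line keeps each row as a separate event when the file is ingested, for example through a file monitor input.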


jkat54
SplunkTrust

If you can write an API query that works better than my app... you probably should be writing code for my app. You also don't really need me or my app 😉
