All Apps and Add-ons

Splunk App for Web Analytics: Why does my data model have empty fields when I automatically index data?

slr
Communicator

Hello there.

I'm having another issue with the Splunk App for Web Analytics... but I'm not sure where the problem is.

I created a script that downloads some data and puts it in a directory. Splunk then picks this data up in batch mode and indexes it into an index. I have also configured the Web Analytics app and it works fine, but there seems to be a problem between the automatically indexed data and the data model, because the panels go crazy and don't show real data. It's as if they can't get data from the data model, or as if the data were corrupted. With this context, I have done some tests:

-- Everything looks fine in the script logs
-- Everything looks fine in splunkd.log
-- When I search the data model, I don't get any results (some fields seem to be empty)
-- I restarted the Splunk deployment, without any difference
-- When I rebuild the data model, the results are even worse
-- When I use Pivot to inspect the data model, some fields that had data before are now empty (http_session, http_locale, http_session_channel, http_session_duration, http_session_end, http_session_pageviews, http_session_referrer, http_session_referrer_domain, http_session_referrer_hostname, http_session_start)
-- If I delete the index, create a new one and index the same data into it, everything works fine again after the configuration steps.
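For context, a batch input of this kind is configured in inputs.conf roughly like this (the path, index name and sourcetype below are placeholders, not my real values):

[batch:///opt/data/weblogs]
move_policy = sinkhole
index = web
sourcetype = access_combined

With move_policy = sinkhole, Splunk deletes each file once it has been indexed, which is the usual setting for batch drops.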

Does anyone have a clue?

Regards.


jbjerke_splunk
Splunk Employee

Hi slr

The scheduled search should pick up the new data even if it is from yesterday, but the data model acceleration will already have run and it will not backfill. As the app was never designed for batch file delivery, I have not tested this scenario before. In this case I believe you also need to modify the schedule for the DM acceleration. This is done by modifying the file datamodels.conf, which can be found here:

SPLUNK_HOME/etc/apps/SplunkAppForWebAnalytics/local/datamodels.conf

[Web]
acceleration = 1
# acceleration.cron_schedule = 2,12,22,32,42,52 * * * *
acceleration.cron_schedule = 0 21 * * *

I commented out the original schedule (every 10 minutes) so the acceleration now runs once a day at 9pm. Change the schedule to about one hour after your batch file is dropped in.

Can you try this?

j


slr
Communicator

Hi jbjerke

Thanks for your efforts.

I added the line in SPLUNK_HOME/etc/apps/SplunkAppForWebAnalytics/local/datamodels.conf (the line acceleration.cron_schedule = 2,12,22,32,42,52 is in default/datamodels.conf and I haven't commented it out) and I scheduled the acceleration one hour after the batch (batch at 3 a.m., data model at 4 a.m.)... but as always, Splunk indexed the data and the session fields from the data model are empty.
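To be concrete about the timing, the cron entry for the download script looks roughly like this (the script path is a placeholder):

# fetch yesterday's logs at 03:00 every day
0 3 * * * /path/to/download_script.sh

and in local/datamodels.conf:

acceleration.cron_schedule = 0 4 * * *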

I will try with less time between the two (10 minutes, for example).


jbjerke_splunk
Splunk Employee

Can you check inside the session lookup file that you have session keys in there for the time period you expect?

| inputlookup WA_sessions
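To narrow it to the time period in question, something like this should work (this assumes the lookup stores the session start as an epoch field named http_session_start, like the data model field; the exact field name in the lookup is a guess):

| inputlookup WA_sessions
| eval start_readable=strftime(http_session_start, "%Y-%m-%d %H:%M:%S")
| sort - http_session_start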

j


slr
Communicator

Ok, I tried with 10 minutes and my panels didn't show the new data, but when I launch the search you suggested, Splunk shows me sessions for the time period I expect (in this case, from yesterday).


jbjerke_splunk
Splunk Employee

Ok, that's good. The session generation is working.

Can you troubleshoot the data model acceleration? Have a look at the Data Model Audit page and look for error messages.
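A quick way to see whether the accelerated data model actually contains events for the period is a tstats count bucketed by time, run over the last 24 hours (the data model name "Web" is the one from datamodels.conf; the span is just an example):

| tstats count from datamodel=Web by _time span=1h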


slr
Communicator

I checked the Data Model Audit page and I haven't seen any errors. The "last error" field in the Acceleration Details panel isn't showing anything.


jbjerke_splunk
Splunk Employee

Hi slr

Sorry for the delayed response. I think the problem lies in the batch-mode nature of your data ingestion.

The Splunk App for Web Analytics generates sessions for the web traffic through a scheduled search that looks at the last 20 minutes' worth of data. This scheduled search then outputs into a temporary lookup which the data model uses. My theory is that you get new data into Splunk through the batch process, but this data is then excluded from the scheduled search because of a timing issue. On your other server the batch might have a different schedule, so it works.

When you rebuild the data model, it will only use the sessions that can currently be found in the scheduled search's session lookup. To rebuild the data model you should disable acceleration, re-run the session lookup (this can be found in the app menu), wait until that is finished and then re-enable acceleration. Can you try this?
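To confirm whether the acceleration summary is missing events that the index has, you can compare the two counts over the same time range (the index name web is a placeholder for whatever index the app searches):

| tstats count from datamodel=Web

index=web | stats count

If the first number is much lower than the second, the summary is missing data.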

My proposal to fix this is to speed up the batch delivery of logs to as near real time as possible.

j


slr
Communicator

Hi @jbjerke_splunk and thank you for your answer.

After all these days of testing, I realised that my problem is similar to the one explained by @kjhanson in his question:

https://answers.splunk.com/answers/389610/splunk-app-for-web-analytics-v17-lookups-not-popul.html

It seems that, sometimes, the scheduled "Generate user sessions" doesn't get the data that I expected. After your answer, I suppose that when you write... :

The Splunk App for Web Analytics generates sessions for the web traffic through a scheduled search that looks into the last 20 minutes worth of data.

... you mean the last 20 minutes of indexed data, and when you write... :

My theory is that you get new data into Splunk through the batch process but this data is then excluded from the scheduled search because of a timing issue

... you mean that a "conflict" is possible between the data indexing time and the scheduled lookup time? Like both of them starting at the same time?

About rebuilding the data model, I suppose you mean the last step: expand the data model "Web" by clicking on the arrow on the left-hand side and click "Rebuild". Yes, I did this many times, mostly when I got an error and had to redo the whole process, but sometimes (for test purposes) when I got the error I rebuilt the data model, and in those cases the result was even worse.

I did another test: I configured Splunk to monitor only one file. Up to this point, OK. Then I injected events from another day (from another file: cat file1 >> filemonitoredbysplunk) and "the problem" was replicated (the events appeared in the index, but the fields in the data model were empty).

I haven't tried near real time yet but... do you believe it could solve my problem?


slr
Communicator

More info:

I stopped the scheduled "Generate user sessions", indexed new data and ran the scheduled "Generate user sessions" manually. "The problem" was replicated.


jbjerke_splunk
Splunk Employee

The search "Generate user sessions - scheduled" is not the same as the search "Generate user sessions". The big non-scheduled search needs to complete first. Then start data model acceleration. If you have disabled the "Generate user sessions - scheduled", re-enable it after the "Generate user sessions" has completed.

j


slr
Communicator

Sorry, I misstated the context of my last comment. The context was:

  1. I indexed 10 days of logs (10 files) without errors (index, data model and panels show the correct info). At this point the Splunk App for Web Analytics works normally.
  2. I stopped "Generate user sessions - scheduled".
  3. I added one more day (one file) to the monitored directory.
  4. When Splunk had indexed the new data, I manually ran "Generate user sessions - scheduled".
  5. I waited until the events in the data model increased (I can see this in the step 4 "Data Model Acceleration" check).
  6. I checked some panels, but the new data is missing (the panels don't show it).
  7. I scheduled "Generate user sessions - scheduled" again and waited 10 minutes.
  8. Nothing changed.

jbjerke_splunk
Splunk Employee

In step 3 - is the data you added within the last 24 hours? The scheduled search looks at the last 20 minutes of indexed data, within a 24-hour total window regardless.
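Since the scheduled search keys off indexed time, the lag between event time and index time is worth checking directly (the index name web is a placeholder):

index=web earliest=-24h
| eval index_lag_seconds=_indextime-_time
| stats min(index_lag_seconds) max(index_lag_seconds)

A large lag here would explain batch data falling outside the 20-minute indexed-data window.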


slr
Communicator

Yes, in this case I added the data from yesterday 4 hours ago. In this context, the app should show me some data, shouldn't it?


slr
Communicator

In this last context, if I put yesterday's file in the directory, the app handles it fine, but when I did the last test I described, it doesn't.


slr
Communicator

I scheduled the download of yesterday's data for today at 3 a.m. (local). I just checked Splunk and I have the same empty fields. If the app works like you said ("The scheduled search is looking for the last 20min of indexed data and the last 24hours in total window regardless"), the app should show me some data, shouldn't it?


slr
Communicator

Ok, more info:

I have compared two Splunk instances; both have the same data, but one has "the problem" and the other one doesn't. When I check both indexes, they are the same. No problems at all. But when I check the data model, I find the same empty fields that I wrote about in the question above (only on the Splunk instance with "the problem"). It seems that "the problem" happens between the index and the data model... but why?


slr
Communicator

There seems to be some issue between the automated script (cron) and the indexing step... but not always. When this happens, I delete the index, create a new one, put the same logs downloaded by the script into the directory monitored by Splunk, and after the configuration steps everything works fine (exactly the same files).


slr
Communicator

More info
-- I tried a fresh deployment on a virtual instance, with the same results.
-- Until yesterday, when I copied the logs by typing the commands (without cron), I never had errors. But today, when I copied four files at once into the monitored directory, Splunk correctly indexed only the last one (in this case, the previous 10 days in the index weren't affected).
-- I did some tests with only two days (two files), and when cron triggers the script, I always have problems with the first file indexed.

@jbjerke_splunk , Could you give me some advice or clue, please?
