Solved: What is the correct strategy to index data from a ticketing system?

sylbaea · ‎10-25-2016

Hello,

I need to use Splunk to provide insight about data coming from our internal ticketing tool.

Each event will typically have:
- unique ticket id
- ticket type
- current status
- open date
- close date
- etc.

Once the data is in Splunk, my users will typically want answers to the following questions:
- Real-time status (how many tickets are currently open, etc.)
- Trend analysis (backlog over time, etc.)
- Historical analysis (how many tickets were in a specific category at a specific point in time)
- etc.

My first question is about the correct indexing strategy to cover all these use cases. Should I:
1. Index each ticket each time I detect a change
2. Perform a full snapshot once a day (assuming I am fine with daily granularity)
3. Perform a full snapshot once a day and also index each ticket each time I detect a change
- etc.

The first strategy would be the most efficient from a storage perspective, but it would mean I need to search "all time" just to answer a simple question like "how many tickets have been opened so far?"
Furthermore, I may lose data depending on the index retention policy.

The second and third strategies do not look very efficient from an indexing/storage perspective, but I do not see any valid alternative.

My second question is about the timestamp. Most searches will likely use the ticket open date as the time parameter, but I guess the best choice for the timestamp is the "last update" field? I would then override the _time field at search time when we want to search by open date (or close date, etc.).
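
A minimal sketch of the search-time override described above, assuming events are indexed with the last update as _time and carry an open_date field formatted like 2016-10-25 14:30:00. The index name tickets, the field names ticket_id and open_date, and the time format are assumptions for illustration, not something given in this thread.

```spl
index=tickets
| stats latest(open_date) as open_date by ticket_id
| eval _time=strptime(open_date, "%Y-%m-%d %H:%M:%S")
| timechart span=1d count as tickets_opened
```

The stats step keeps one row per ticket so that a ticket with several updates is counted once. Note that the time range picker still filters on the indexed _time (the last update in this setup), so the search window has to be wide enough to include every ticket of interest.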

Regards.

koshyk · ‎10-25-2016

The key thing to remember is that "indexing" means the data is written permanently, so there is no "UPDATE" afterwards. Taking this into account, the best option is:

1. Index all entries from your ticketing system, including the same ticket at its various workflow stages. That means pulling data constantly (say, every 5 minutes), so that every update to a ticket is captured as a new event.
2. Send all of this data to a separate index, and if there are multiple types of data, assign them to separate sourcetypes.
3. Once the data is in Splunk, the logic is to deduplicate it by unique_ticket_id plus latest timestamp (or other fields within the ticket). E.g. to show the latest state of an incident, take the most recent event per incident with "stats last(_time)".
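
The steps above could look roughly like this for a "how many tickets are in each status right now" panel. This is a sketch only: the index tickets, the sourcetype ticket, and the fields ticket_id and status are placeholder names, not taken from any real add-on.

```spl
index=tickets sourcetype=ticket
| stats latest(status) as status by ticket_id
| stats count as tickets by status
```

The sketch uses latest() rather than last() on purpose. latest() picks the value from the event with the most recent _time, whereas last() returns the last value seen in the order events reach stats, which for a default search is usually the oldest event.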

For professional workflow systems like "ServiceNow", there are ready-made add-ons which will do (1) and (2) for you. These add-ons apply extra logic while pulling the data (e.g. they skip change management requests that are still in Draft status). It is all about the data: once it is all in, you can do any kind of analysis or trending.

In my experience, ticketing systems do not produce that much data. With around 3000 tickets a day, collecting all states takes approximately 100 MB of Splunk license.

sudip_bhattacha · ‎02-24-2017

Hi Giuseppe,

I am also stuck on your last question. Did you find a way to get the latest changes in a ticket?

Sudip

koshyk · ‎02-25-2017

When you index constantly, take:

stats latest(your_cmdb_time_changing_column) by ticketnumber

That is, take the latest entry per ticket, and that gives you the latest changes in the ticket. CMDBs are normally extracted in whole, so the latest entry contains all the deltas up to now.
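
To pull back the whole latest record rather than a single column, one option is a wildcard latest(). A sketch, assuming a placeholder index named tickets and the ticketnumber field from the search above:

```spl
index=tickets
| stats latest(*) as * by ticketnumber
```

latest(*) is evaluated per field, so if some events lack a field, its value can come from an older event than the rest of the row. With whole-record extracts, where every event carries every field, each row should reflect a single latest event.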

Rogelio · ‎06-23-2020

Hello.

I basically have the same question, but I do not think I understand the best practice.

My team is thinking of re-ingesting ALL ticket data from the ticketing system once every day.

We are also thinking of using the “create_date” item as the timestamp.
I feel this is not the best approach, but I will need more than a feeling to change my team’s mind.

Can you tell me whether ingesting all data every night, with create_date as the timestamp, is a good option? My other option would be ingesting only the updated events (tickets) every night and using update_time as the timestamp. If there is not much difference, I would also like to hear that.

gcusello · ‎10-25-2016

Hi sylbaea,
I think you should choose the first option, so that you have a near-real-time view. Obviously, you have to choose a retention time aligned with the needs of your historical searches.
On that point, you could run some statistical searches and index the results, to keep history even after the retention period (see accelerated reports).
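
One way to read this suggestion is a scheduled daily search that writes a small summary into a separate, longer-lived index. A sketch, assuming a source index tickets, an already existing summary index tickets_summary, the fields ticket_id, status, and ticket_type, and a status value of closed; all of these names are placeholders.

```spl
index=tickets
| stats latest(status) as status latest(ticket_type) as ticket_type by ticket_id
| where status!="closed"
| stats count as open_tickets by ticket_type
| eval _time=now()
| collect index=tickets_summary
```

The eval is intended to stamp each summary row with the run time, so that a later search over tickets_summary can chart open_tickets by day without rescanning the raw ticket events. It is worth checking the timestamps on the collected events before relying on them.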
As for the timestamp, I'd use the opening date, which is the guiding one. The last update date is additional information related to the status, but the most important question is "how many tickets are/were open, managed, or solved, now or in a given time period?".
After that, you can answer the other questions (trends, historical series, the users/systems/applications most involved, etc.).
Bye.
Giuseppe

sylbaea · ‎10-25-2016

Thanks Giuseppe,

If I use the opening date as the timestamp, what about an updated ticket (for instance, when the ticket is closed)? In this simplified case, I will have two events for the same ticket with the same timestamp. How do you identify the most recent one? By sorting on last update (which would be an additional time field)?
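
If _time is the open date, every version of a ticket shares the same _time, so time-based functions such as latest() can no longer tell the versions apart. A sketch of sorting on the additional field instead, assuming a last_update field formatted like 2016-10-25 14:30:00 and placeholder names tickets, ticket_id, status, and open_date:

```spl
index=tickets
| eval updated_epoch=strptime(last_update, "%Y-%m-%d %H:%M:%S")
| sort 0 - updated_epoch
| dedup ticket_id
| table ticket_id status open_date last_update
```

dedup keeps the first event it sees for each ticket_id, which after the descending sort is the most recently updated version.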

sudip_bhattacha · ‎02-24-2017

Hi Giuseppe,

Have you got a workaround for getting the latest metrics for an incident?

sylbaea · ‎10-25-2016

Hello koshyk,

Thanks a lot for your feedback. So basically you suggest what I described as my first option.
What about the case where we want stats about, let's say, all tickets for the past five years? I agree with your suggestion based on "stats last(_time)", but in this case it will take a while to scan all the data to reach tickets that have not been modified for a long time.
If, in parallel, I have a lot of activity on other tickets, the end-user experience may be very poor.
Of course, I plan to implement an accelerated data model and other techniques to mitigate this kind of issue, but I am still wondering whether this is the best approach.

Regards.

koshyk · ‎10-26-2016

Do a one-time data onboarding into another index for the past 6 months or so.

sylbaea · ‎10-26-2016

Makes sense, thanks for the suggestion.
