Splunk Search

Field extraction

maxitroncoso
Engager

I'm trying to extract fields from this event using regular expressions,

Multiple times I receive the following error.. I tried to avoid it by creating fields out of every part of the event

"The extraction failed. If you are extracting multiple fields, try removing one or more fields. Start with extractions that are embedded within longer text strings."

The fields I'd like to extract are:
FIRST ITEM (and every other item that goes after it)
FIRST ITEM AMOUNT ( The number that goes before first item)
GRAND TOTAL
LASTNAME
FIRSTNAME
The 8 digit number that goes before LASTNAME

  "receipt-text":"  COMPANY NAME INTERNATIONAL\n      Branch name goes here\n            App name\n\n         1/16/2020 22:30  \nServer:123  App Name          \nGuests:0      \n_________________________________\n          appName User\n      1 First Item     5.00\n      2 Second Item                 7.00\n        @ 3.50\n\n       \n      Gross Sales     12.00\n         Discount      0.00\n         Subtotal     12.00\n              Tax      0.00\n   Service Charge      2.16   \n  Delivery Charge      1.95  \n   Additional Tip      0.00\n            Total     16.11\n\n\nUser                       16.11\n  12345678 LASTNAME, FIRSTNAME\n\n       GRAND TOTAL         16.11\n\n______________________________________\nT1234 C300      1/16/2020 22:30\n______________________________________      \n\n \n\nAdditional Tip    $___________________\n\n\n\nTotal             $___________________\n\n\n\n\n______________________________________\nSignature\n         ***************\n   A 18% service charge is included\n  in your bill, however, please use\n   the Additional Tip line to\n  recognize outstanding service.\n         ***************\nCheck No.: 12345678\n@##12345678@"

Any help you can deliver to point me into the right direction, I'd be really happy.
I hope I'm soon able to help other people aswell
If you've come this far, thanks a lot for reading my issue!

0 Karma
1 Solution

to4kawa
Ultra Champion

UPDATED: add time and check_no

your_search
| eval raw=replace(_raw,"\\\n","
 ") 
| makemv delim="
 " raw 
| streamstats count as session 
| mvexpand raw 
| rename raw as _raw
| rex "^\s+(?<time>\d+.\d+.\d{4}\s\d+\:\d+)"
| rex "(?:\d+ )(?<items>[\w ]+)\s+(?<amounts>\d+\.\d{2})"
| rex "GRAND TOTAL\s+(?<ground_total>\d+\.\d{2})" 
| rex "\s+(?<user_id>\d{8})\s+(?<last_name>\w+), (?<first_name>\w+)" 
| rex "Check No\.:\s+(?<check_no>\d{8})" 
| stats list(*) as * by session 
| eval time=strptime(time,"%m/%d/%Y %H:%M")
| fieldformat time=strftime(time,"%m/%d/%Y %H:%M")
| table session time user_id first_name last_name items amounts ground_total check_no

Previous my answer:

| makeresults
| eval _raw="   \"receipt-text\":\"  COMPANY NAME INTERNATIONAL\n      Branch name goes here\n            App name\n\n         1/16/2020 22:30  \nServer:123  App Name          \nGuests:0      \n_________________________________\n          appName User\n      1 First Item     5.00\n      2 Second Item                 7.00\n        @ 3.50\n\n       \n      Gross Sales     12.00\n         Discount      0.00\n         Subtotal     12.00\n              Tax      0.00\n   Service Charge      2.16   \n  Delivery Charge      1.95  \n   Additional Tip      0.00\n            Total     16.11\n\n\nUser                       16.11\n  12345678 LASTNAME, FIRSTNAME\n\n       GRAND TOTAL         16.11\n\n______________________________________\nT1234 C300      1/16/2020 22:30\n______________________________________      \n\n \n\nAdditional Tip    $___________________\n\n\n\nTotal             $___________________\n\n\n\n\n______________________________________\nSignature\n         ***************\n   A 18% service charge is included\n  in your bill, however, please use\n   the Additional Tip line to\n  recognize outstanding service.\n         ***************\nCheck No.: 12345678\n@##12345678@"
| eval raw=replace(_raw,"\\\n","
")
| makemv delim="
" raw
| streamstats count as session
| mvexpand raw
| rename raw as _raw
| rex "(?:\d+ )(?<items>[\w ]+)\s+(?<amounts>\d+\.\d{2})"
| rex "GRAND TOTAL\s+(?<ground_total>\d+\.\d{2})"
| rex "\s+(?<user_id>\d{8})\s+(?<last_name>\w+), (?<first_name>\w+)"
| stats list(*) as * by session
| table session user_id first_name last_name items amounts ground_total

Hi, @maxitroncoso
items and amounts are one-on-one.
As I don't know how you display these, I create like above.

View solution in original post

0 Karma

to4kawa
Ultra Champion

UPDATED: add time and check_no

your_search
| eval raw=replace(_raw,"\\\n","
 ") 
| makemv delim="
 " raw 
| streamstats count as session 
| mvexpand raw 
| rename raw as _raw
| rex "^\s+(?<time>\d+.\d+.\d{4}\s\d+\:\d+)"
| rex "(?:\d+ )(?<items>[\w ]+)\s+(?<amounts>\d+\.\d{2})"
| rex "GRAND TOTAL\s+(?<ground_total>\d+\.\d{2})" 
| rex "\s+(?<user_id>\d{8})\s+(?<last_name>\w+), (?<first_name>\w+)" 
| rex "Check No\.:\s+(?<check_no>\d{8})" 
| stats list(*) as * by session 
| eval time=strptime(time,"%m/%d/%Y %H:%M")
| fieldformat time=strftime(time,"%m/%d/%Y %H:%M")
| table session time user_id first_name last_name items amounts ground_total check_no

Previous my answer:

| makeresults
| eval _raw="   \"receipt-text\":\"  COMPANY NAME INTERNATIONAL\n      Branch name goes here\n            App name\n\n         1/16/2020 22:30  \nServer:123  App Name          \nGuests:0      \n_________________________________\n          appName User\n      1 First Item     5.00\n      2 Second Item                 7.00\n        @ 3.50\n\n       \n      Gross Sales     12.00\n         Discount      0.00\n         Subtotal     12.00\n              Tax      0.00\n   Service Charge      2.16   \n  Delivery Charge      1.95  \n   Additional Tip      0.00\n            Total     16.11\n\n\nUser                       16.11\n  12345678 LASTNAME, FIRSTNAME\n\n       GRAND TOTAL         16.11\n\n______________________________________\nT1234 C300      1/16/2020 22:30\n______________________________________      \n\n \n\nAdditional Tip    $___________________\n\n\n\nTotal             $___________________\n\n\n\n\n______________________________________\nSignature\n         ***************\n   A 18% service charge is included\n  in your bill, however, please use\n   the Additional Tip line to\n  recognize outstanding service.\n         ***************\nCheck No.: 12345678\n@##12345678@"
| eval raw=replace(_raw,"\\\n","
")
| makemv delim="
" raw
| streamstats count as session
| mvexpand raw
| rename raw as _raw
| rex "(?:\d+ )(?<items>[\w ]+)\s+(?<amounts>\d+\.\d{2})"
| rex "GRAND TOTAL\s+(?<ground_total>\d+\.\d{2})"
| rex "\s+(?<user_id>\d{8})\s+(?<last_name>\w+), (?<first_name>\w+)"
| stats list(*) as * by session
| table session user_id first_name last_name items amounts ground_total

Hi, @maxitroncoso
items and amounts are one-on-one.
As I don't know how you display these, I create like above.

0 Karma

maxitroncoso
Engager

@to4kawa thanks a lot for your swift answer, I took what you wrote and it worked as intended!

I tried the following expression in order to add a date and time column to the table, but whenever I use it, instead of one date and time I get a lot per event

| eval indextime= strftime(_indextime,"%Y-%m-%d %H:%M:%S")

I also figured out that I would need to extract the number after Check No.
Tried to do so with the following expression:

| rex "nCheck No.:\s+(?\d{8})"

but i'm getting empty results on that column, any idea of why that may be?

0 Karma

to4kawa
Ultra Champion

As my query runs, \n is remove.

| rex "Check No\.:\s+(?<check_no>\d{8})"

try this.

to4kawa
Ultra Champion

hi, @maxitroncoso
I update my answer, add time and check_no. please confirm.

0 Karma

maxitroncoso
Engager

Checknumber works as intended (I replaced it for order_number) but time is empty

| eval time=strptime(time,"%m/%d/%Y %H:%M")
| fieldformat time=strftime(time,"%m/%d/%Y %H:%M")
| table session time user_id first_name last_name items amounts ground_total check_no

0 Karma

to4kawa
Ultra Champion

| rex "^\s+(?<time>\d+.\d+.\d{4}\s\d+\:\d+)"

This rex may not be working in your actual log.
Since there were actually two time displayed, this regular expression was used to exclude one.

| rex "(?<time>\d+.\d+.\d{4}\s\d+\:\d+)"
and
| eval time=strptime(mvdedup(time),"%m/%d/%Y %H:%M")
This might be better.

Run it line by line and check rex result, if time is extracted.

0 Karma

maxitroncoso
Engager

| eval time=strptime(mvdedup(time),"%m/%d/%Y %H:%M")
| rex "(?\d+.\d+.\d{4}\s\d+:\d+)"

in this order it worked as expected, after the other rex expressions and before the stats, so it ended up like this

your_search
| eval raw=replace(_raw,"\\n","
")
| makemv delim="
" raw
| streamstats count as session
| mvexpand raw
| rename raw as _raw
| rex "^\s+(?\d+.\d+.\d{4}\s\d+:\d+)"
| rex "(?:\d+ )(?[\w ]+)\s+(?\d+.\d{2})"
| rex "GRAND TOTAL\s+(?\d+.\d{2})"
| rex "\s+(?\d{8})\s+(?\w+), (?\w+)"
| rex "Check No.:\s+(?\d{8})"
| eval time=strptime(mvdedup(time),"%m/%d/%Y %H:%M")
| rex "(?\d+.\d+.\d{4}\s\d+:\d+)"
| stats list(*) as * by session
| table session time user_id first_name last_name items amounts ground_total check_no

you've been amazingly helpful, thanks a lot!

0 Karma

to4kawa
Ultra Champion

you're welcome and happy splunking.

0 Karma

efavreau
Motivator

If the field extraction in the GUi didn't work for you, did you try six different regular expressions using the rex command? If so, what exactly were your expressions?

###

If this reply helps you, an upvote would be appreciated.
0 Karma
Get Updates on the Splunk Community!

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

The University of Nevada, Las Vegas (UNLV) is another premier research institution helping to shape the next ...

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...