I wrote this search to look at a user agent string (RTG_Browser) and identify the operating system. I plan on writing another search to identify the browser from the user agent string. I then plan to append these columns to my current query. A regular expression would not work here as it has to be renamed from Windows NT 6.2 to Windows 8.
index=access | eval OS = if(match(RTG_Browser,"Windows NT 6.1"), "Windows 7", if (match(RTG_Browser,"Windows NT 6.3") OR match(RTG_Browser,"Windows NT 6.2"), "Windows 8", if (match(RTG_Browser,"Macintosh"), "OS X", if(match(RTG_Browser,"like Mac OS X") OR match(RTG_Browser,"like mac o sx"),"m.iOS", if(match(RTG_Browser,"Android"), "m.Android", if(match(RTG_Browser,"Windows NT 5.1") OR match(RTG_Browser,"Windows NT 5.2"),"Windows XP", if(match(RTG_Browser, "bingbot"), "Bing Bot", if(match(RTG_Browser,"Windows Phone"),"m.Windows", if(match(RTG_Browser, "Windows NT 6.0"),"Windows Vista", if(match(RTG_Browser,"Windows NT 10.0"),"Windows 10",if(match(RTG_Browser, "X11") OR match(RTG_Browser, "Linux"),"Linux",if(match(RTG_Browser,"Googlebot"),"Google Bot","OTHER"))))))))))))
Instead of nesting your if
statements, use a case
statement like this (but it probably has the same performance even though it is cleaner and easier to read):
index=access| eval OS =case(
(RTG_Browser="Windows NT 6.1"), "Windows 7",
((RTG_Browser="Windows NT 6.3") OR (RTG_Browser="Windows NT 6.2")), "Windows 8",
(match(RTG_Browser,"Macintosh")), "OS X",
(match(RTG_Browser,"like Mac OS X") OR match(RTG_Browser,"like mac o sx")),"m.iOS",
(match(RTG_Browser,"Android")), "m.Android",
(match(RTG_Browser,"Windows NT 5.1") OR match(RTG_Browser,"Windows NT 5.2")),"Windows XP",
(match(RTG_Browser, "bingbot")), "Bing Bot",
(match(RTG_Browser,"Windows Phone")),"m.Windows",
(match(RTG_Browser, "Windows NT 6.0")),"Windows Vista",
(match(RTG_Browser,"Windows NT 10.0")),"Windows 10",
(match(RTG_Browser, "X11") OR match(RTG_Browser, "Linux")),"Linux",
(match(RTG_Browser,"Googlebot")),"Google Bot",
(1==1), "OTHER")
Instead of nesting your if
statements, use a case
statement like this (but it probably has the same performance even though it is cleaner and easier to read):
index=access| eval OS =case(
(RTG_Browser="Windows NT 6.1"), "Windows 7",
((RTG_Browser="Windows NT 6.3") OR (RTG_Browser="Windows NT 6.2")), "Windows 8",
(match(RTG_Browser,"Macintosh")), "OS X",
(match(RTG_Browser,"like Mac OS X") OR match(RTG_Browser,"like mac o sx")),"m.iOS",
(match(RTG_Browser,"Android")), "m.Android",
(match(RTG_Browser,"Windows NT 5.1") OR match(RTG_Browser,"Windows NT 5.2")),"Windows XP",
(match(RTG_Browser, "bingbot")), "Bing Bot",
(match(RTG_Browser,"Windows Phone")),"m.Windows",
(match(RTG_Browser, "Windows NT 6.0")),"Windows Vista",
(match(RTG_Browser,"Windows NT 10.0")),"Windows 10",
(match(RTG_Browser, "X11") OR match(RTG_Browser, "Linux")),"Linux",
(match(RTG_Browser,"Googlebot")),"Google Bot",
(1==1), "OTHER")
Would a case statement be faster then a lookup?
A lookup would probably be faster but it depends so the only sure way to know is to do it both ways and then examine the job inspector
to see which one was faster on various test searches.
This seems ripe for a lookup.
io operation of opening a file should be miliseconds and it happens once per session.
Do you suggest I put this in a CSV file then do a lookup? Would the search performance be much faster with a lookup compared to a nested-if?
yes, I suggest a lookup even though there is a price to pay for opening the file, but once that's done, it will work like a case statement, so the computation should be fast.
Can you explain the performance cost with opening a file? Would it be a one time thing on the first search or would it be every time? What's the performance hit? Are we talking seconds or minutes?