My Python version is 3.8.5 and splunk-sdk is 1.6.16. My Splunk developer gave me a URL, and I took its search string to retrieve data as shown below.
Below is my search string and the additional Python code; the search/earliest/latest parts were added after copying and pasting the search string.
Firstly - what is this abomination?
search sourcetype="builder:payeeservice" host="JWPP*BLDR*P*" "*PayeeAddResponse" "*" "*" "*" "*" "*" "*" "*"
A wildcard at the beginning of a term makes Splunk scan whole events. Not a very good idea. And those repeated "*" terms are pointless.
Secondly, think about whether you can do it with some form of stats. Joins are much less efficient and have limitations.
Thirdly - start your search from the beginning and add subsequent steps one at a time to see where the error is. It's much easier to pinpoint a mistake this way than to debug a whole complicated search.
And lastly - it has nothing to do with Python, since the search itself gives you errors.
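To illustrate the "add subsequent steps" advice, here is a minimal Python sketch that cuts a search into its pipeline stages and returns each growing prefix, so they can be run one after another until the error shows up. The helper name `incremental_searches` and the example search are hypothetical, and the split on "|" is naive: it assumes there is no pipe inside a quoted string or a subsearch.

```python
def incremental_searches(search):
    """Return the search truncated after stage 1, after stage 2, and so on."""
    # Naive split: assumes no "|" inside quoted strings or [subsearches].
    stages = [stage.strip() for stage in search.split("|")]
    return [" | ".join(stages[:n]) for n in range(1, len(stages) + 1)]


# Hypothetical search, used only to show the shape of the output.
EXAMPLE = "search index=main error | stats count by host | sort -count"

steps = incremental_searches(EXAMPLE)
for step in steps:
    print(step)
```

Each printed line can be pasted into the Splunk search bar (or sent through the SDK) in turn; the first one that fails contains the broken stage.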
Good advice. Now I keep only the following simple search statement, with just the "_raw" column, as it contains all my required fields.
I would expect the output dataframe to have columns from the first, "TenantId", to the last, "AccountNumber", with values such as 13744, XX2222.
Come on. You have Splunk, don't just pull the raw data and process it on the receiver's side.
Do a proper search and retrieve the final results.
In your case the events look XML-ish. Maybe you should use spath or xpath to extract the data you want from the events.
And once again - avoid "*something" as a condition.
Rick:
I modified my search string based on your hints. In the one minute at 9:33am today there are 1672 rows. Unfortunately, 23 of them have no PayeeType column, so they have 12 columns while all the others have 13, and that mismatch makes loading the whole result into a Pandas dataframe fail. Below is an example of a _raw value without PayeeType. AccountNumber may have the same issue. Is there a way to have Splunk generate a "null" value so that all rows have 13 columns even when PayeeType and/or AccountNumber is missing from _raw?
Thanks.
"2021-11-18 09:33:06,900 [59] INFO FiservLog.stdlog - <PayeeAddManager><TenantId>FI05</TenantId><UserId>559852410</UserId><SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>WARNING:Error adding Payee:Subscriber status prevents this action from being completed</Message><Timestamp>2021-11-18T14:33:06.899739Z</Timestamp><Exception /><AdditionalInformation><SessionId>463949F06E9F4B93A57570E8B56489A0201T4Q4P019019D467AADD625BC88A04</SessionId><Timestamp>11/18/2021 2:33:06 PM</Timestamp><CorrelationId>1637245986853</CorrelationId><PayeeName>PNC CARD SERVICES</PayeeName><Address>null</Address><AccountNumber>XXXXXXXXXXXX8590</AccountNumber></AdditionalInformation></PayeeAddManager>"
Yup.
You can use fillnull https://docs.splunk.com/Documentation/Splunk/8.2.3/SearchReference/Fillnull
I think it's what you need.
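As a rough sketch of how that could look on the Python side, the snippet below appends a fillnull stage to an existing search string before it is sent to Splunk. The helper `with_fillnull` and the `BASE_SEARCH` value are assumptions made for illustration (the real search from this thread is not shown); only the `fillnull value=... field-list` syntax comes from the linked documentation.

```python
def with_fillnull(search, fields, value="null"):
    """Append a fillnull stage so the listed fields always carry a value."""
    return f"{search.rstrip()} | fillnull value={value} {' '.join(fields)}"


# Hypothetical base search; replace it with the real one.
BASE_SEARCH = (
    'search sourcetype="builder:payeeservice" '
    "| table TenantId PayeeType AccountNumber"
)

SEARCH_STRING = with_fillnull(BASE_SEARCH, ["PayeeType", "AccountNumber"])
print(SEARCH_STRING)
```

Listing the fields explicitly keeps fillnull from touching columns that are allowed to stay empty.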
Yes, I now get a "null" value for PayeeType after adding "|fillnull value=null PayeeType" to my SEARCH_STRING.
Thanks.
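For completeness: if the search cannot be changed, the same padding could be done on the receiving side, although the advice in this thread is to let Splunk do it. The sketch below is only an assumption about how that might look. It parses the XML part of a shortened copy of the sample event with the standard library and fills any missing field with "null". The `FIELDS` list is a partial, illustrative subset of the 13 columns, and `raw_to_row` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the expected columns, not the full list of 13.
FIELDS = ["TenantId", "UserId", "PayeeName", "PayeeType", "AccountNumber"]


def raw_to_row(raw, fields=FIELDS, missing="null"):
    """Turn one _raw value into a dict with the same keys for every event."""
    # Skip the log prefix; the XML starts at the first "<".
    root = ET.fromstring(raw[raw.index("<"):])
    row = {}
    for name in fields:
        node = root.find(f".//{name}")  # first matching descendant, if any
        row[name] = node.text if node is not None and node.text else missing
    return row


# Shortened copy of the sample event above (no PayeeType element).
sample = (
    "2021-11-18 09:33:06,900 [59] INFO FiservLog.stdlog - "
    "<PayeeAddManager><TenantId>FI05</TenantId><UserId>559852410</UserId>"
    "<AdditionalInformation><PayeeName>PNC CARD SERVICES</PayeeName>"
    "<AccountNumber>XXXXXXXXXXXX8590</AccountNumber></AdditionalInformation>"
    "</PayeeAddManager>"
)

row = raw_to_row(sample)
print(row)
```

A list of such dicts can be passed to `pandas.DataFrame`, and every row then has the same set of columns.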