Hi,
I'm onboarding some new data and working on the field extractions.
The data is well-formed JSON describing emails.
I'm having a hard time with the "attachments" field, which I'm trying to make CIM compliant.
This "attachments" field is multivalue (it's a JSON array) and contains:
- The string "attachments" at position 0 (the first position)
- A file name at every odd position (1, 3, 5, etc.)
- The matching file hash at every following even position (2, 4, 6, etc.)
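To make the layout above concrete, here is a minimal sketch in Python (not SPL, and not something that can go in props.conf). The function name `split_attachments` is hypothetical, and the sample values are the same ones used in the search below. It only illustrates the expected result: drop the leading label, then take alternating elements.

```python
def split_attachments(values):
    """Split the flat attachments array into (file_names, file_hashes)."""
    # Position 0 holds the literal string "attachments"; drop it.
    payload = values[1:]
    # After dropping it, names sit at even offsets and hashes at odd offsets.
    file_names = payload[0::2]
    file_hashes = payload[1::2]
    return file_names, file_hashes


attachments = ["attachments", "doc1.pdf", "abc123",
               "doc2.pdf", "def456", "doc3.bla", "ghx789"]
names, hashes = split_attachments(attachments)
print(names)
print(hashes)
```

Any Splunk-side extraction should produce the same two lists as multivalue fields `file_name` and `file_hash`.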
So far, I've done it in SPL, but I can't find a way to do it in props.conf (because in props.conf you can't chain several evals: all EVAL- statements are evaluated in parallel, so one cannot use the result of another) or in transforms.conf.
Here is what I've done in SPL:
| makeresults
| eval attachments = mvappend("attachments", "doc1.pdf", "abc123", "doc2.pdf", "def456", "doc3.bla", "ghx789")
``` To get rid of the string "attachments" ```
| eval attachments = mvindex(attachments, 1, mvcount(attachments)-1)
```To create an index```
| eval index_attachments=mvrange(0,mvcount(attachments),1)
``` To record in file_type whether each value is a file_name or a file_hash ```
| eval modulo = mvmap(index_attachments, 'index_attachments'%2)
| eval file_type = mvmap(modulo, if(modulo=0,"file_name", "file_hash"))
``` To zip all that with a "::::SPLIT::::" ```
| eval file_pair = mvzip('file_type', attachments, "::::SPLIT::::")
``` To then create file_name and file_hash```
| eval file_name = mvmap(file_pair, if(match(file_pair, "file_name::::SPLIT::::.*"), 'file_pair', null() ))
| eval file_hash = mvmap(file_pair, if(match(file_pair, "file_hash::::SPLIT::::.*"), 'file_pair', null() ))
| eval file_name = mvmap(file_name, replace(file_name, "file_name::::SPLIT::::", ""))
| eval file_hash = mvmap(file_hash, replace(file_hash, "file_hash::::SPLIT::::", ""))
| fields - attachments file_pair file_type index_attachments modulo
I'd be very glad to find a solution 🙂
Thanks for your kind help!
Your data is ugly. But almost all email data is ugly.
So my solution will be even uglier (and horribly inefficient).
| makeresults
| eval attachments = mvappend("attachments", "doc1.pdf", "abc123", "doc2.pdf", "def456", "doc3.bla", "ghx789")
| eval file_name=mvmap(split(replace(mvjoin(mvindex(attachments,1,mvcount(attachments)),"|"),"([^|]+)\|([^|]+)\|","\\1|\\2||"),"||"),replace(attachments,"\|.*",""))
| makeresults
| eval attachments = mvappend("attachments", "doc1.pdf", "abc123", "doc2.pdf", "def456", "doc3.bla", "ghx789")
| eval file_hash=mvmap(split(replace(mvjoin(mvindex(attachments,1,mvcount(attachments)),"|"),"([^|]+)\|([^|]+)\|","\\1|\\2||"),"||"),replace(attachments,".*\|",""))
You might want to change the separators | and || to something else if those characters can appear in your file names or hashes.
Thanks a million.