All Apps and Add-ons

Need to remove numeric values from field to find top values

tachu
Explorer

I have millions of values indexed that look like this

,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33097521 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_20709664 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_586_user_by_outfit_32587030 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_592_impression_33141624 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33952008 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_new_33208512 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_new_stars_32270501 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_32635194 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_32635194 A}USER_V2_ID_TO_CAREER_TIER_1417875 A}US_MC_USER_EVENTS_BY_UID_U_802735 A}USER_OUTFIT_LOOK_RATING_STARS_SAVED_KEY3_17481979 A}USEROUTFIT_32305379

There is many more variations. I need to create a field that grabs all the alpha characters and excludes all numbers to be able to see top patters/values IE

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS__impression_

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS__user_by_outfit_

USER_V2_ID_TO_CAREER_TIER_

US_MC_USER_EVENTS_BY_UID_U_
0 Karma

kristian_kolb
Ultra Champion

You can easily do it in inline in a search with rex;

...| rex field=your_fieldname mode=sed "s/\d//g"

If you don't specify a field name, the sed script will run aginst the whole event (the _raw field).

Then you can do your stats/top/chart etc on the field.

http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Rex

/K

kristian_kolb
Ultra Champion

alternatively, if you want to keep some numbers, like in TEST1 or V2, but not those sub-parts that are ONLY numbers, like _2342_ you can just alter the sed script slightly;

...| rex field=your_fieldname mode=sed "s/_\d+/_/g"

0 Karma

sideview
SplunkTrust
SplunkTrust

If these are in a field called myField then you would just tack this on the end of your search:

| eval myField=replace(myField,"_\d+$","")

and that will effectively clip the _33097521 off the end of all the values.

Therefore if you have this:

| eval myField=replace(myField,"_\d+$","") | top 100 myField

you'll get the top 100 values, considering only the part up to the big integers.

You can also use rex instead of eval. The following will do the same thing as the eval syntax above:

| rex field="myField" "(?<myField>.+)_\d+"`
0 Karma

kristian_kolb
Ultra Champion

Oops, answering a little late. Hmm.. Neither the rex nor the replace will handle numbers in the middle of the strings.

0 Karma

lukejadamec
Super Champion

Is that an example of one event, or is each line an event?

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...