Need to remove numeric values from field to find top values

tachu · ‎01-28-2014

I have millions of indexed values that look like this:

,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33097521 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_20709664 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_586_user_by_outfit_32587030 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_592_impression_33141624 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33952008 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_new_33208512 A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_new_stars_32270501 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_32635194 ,A}MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_32635194 A}USER_V2_ID_TO_CAREER_TIER_1417875 A}US_MC_USER_EVENTS_BY_UID_U_802735 A}USER_OUTFIT_LOOK_RATING_STARS_SAVED_KEY3_17481979 A}USEROUTFIT_32305379

There are many more variations. I need to create a field that keeps all the alpha characters and excludes all numbers, so that I can see the top patterns/values, e.g.:

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS__impression_

MCTEST1_SI_EVENTS_TEST1_SI_EVENTS__user_by_outfit_

USER_V2_ID_TO_CAREER_TIER_

US_MC_USER_EVENTS_BY_UID_U_

kristian_kolb · ‎01-28-2014

You can easily do it inline in a search with rex:

...| rex field=your_fieldname mode=sed "s/\d//g"

If you don't specify a field name, the sed script will run against the whole event (the _raw field).

Then you can do your stats/top/chart etc on the field.
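To see what the sed script does to a value, here is a minimal sketch in Python (outside Splunk, purely as an illustration of the same regex). The sample values are taken from the question with the leading `A}` dropped; the third value and the helper name `strip_digits` are made up for the example.

```python
import re
from collections import Counter

# Sample values from the question (leading "A}" dropped).
values = [
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_592_impression_33141624",
    "USER_V2_ID_TO_CAREER_TIER_1417875",
    "USER_V2_ID_TO_CAREER_TIER_999",  # hypothetical value, added to show grouping
]

def strip_digits(value: str) -> str:
    # Same substitution as the sed script s/\d//g: delete every digit.
    return re.sub(r"\d", "", value)

# Rough stand-in for running "top" on the cleaned field.
counts = Counter(strip_digits(v) for v in values)
for pattern, n in counts.most_common():
    print(n, pattern)
```

Note that every digit goes, so TEST1 becomes TEST and V2 becomes V.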

http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Rex

/K

kristian_kolb · ‎01-28-2014

Alternatively, if you want to keep some numbers, like in TEST1 or V2, but not those sub-parts that are ONLY numbers, like _2342_, you can just alter the sed script slightly:

...| rex field=your_fieldname mode=sed "s/_\d+/_/g"
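As a quick check of what this substitution keeps and drops, here is a small Python sketch of the same regex (an illustration outside Splunk; the helper name `strip_numeric_parts` is made up). The sample values come from the question with the leading `A}` dropped.

```python
import re

def strip_numeric_parts(value: str) -> str:
    # Same substitution as the sed script s/_\d+/_/g:
    # an underscore followed by a run of digits collapses to a bare underscore.
    return re.sub(r"_\d+", "_", value)

samples = [
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_586_user_by_outfit_32587030",
    "USER_V2_ID_TO_CAREER_TIER_1417875",
    "USER_OUTFIT_LOOK_RATING_STARS_SAVED_KEY3_17481979",
]

for s in samples:
    print(strip_numeric_parts(s))
```

TEST1, V2 and KEY3 survive because their digits do not directly follow an underscore, while _586 and the trailing IDs are reduced to "_".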

sideview · ‎01-28-2014

If these are in a field called myField then you would just tack this on the end of your search:

| eval myField=replace(myField,"_\d+$","")

and that will effectively clip the trailing number, such as _33097521, off the end of every value.

Therefore if you have this:

| eval myField=replace(myField,"_\d+$","") | top 100 myField

you'll get the top 100 values, considering only the part up to the big integers.
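For anyone who wants to try the pattern outside Splunk first, here is a rough Python equivalent of the replace-then-top idea. It is only a sketch: `Counter.most_common` stands in for `top`, the helper name `clip_trailing_number` is made up, and the sample values are from the question with the leading `A}` dropped.

```python
import re
from collections import Counter

values = [
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33097521",
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_33952008",
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_no_event_id_total_value_season_percent_stars_32635194",
    "USEROUTFIT_32305379",
]

def clip_trailing_number(value: str) -> str:
    # Same regex as replace(myField, "_\d+$", ""): drop a final "_<digits>" only.
    return re.sub(r"_\d+$", "", value)

# Count the clipped values and keep the 100 most frequent.
top = Counter(clip_trailing_number(v) for v in values).most_common(100)
for pattern, n in top:
    print(n, pattern)
```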

You can also use rex instead of eval. The following will do the same thing as the eval syntax above:

| rex field="myField" "(?<myField>.+)_\d+$"

kristian_kolb · ‎01-28-2014

Oops, answering a little late. Hmm... Neither the rex nor the replace in this answer will handle numbers in the middle of the strings, such as the _586_ in ..._586_user_by_outfit_...
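A small Python sketch of the difference, using the regexes from this thread outside Splunk. The first value is from the question (leading `A}` dropped); the second is a hypothetical variation with a different number in the middle.

```python
import re

values = [
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_586_user_by_outfit_32587030",
    "MCTEST1_SI_EVENTS_TEST1_SI_EVENTS_592_user_by_outfit_33141624",  # hypothetical
]

# Clipping only the trailing number leaves _586_ and _592_ in place,
# so the two values still count as different patterns.
tail_only = {re.sub(r"_\d+$", "", v) for v in values}

# Replacing every "_<digits>" with "_" also removes the middle number,
# so both values collapse into one pattern.
all_parts = {re.sub(r"_\d+", "_", v) for v in values}

print(len(tail_only), "patterns after clipping the tail only")
print(len(all_parts), "pattern after replacing every numeric part")
```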

lukejadamec · ‎01-28-2014

Is that an example of one event, or is each line an event?
