Getting Data In

Why is my subquery returning duplicate values?

New Member

Hello,

I am trying to run a query, which will give me the results not returning by the inner query. Basically any userid can have url="/data/a.jsp" and also url="^/data/abc.* ". I want userids having url="/data/a.jsp" to not appear in the search for url
="(^/data/abc.*) ................"
. Here is the main query:

host="hostname" sourcetype="source_type" NOT 
[search  host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid] | 
search userid!="-" | regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" |
dedup url | eval user_status = "no" | dedup userid| 
lookup main_data userid OUTPUT userid, first_name,last_name| table userid, first_name, last_name, user_status 

I tried several ways, but still the duplicate userids are coming. Please help me out. Thanks in advance.

Regards,
Arka

0 Karma

Esteemed Legend

The problem is this line:

 | lookup main_data userid OUTPUT userid first_name last_name

Because that file contains duplicate userid values AND because you are outputting userid again (which is pretty silly), it is doing exactly what you are telling it to do and outputting them all on each line. First, fix your lookup file like this:

| inputlookup main_data
| dedup userid
| outputlookup main_data
0 Karma

Motivator

Hello @abouttathagata

Output of this query is also having duplicate userid:

 host="hostname" sourcetype="source_type" NOT 
 [search  host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid] | 
 search userid!="-" | regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" |
 dedup url | eval user_status = "no" | dedup userid
0 Karma

New Member

yes it is the same query right. So it will give the duplicate userid only.

0 Karma

Motivator

Hello @abouttathagata

If at the end of query, dedup userid is mentioned and still you are able to see duplicate userid, then i think the issue is with data. Same userid has either different case or having extra space in the value etc.

Try to run this query to better check this:
host="hostname" sourcetype="sourcetype" NOT
[search host="hostname" sourcetype="source
type" | search url = "/data/a.jsp" | fields userid] |
search userid!="-" | regex url="(^/data/abc.) |(^/data/def.)|(^/data/ghi.)|(^/data/klm.)" |
dedup url | eval user_status = "no | stats count by userid

0 Karma

New Member

No Hope. Still same result. Data is not a problem I think.

0 Karma

Motivator

is it possible to put two duplicate set you are getting while running the above command?

0 Karma

New Member

I am using following query to get the data for user status = yes

host="hostname" sourcetype="sourcetype" |search userid!="-"
|search url="/data/a.jsp" | eval user
status="yes" | dedup userid
| lookup maindata userid OUTPUT userid, firstname,lastname
| table userid, first
name, lastname, userstatus, url

Result:

userid = sam01
firstname=sam
last
name=Rogers
user_status=yes
url=/data/a.jsp

following query to get the data for user status = no

host="hostname" sourcetype="sourcetype" NOT
[search host="hostname" sourcetype="source
type" | search url = "/data/a.jsp" | fields userid]
| search userid!="-" | regex url="(^\/data\/abc.)|(^\/data\/def.)|(^\/data\/ghi.)|(^\/data\/klm.)"
| dedup url | eval userstatus = "no" | dedup userid
| lookup main
data userid OUTPUT userid, firstname,lastname
| table userid, firstname, lastname, user_status, url

Result:

userid = sam01
firstname=sam
last
name=Rogers
user_status=no
url=/data/*

0 Karma

Esteemed Legend

Try this:

index=YouShoulAlwaysSpeciryIndexValues host="hostname" sourcetype="source_type" userid!="-"
NOT [search  host="hostname" sourcetype="source_type" url = "/data/a.jsp" | stats count BY userid | table userid]
| regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" 
| dedup userid
| lookup main_data userid OUTPUT userid first_name last_name
| eval user_status = "no" 
| table userid first_name last_name user_status 
0 Karma

New Member

Thanks woodcock. But still I am getting the duplicate values.

0 Karma

SplunkTrust
SplunkTrust

Try this to filter userids in the subsearch.

host="hostname" sourcetype="source_type" NOT 
[ search host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" userid!="-" | stats count by userid | fields userid | format ]
| regex url="(^\/data\/abc\.)|(^\/data\/def\.)|(^\/data\/ghi\.)|(^\/data\/klm\.)" 
| dedup userid | eval user_status = "no"
| lookup main_data userid OUTPUT userid, first_name,last_name
| table userid, first_name, last_name, user_status
---
If this reply helps you, an upvote would be appreciated.
0 Karma

New Member

Thanks for your quick response. But still the same. The userid present in search url = "/data/a.jsp" still appearing. I am not sure but looks like the inner query not returning anything. If I run it individually it is running fine though.

0 Karma