Getting Data In

Why is my subquery returning duplicate values?

abouttathagata
New Member

Hello,

I am trying to run a query, which will give me the results not returning by the inner query. Basically any userid can have url="/data/a.jsp" and also url="^/data/abc.* ". I want userids having url="/data/a.jsp" to not appear in the search for url
="(^/data/abc.*) ................"
. Here is the main query:

host="hostname" sourcetype="source_type" NOT 
[search  host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid] | 
search userid!="-" | regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" |
dedup url | eval user_status = "no" | dedup userid| 
lookup main_data userid OUTPUT userid, first_name,last_name| table userid, first_name, last_name, user_status 

I tried several ways, but still the duplicate userids are coming. Please help me out. Thanks in advance.

Regards,
Arka

0 Karma

woodcock
Esteemed Legend

The problem is this line:

 | lookup main_data userid OUTPUT userid first_name last_name

Because that file contains duplicate userid values AND because you are outputting userid again (which is pretty silly), it is doing exactly what you are telling it to do and outputting them all on each line. First, fix your lookup file like this:

| inputlookup main_data
| dedup userid
| outputlookup main_data
0 Karma

vishaltaneja070
Motivator

Hello @abouttathagata

Output of this query is also having duplicate userid:

 host="hostname" sourcetype="source_type" NOT 
 [search  host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid] | 
 search userid!="-" | regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" |
 dedup url | eval user_status = "no" | dedup userid
0 Karma

abouttathagata
New Member

yes it is the same query right. So it will give the duplicate userid only.

0 Karma

vishaltaneja070
Motivator

Hello @abouttathagata

If at the end of query, dedup userid is mentioned and still you are able to see duplicate userid, then i think the issue is with data. Same userid has either different case or having extra space in the value etc.

Try to run this query to better check this:
host="hostname" sourcetype="source_type" NOT
[search host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid] |
search userid!="-" | regex url="(^/data/abc.) |(^/data/def.)|(^/data/ghi.)|(^/data/klm.)" |
dedup url | eval user_status = "no | stats count by userid

0 Karma

abouttathagata
New Member

No Hope. Still same result. Data is not a problem I think.

0 Karma

vishaltaneja070
Motivator

is it possible to put two duplicate set you are getting while running the above command?

0 Karma

abouttathagata
New Member

I am using following query to get the data for user status = yes

host="hostname" sourcetype="source_type" |search userid!="-"
|search url="/data/a.jsp" | eval user_status="yes" | dedup userid
| lookup main_data userid OUTPUT userid, first_name,last_name
| table userid, first_name, last_name, user_status, url

Result:

userid = sam01
first_name=sam
last_name=Rogers
user_status=yes
url=/data/a.jsp

following query to get the data for user status = no

host="hostname" sourcetype="source_type" NOT
[search host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" | fields userid]
| search userid!="-" | regex url="(^\/data\/abc.)|(^\/data\/def.)|(^\/data\/ghi.)|(^\/data\/klm.)"
| dedup url | eval user_status = "no" | dedup userid
| lookup main_data userid OUTPUT userid, first_name,last_name
| table userid, first_name, last_name, user_status, url

Result:

userid = sam01
first_name=sam
last_name=Rogers
user_status=no
url=/data/*

0 Karma

woodcock
Esteemed Legend

Try this:

index=YouShoulAlwaysSpeciryIndexValues host="hostname" sourcetype="source_type" userid!="-"
NOT [search  host="hostname" sourcetype="source_type" url = "/data/a.jsp" | stats count BY userid | table userid]
| regex url="(^/data/abc.*) |(^/data/def.*)|(^/data/ghi.*)|(^/data/klm.*)" 
| dedup userid
| lookup main_data userid OUTPUT userid first_name last_name
| eval user_status = "no" 
| table userid first_name last_name user_status 
0 Karma

abouttathagata
New Member

Thanks woodcock. But still I am getting the duplicate values.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Try this to filter userids in the subsearch.

host="hostname" sourcetype="source_type" NOT 
[ search host="hostname" sourcetype="source_type" | search url = "/data/a.jsp" userid!="-" | stats count by userid | fields userid | format ]
| regex url="(^\/data\/abc\.)|(^\/data\/def\.)|(^\/data\/ghi\.)|(^\/data\/klm\.)" 
| dedup userid | eval user_status = "no"
| lookup main_data userid OUTPUT userid, first_name,last_name
| table userid, first_name, last_name, user_status
---
If this reply helps you, an upvote would be appreciated.
0 Karma

abouttathagata
New Member

Thanks for your quick response. But still the same. The userid present in search url = "/data/a.jsp" still appearing. I am not sure but looks like the inner query not returning anything. If I run it individually it is running fine though.

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.