
Specifying multiple aggregations and multiple by-clause fields

You can also specify more than one aggregation and more than one by-clause field with the stats command. If you don't specify a name for the results using the AS syntax, then the names of the columns are the name of the field and the name of the aggregation. For example, if the results are grouped by host and there are two distinct hosts, the results are returned as a table with one row per host.
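As a minimal sketch of that form, the search below combines two aggregations with two by-clause fields. The field names bytes, host, and sourcetype are assumptions chosen for illustration, and the leading ellipsis stands for whatever search produces the incoming results.

```spl
... | stats sum(bytes), avg(bytes) BY host, sourcetype
```

Because no AS names are given, the result columns would be host, sourcetype, sum(bytes), and avg(bytes), with one row for each distinct combination of host and sourcetype.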

If you just want a simple calculation, you can specify the aggregation without any other arguments. For example, applying sum(bytes) on its own summarizes the bytes for all of the incoming results. The name of the column is the name of the aggregation.

Adding BY host takes the incoming result set, calculates the sum of the bytes field, and groups the sums by the values in the host field. The results contain as many rows as there are distinct host values. There are two columns returned: host and sum(bytes).

The AS and BY keywords are displayed in uppercase in the syntax and examples to make the syntax easier to read. You can specify the AS and BY keywords in uppercase or lowercase in your searches.
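The two cases described here can be sketched as follows. The leading ellipsis stands for the search that produces the incoming results, and total_bytes in the third search is a hypothetical column name used only to illustrate AS.

```spl
... | stats sum(bytes)

... | stats sum(bytes) BY host

... | stats sum(bytes) AS total_bytes BY host
```

The first search returns a single row with one column, sum(bytes). The second returns one row per distinct host with the columns host and sum(bytes). The third is the same grouping, with the aggregation column renamed to total_bytes.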

The following are examples for using the SPL2 stats command. To learn more about the stats command, see How the stats command works. Many of these examples use the statistical functions. See Overview of SPL2 stats and chart functions.

You can verify that stats and dedup do similar work by looking at the big numbers in the job inspector: both searches should show similar and small amounts of data returned to the search head. When looking at run time, make sure you do several executions to get a good average and iron out other activities on the system.

That said, dedup should not allow batch mode searches, but instead requires event ordering, and may therefore not allow parallel search pipelines (I didn't verify this). A less smart use of dedup may also cause more data to be carried around.

I'm just looking to improve my queries as best I can.

Assuming you want a list of all values of a field in an index, both a stats search and a dedup search would give you that, for example: index=a | stats count by field | fields - count

Fundamentally, both searches have to do the same work:

- extract, alias, calculate, lookup, whatever to produce the field
- produce a deduplicated list on each indexer (prestats / prededup in remoteSearch in the job inspector) to return to the search head
- merge those lists into one on the search head

Assuming both commands are built well, there will not be a huge difference in performance.
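For concreteness, the pair of searches being compared might look like this. The first is the stats search quoted in this answer. The second is an assumed dedup counterpart, since the exact dedup search isn't shown here; it uses table to keep only the field of interest.

```spl
index=a | stats count by field | fields - count

index=a | dedup field | table field
```

Both should return one row per distinct value of field, which is what makes comparing them in the job inspector meaningful.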

Some days ago, one of my colleagues told me that "if you want to delete duplicates on your search, using a stats count by yourfield is more efficient than using dedup yourfield because it has better performance since stats doesn't have to compare ALL the elements of the search while dedup does", but he didn't give me any demonstration of it. I've been digging for days on the internet, but I can't find an official answer, just some well-argued approaches. Somebody even says that stats dc(yourfield) is even faster than a simple stats. For me it makes complete sense, because it's easier to count (or distinct count) elements by one unique field than to check whether that same element exists within ALL the data sets.

So, what do you guys think? Is there any REAL performance improvement in using stats over using dedup? Is there any official answer to this question?
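One thing worth keeping in mind when comparing these: the two stats forms mentioned above don't return the same thing. A rough sketch, assuming an index named a and the placeholder field name yourfield:

```spl
index=a | stats count by yourfield

index=a | stats dc(yourfield)
```

The first returns one row per distinct value of yourfield along with its count, so it can stand in for dedup. The second returns a single number, the count of distinct values, so it only helps if you need how many distinct values exist and not the values themselves.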
