42

I've written a classification algorithm that does a pretty good job at classifying some datasets. However, I compared my algorithm to other classification methods, and their results exceed my results by a little. For instance, when classifying a dataset from a repository, my algorithm is getting 95% correct while another algorithm usually gets 99% correct.

Should I still try to publish my results, even though 1) my algorithm is a little slower, and 2) my algorithm's results are not as good as the other results?

I'm a little torn. I'm excited because my paper and results are a contribution to the classification field: the algorithm is novel. Also, I'm of the stance that you can't beat EVERY algorithm. If we only published algorithms that (loosely speaking) beat every other algorithm, then A) we'd never have new innovations, B) eventually every dataset would be classified 100% correctly every time, or C) every algorithm would have to classify a dataset instantaneously.

I hope that my algorithm will continue to grow and others will pick it up and extend it. I hope that one day -- with tweaks -- my algorithm can reach 99% too.

I'm afraid of being rejected by the journal again. Yes, my first submission was rejected. One of the reasons for the rejection was that my dataset was small. However, when the dataset was small I was beating the other algorithms. Now, as the dataset has grown, the other algorithms are now beating me. I'd like not to be rejected again.

ff524
Veil
  • I don't mean to be cruel, but if the state of the art is 99% correct and your algorithm is 95% correct, a reader's reaction won't be that your algorithm is correct 96% as often; it will be that your algorithm makes 5x as many errors. – hobbs Mar 25 '16 at 04:15
  • This is, statistically, the correct reaction, too. – Ryan Reich Mar 25 '16 at 04:52
  • It's simply not true that we'd never have innovations if we only published things that are improvements on the state of the art. You need to give people a reason to care about your algorithm. "If people cared about it, they'd make it as good as other algorithms" isn't a reason for people to care about it: it's a hope about the consequences that will follow if people do care. – David Richerby Mar 25 '16 at 06:02
  • The fact that your algorithm does not perform as well on accuracy matters because people are usually interested in accuracy, BUT what else does your algorithm bring? Does it learn to only 95% accuracy, but in 30% of the time it takes the other algorithm to reach 99%? What redeeming qualities does your algorithm have? Otherwise, if it takes about the same time and performs worse, then unless there is a specific lesson to be learned, I do not think it would be accepted for publication in a journal. A textbook author may still find it interesting for algorithm analysis. – querist Mar 25 '16 at 13:43
  • I'd say the paper is interesting nonetheless, because it contributes a new insight on the performance of algorithms. Also, if you don't publish it, the next person who has the same idea will need to investigate it fully, probably with similar results. – Simon Richter Mar 25 '16 at 14:47
  • I'm voting to close this question as off-topic because this will be very specific to your field. – Scott Seidman Mar 25 '16 at 15:33
  • Just because your algorithm has inferior performance doesn't mean its discovery is useless. The fact of its existence may be insightful to theorists, and others may see improvements upon yours that propel it to much better performance. You should try to emphasize any aspects it may have besides speed and correctness that are somehow interesting. – Superbest Mar 26 '16 at 03:27
  • When the 486 came out, it improved upon the 386, which introduced many multitasking features. Parallelism was considered inferior because there was a cost to multitasking. Now, multi-core CPUs benefit greatly from parallel computation. Just because your algorithm is currently clocking in slower on current hardware doesn't mean that it has no unique characteristics. Unique characteristics have often been usefully exploited, sometimes not at the time of initial creation. (I agree with @Superbest's comment.) – TOOGAM Mar 26 '16 at 06:39
  • You say that your algorithm is inferior. Well, it has a lower GPA, and that GPA is measured on standard key performance indicators. If your algorithm is better in ANY aspect, it has a place, even if you have to broaden the KPIs to show some merit. I have lots of hardware that is not the be-all and end-all but has merit in specific areas. If you can find some merit in your algorithm, then you have a useful tool that can go into the toolbox, and then I think you can publish. – Autistic Mar 27 '16 at 00:23
  • 95% is a lot different from 99%. – Salvador Dali Mar 27 '16 at 05:45
  • I assume that you're doing something new, some sort of new approach, which is why you want to publish. The first iteration, I should think, shouldn't necessarily be expected to blow everybody else out of the water. That's not how science works. There is no magic bullet that works 100% of the time in all possible cases. There is great value in coming up with new ways of doing things. – Broklynite Mar 28 '16 at 10:59
  • How many of the cases that the established algorithm (the one that gets 99% right) gets wrong does your algorithm get right? – Christian Mar 28 '16 at 18:14

8 Answers

89

If you want to get technical: by the "no free lunch" theorem, no learning algorithm performs any better than any other when its performance is averaged over all possible problems.

The question then, is what can be learned from your algorithm. In your question, you speak of it as a fond intellectual child that you wish to grow and nurture, and you speak of your personal concerns about acceptance and rejection. Here is the thing, though: none of that matters for a publication. What matters is this: what new knowledge or capability is brought into the world with your work on your algorithm, and how can this be objectively evaluated?

Here are some possibilities that I can see:

  • Your algorithm may perform better on an interesting and useful class of problems, and thus be of practical interest.
  • Your algorithm may perform worse, but have some other desirable property, such as executing very quickly or using very little memory, in which case the performance comparison just needs to show that its accuracy is sufficient, and you can show much better performance with regard to those other properties (a sketch of such a comparison appears below).
  • Your algorithm may perform worse, but do so in a way that is enlightening, e.g., taking a more human-like approach, or showing how something can be accomplished by a very unusual and unexpected route. In this case, the performance comparison is simply showing that your algorithm does not perform too badly to be interesting, and the narrative should focus on the path taken to achieve your results and why that is interesting.
  • Your algorithm may only have taught you personally some interesting things about classification and scientific research, in which case you should mourn the passing of a fine research idea and move on with your life.

Only you and those who know your work well will be able to tell which category it truly fits into.
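
To make the second bullet concrete, here is a minimal sketch of what such a multi-criteria comparison might look like. The two scikit-learn classifiers are off-the-shelf stand-ins for "your algorithm" and a baseline, and the synthetic dataset is a placeholder, not the asker's actual setup:

    import time
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder dataset; substitute the repository dataset here.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Stand-ins for the baseline and the proposed classifier.
    for name, clf in [("baseline", KNeighborsClassifier()),
                      ("proposed", DecisionTreeClassifier(random_state=0))]:
        t0 = time.perf_counter()
        clf.fit(X_train, y_train)
        fit_time = time.perf_counter() - t0

        t0 = time.perf_counter()
        accuracy = clf.score(X_test, y_test)
        predict_time = time.perf_counter() - t0

        # Report accuracy alongside the other axes of comparison.
        print(f"{name}: accuracy={accuracy:.3f}, "
              f"fit={fit_time:.3f}s, predict={predict_time:.3f}s")

Reporting a table like this makes the trade-off explicit: readers see at a glance what is given up in accuracy and what is gained on the other axes.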

jakebeal
  • +1 for a good categorisation of what useful results could be. – Captain Emacs Mar 25 '16 at 00:58
  • I'd add another category: you think your algorithm might provide a foundation for other algorithms (it introduces a novel concept). Even if it doesn't pass the first three tests, I'd say at a minimum put it up on arXiv; who knows what others may find it good for (it might be the foundation for an algorithm in a totally unrelated field). Let others see it, if it is indeed novel. – WetlabStudent Mar 25 '16 at 03:39
  • @WetLabStudent That is really covered by point 3, is it not? "it introduces a novel concept" ~ "showing how something can be accomplished by a very unusual and unexpected route" – Angew is no longer proud of SO Mar 25 '16 at 08:17
  • To add my five cents: I've been to a conference where someone presented an algorithm which didn't really get better results than state-of-the-art ones, but it was so robust, concise and innovative that I marveled at the pure idea and could implement it from scratch there and then, whereas I can't implement those better algorithms without a reference, which of course takes time to read and process. Take sorting, for example: I know the complexity of sorting algorithms, but then again I usually go with bubble sort because I can implement it fast and move on. The results are not always the key. – Sok Pomaranczowy Mar 25 '16 at 09:25
  • Another important category: the resulting model is more straightforward to interpret than models from competing algorithms (many learning algorithms are essentially black-box models), or the model uses some domain-specific information/assumption which could make it more relevant to that domain. – Bitwise Mar 25 '16 at 12:00
  • @SokPomaranczowy, insertion sort is almost twice as fast and equally simple. Shellsort is somewhat more complex, but quite a bit faster. – vonbrand Mar 25 '16 at 12:05
  • @SokPomaranczowy: Bubble sort is the worst possible example you can come up with, because in any decent programming language there is a library that can perform O(n*log(n)) sorting for you in one or two lines of error-free code, whereas bubble sort is Θ(n^2) on random input lists. – user21820 Mar 26 '16 at 16:20
  • @user21820 I fail to see why that is a bad example. I never said that this solution is preferable, that I would use it in enterprise code, or that it has better time complexity. All I said was that bubble sort wins over other algorithms despite having worse results, purely because it takes less of my brain to implement, and the question is about publishing less-than-optimal results. – Sok Pomaranczowy Mar 26 '16 at 18:21
  • @SokPomaranczowy: I understand, but your last sentence is precisely where it is completely wrong, because it takes even less of your brain to use the standard libraries to implement sorting. – user21820 Mar 27 '16 at 00:36
  • @user21820 There are cases where you can't use standard libraries. Besides, this conversation and your arguments have nothing to do with the OP's case. – Sok Pomaranczowy Mar 27 '16 at 19:04
  • @SokPomaranczowy: Yes, I agree that this conversation has nothing to do with the question at hand. If standard libraries are out of the question, then I agree that bubble sort or insertion sort are the simplest you can get. – user21820 Mar 28 '16 at 07:04
7

Simon Richter wrote in a comment:

I'd say the paper is interesting nonetheless, because it contributes a new insight on the performance of algorithms. Also, if you don't publish it, the next person who has the same idea will need to investigate it fully, probably with similar results.

If "this idea doesn't work as well as one might hope" is the main conclusion then I also see value in publishing it, but you need to consider carefully the venue. Is there something like a Journal of Negative Results in your field?

Peter Taylor
  • It seems like the author isn't convinced that his algorithm is a dead end and thinks that future work on it might produce good results. – Christian Mar 28 '16 at 18:17
5

Let's take a look at a very common and well-studied problem: sorting. There are lots of algorithms, ranging from very inefficient ones, such as bubble sort, up to more efficient ones, such as quicksort or merge sort. Of course, in practice I would like to use the most efficient one, but there are reasons why I might choose another. For example, merge sort might be more appropriate for a machine with sequential-access memory. Also, even though I would never use it in practice, there was a point in my life when I studied bubble sort, since it gave me a softer learning curve. And I studied merge sort initially not for the actual problem that it solves, but for the method it uses.

The bottom line is, there are many reasons why an algorithm can be interesting to somebody, even if it has lower average performance in practice. Moreover, somebody might find an application for your algorithm where it is better than the others (for example, bubble sort with an early-exit check is quicker than a naive quicksort on already-sorted lists).
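
A minimal sketch of that last claim in pure Python (the quicksort is deliberately naive, with a first-element pivot, and the input size is arbitrary):

    import sys
    import time

    sys.setrecursionlimit(10000)  # the naive quicksort recurses n levels deep on sorted input

    def bubble_sort(items):
        """Bubble sort with an early exit: one O(n) pass on already-sorted input."""
        a = list(items)
        for i in range(len(a) - 1):
            swapped = False
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swapped = True
            if not swapped:
                break
        return a

    def naive_quicksort(items):
        """First-element-pivot quicksort: degrades to O(n^2) on sorted input."""
        if len(items) <= 1:
            return list(items)
        pivot, rest = items[0], items[1:]
        return (naive_quicksort([x for x in rest if x < pivot])
                + [pivot]
                + naive_quicksort([x for x in rest if x >= pivot]))

    already_sorted = list(range(2000))
    for name, sort in [("bubble sort", bubble_sort), ("naive quicksort", naive_quicksort)]:
        t0 = time.perf_counter()
        sort(already_sorted)
        print(f"{name} on sorted input: {time.perf_counter() - t0:.4f}s")

On random input the ranking flips back, which is exactly the kind of regime-dependent behavior worth reporting about a new algorithm.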

Paul92
3

Accuracy is not the only measure of an algorithm: yours may run blazingly fast, be implementable on a microcontroller, classify in real time or online, serve as a good preprocessing step for other algorithms, be robust to noise, who knows. Perhaps it is just very elegant. With a little wit, you can find a way to demonstrate its strengths.
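
Robustness to noise, for instance, is easy to quantify. A minimal sketch of one way to do it (the model, dataset, and noise levels are hypothetical stand-ins, not a prescription):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Placeholder binary classification data.
    X, y = make_classification(n_samples=4000, n_features=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    rng = np.random.default_rng(1)
    for noise in (0.0, 0.1, 0.2):  # fraction of training labels flipped
        y_noisy = y_train.copy()
        flip = rng.random(len(y_noisy)) < noise
        y_noisy[flip] = 1 - y_noisy[flip]  # labels are 0/1, so flipping is 1 - y

        clf = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
        print(f"label noise {noise:.0%}: test accuracy {clf.score(X_test, y_test):.3f}")

Running the same loop for your algorithm and for the 99% baseline turns "robust to noise" from a claim into a curve.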

Many of today's standard algorithms (for instance in sound/image compression) contain parts that were not quite state of the art when first published, but that work great in conjunction.

Laurent Duval
2

I'll have a go at this with a softer perspective. You write:

I'm afraid of being rejected by the journal again.

What is the worst consequence of being rejected?

Most of us get rejected from time to time, and you are not alone in being worried about that: How do I overcome fear of rejection when writing academic papers?

You obviously have a logical mind. My guess is that if you put that good head of yours into thinking about the worst consequence, you will find that it isn't that bad. You might even learn something useful along the way. Life will go on.

Even outside the sphere of academia, several famous writers were rejected before succeeding: Agatha Christie, J.K. Rowling, and C.S. Lewis, to mention a few.

Close your eyes and press send! (and if you get rejected: blame it on someone else, that helps ;)

mrHaugen
1

A lot of good points supporting publication have already been listed, but one seems to have been missed, even if it might not be relevant for your algorithm: that it is a different algorithm.

I work on numerical code from time to time (for fun, not profit), and just now I'm implementing a function to compute a big-integer nth root. Although the algorithms I use were proven correct several hundred years ago, any implementation will have errors: tests are mandatory.

I could have used a second library with a well-tested implementation, but to avoid errors in translating the data types I simply implemented two different algorithms: a recurrence (Halley's method) and a binary search. The chance that both err in the same way with the same outcome is very low, because the two algorithms are sufficiently different[1].

If your algorithm is also sufficiently different, it, too, can be used to verify the results of other algorithms. That is not the worst thing, to say the least.
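
For illustration, here is a minimal sketch of that cross-checking idea in Python. It is not the answerer's code: it pairs the binary search with a Newton iteration rather than the Halley recurrence, and the function names are mine:

    import random

    def nth_root_bisect(a, n):
        """Integer nth root by binary search: largest r with r**n <= a."""
        lo, hi = 0, 1 << -(-a.bit_length() // n)  # hi**n is guaranteed to exceed a
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if mid ** n <= a:
                lo = mid
            else:
                hi = mid - 1
        return lo

    def nth_root_newton(a, n):
        """Integer nth root by Newton's method, started from an overestimate."""
        if a == 0:
            return 0
        x = 1 << -(-a.bit_length() // n)
        while True:
            y = ((n - 1) * x + a // x ** (n - 1)) // n
            if y >= x:  # the iteration stopped decreasing: x is the floor root
                return x
            x = y

    # Two sufficiently different algorithms verifying each other.
    for _ in range(1000):
        a = random.getrandbits(random.randint(1, 400))
        n = random.randint(2, 10)
        r = nth_root_bisect(a, n)
        assert r == nth_root_newton(a, n) and r ** n <= a < (r + 1) ** n

Any disagreement in the final assertion would point to a bug in one of the two implementations, which is precisely the verification role described above.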

TL;DR: publish!

[1] NB: you may want both for large integers, because the binary-search algorithm is not very fast for small indices, and the Halley recurrence is quite slow for large indices, starting at about 300 bits of radicand size with my implementation. So your algorithm might even be useful in production, although that point has already been made, if I remember correctly.

1

Research is not always writing about how your experimentation works perfectly. It's about contributing to the global body of knowledge. Such knowledge can also include what was attempted and why it did not work.

jaybers
0

It is possible that your algorithm isn't quite as good as other published algorithms right now, but there may be easy improvements that other people could find that would make it better than the alternatives.

If you publish it, someone might figure out such improvements. If you don't publish it, they won't.

gnasher729