
In Sipser's Introduction to the Theory of Computation, the author explains that two strings can be compared by “zigzagging” back and forth between them and “crossing off” one symbol at a time (i.e., replacing them with a symbol such as $x$). This process is displayed in the following figure (from Sipser):

Diagram that shows how a Turing machine can compare two strings
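For reference, here is a minimal Python simulation of the crossing-off procedure described above. It models head sweeps rather than a formal transition table, and it assumes the input is laid out as $u\#v$ over $\{0,1\}$, as in Sipser's example language $\{w\#w\}$. The function name `compare_crossing_off` is hypothetical. The point is that the tape is full of `x` symbols afterwards:

```python
def compare_crossing_off(tape):
    """Sipser-style zigzag comparison that crosses off matching symbols with 'x'.

    `tape` is a list of symbols u + ['#'] + v over {'0', '1'}.
    Returns True iff u == v.  The tape is modified in place (destructively).
    """
    sep = tape.index('#')
    end = len(tape)
    while True:
        # Zig: find the leftmost symbol of u that is not crossed off yet.
        i = 0
        while i < sep and tape[i] == 'x':
            i += 1
        # Zag: find the leftmost symbol of v that is not crossed off yet.
        j = sep + 1
        while j < end and tape[j] == 'x':
            j += 1
        if i == sep:              # u is used up
            return j == end       # equal iff v is used up too
        if j == end or tape[i] != tape[j]:
            return False
        tape[i] = 'x'             # cross off both matching symbols
        tape[j] = 'x'
```

On input `0110#0111` it rejects and leaves `xxx0#xxx1` on the tape, so the original strings are lost.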

However, this process modifies the strings being compared, which would be problematic if the Turing machine needs to access these strings in the future. What are ways of performing a string comparison without modifying the strings?

Frank

5 Answers


Create two new types of marks: $\dot{0}, \dot 1$. Those two will act "like" $x$, but can still keep the information about the string.

So when you cross-off a letter, add a "dot" to it at the top instead of fully replacing it with $x$.

Then, if you want the original strings back, after you are done comparing, go through the entire strings and remove the "dots": replace $\dot 0$ with $0$ and $\dot 1$ with $1$.
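A minimal Python sketch of this idea, assuming the same $u\#v$ layout over $\{0,1\}$ as in Sipser's example. The strings `'0*'` and `'1*'` are hypothetical stand-ins for $\dot 0$ and $\dot 1$, and `compare_with_dots` is an illustrative name. It is a simulation at the level of head sweeps, not a transition table:

```python
DOT = {'0': '0*', '1': '1*'}          # '0*' and '1*' stand in for the dotted symbols
UNDOT = {d: s for s, d in DOT.items()}


def compare_with_dots(tape):
    """Zigzag comparison of u#v that dots symbols instead of replacing them with 'x'.

    `tape` is a list u + ['#'] + v over {'0', '1'}.
    Returns True iff u == v; the original tape contents are restored before returning.
    """
    sep = tape.index('#')
    end = len(tape)
    result = None
    while result is None:
        i = 0
        while i < sep and tape[i] in UNDOT:        # skip already-compared (dotted) symbols
            i += 1
        j = sep + 1
        while j < end and tape[j] in UNDOT:
            j += 1
        if i == sep:                               # u used up: equal iff v used up too
            result = (j == end)
        elif j == end or tape[i] != tape[j]:
            result = False
        else:
            tape[i] = DOT[tape[i]]                 # "cross off" by adding a dot
            tape[j] = DOT[tape[j]]
    # Final sweep: remove every dot, which recovers the original strings.
    for k, sym in enumerate(tape):
        tape[k] = UNDOT.get(sym, sym)
    return result
```

The comparison logic is unchanged from the crossing-off version; only the marking symbol remembers what it replaced, so one extra sweep undoes all the marks.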

nir shahar
  • Wouldn't that require keeping some form of array that links to 0? Is that any different to just using different symbols, where 0 maps to y and 1 maps to h or anything else? – terdon Aug 10 '21 at 10:57
  • You can use any two distinct symbols you like, as long as you know which one was mapped from $0$ and which from $1$. The symbols $\dot 0$ and $\dot 1$ are just for the sake of convenience. – nir shahar Aug 10 '21 at 11:53
  • Yes, of course. I am just trying to understand how this is different to just making a copy of the string and operating on that copy. Is there any benefit in using an associative array-like structure (I assume that's how this would have to be implemented) instead of making a copy? Also, I am a biologist and wouldn't know a Turing machine if it bit me on the nose, so I could just be missing something blindingly obvious here. – terdon Aug 10 '21 at 11:57
  • This is by no means an "associative array-like structure". The only thing we do here is replace $0$ in place by a different letter $\dot 0$ (we don't allocate extra memory for it; we overwrite the existing cell), so we know that we have already seen it. Turing machines, in the formal world, have some alphabet they operate on. This alphabet doesn't have to be the binary one, and the solution I proposed uses a $4$-letter alphabet: $\{0,1,\dot 0,\dot 1\}$. This was done for the sake of convenience, to make it easier to implement. – nir shahar Aug 10 '21 at 12:01
  • I think I'll have to ask a new question. Sorry to bug you in the comments like this. It seems I need to understand some basics first. Thanks! – terdon Aug 10 '21 at 12:50
  • @terdon The alphabet used by your Turing machine is just as much a part of the definition as the transition function; you get to pick the alphabet you want based on the algorithm you are implementing. The input alphabet, the symbols used to present the input initially, is a subset of the tape alphabet, which can include as many extra symbols as you like: they are only used by the transition function, and do not affect the time or space complexity of your algorithm. (The only thing you can't do is wait until you've seen the input to decide what the tape alphabet will be.) – chepner Aug 10 '21 at 20:20
  • @terdon You don't need an associative array, no. The code of the Turing machine itself (i.e., its transition function) knows how to make these replacements. They are hardcoded. If you were writing this in a normal programming language, there would literally be an "if c == '$\dot{0}$' then set c = '0' else if c == '$\dot{1}$' then set c = '1'". There are just 0 and 1, no more, so every case can literally be written into the code. – dionyziz Aug 11 '21 at 14:08
  • Ah. Thanks, @dionyziz, that makes sense even to my ignorant self. It can't escape me forever; I'll figure it out! – terdon Aug 11 '21 at 14:25

One simple way, if your TM has only a single tape, is to create a copy of the entire input right after the original input. To distinguish the original from the copy, you can use ## as a separator. So the first step is to scan the entire input to reach its end and write ##, then go back and copy the entire input to the right of ##, and finally apply the comparison from Sipser to the copy.
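A minimal Python sketch of this approach, assuming the input is $u\#v$ over $\{0,1\}$. The copy step follows the mark-and-restore idea discussed in the comments (temporarily mark one original cell, remember its symbol in the state, write it after ##, then restore the mark); here a Python variable stands in for that finite state. The name `copy_then_compare` is hypothetical:

```python
def copy_then_compare(tape):
    """Copy u#v to the right of '##', then run the crossing-off comparison on the copy.

    `tape` is a list u + ['#'] + v over {'0', '1'}; it is extended in place.
    Only the copy gets crossed off, so the original cells end up unchanged.
    """
    n = len(tape)                      # length of the original input
    tape.extend(['#', '#'])            # separator between original and copy

    # Copy phase: one cell at a time, remembering the symbol "in the state".
    for k in range(n):
        held = tape[k]                 # on a real TM: a state such as x_0, x_1 (or one for '#')
        tape[k] = 'x'                  # mark our position in the original
        tape.append(held)              # walk right to the first blank and write the symbol
        tape[k] = held                 # walk back to the 'x' and restore the original symbol

    # Comparison phase: Sipser's crossing-off, applied only to the copy.
    start = n + 2                      # first cell of the copy
    sep = tape.index('#', start)       # the '#' inside the copy
    end = len(tape)
    while True:
        i = start
        while i < sep and tape[i] == 'x':
            i += 1
        j = sep + 1
        while j < end and tape[j] == 'x':
            j += 1
        if i == sep:                   # copy of u used up
            return j == end            # equal iff copy of v is used up too
        if j == end or tape[i] != tape[j]:
            return False
        tape[i] = tape[j] = 'x'        # cross off the matching pair
```

For input `01#01` the tape ends as `01#01##xx#xx`: the original is intact and only the copy is crossed off. The price is roughly doubling the tape space used.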

Russel
  • Arguably the better method, because this one doesn't require modifying the Turing machine's set of allowable symbols, which would effectively make it a different Turing machine. – The_Sympathizer Aug 10 '21 at 08:58
  • @The_Sympathizer but it does have the drawback of requiring an additional $O(n)$ space. And anyway it still needs some additional symbol as a "separator". This answer is still very good considering that it is more practical :) – nir shahar Aug 10 '21 at 11:55
  • @nirshahar worrying about space overhead is silly when the topic is Turing machines – they have infinite memory... — If you're going to discuss anything resembling real-world performance, then you should just allow for two registers that can store the positions as addresses, instead of encoding them in unary on the tape. – leftaroundabout Aug 10 '21 at 14:46
  • @nirshahar your solution needs at least the same amount of bits of "storage" on the tape. By introducing two new symbols, you are increasing the amount of bits per symbol. – KarlKastor Aug 10 '21 at 16:52
  • @KarlKastor take a look at the second answer I posted, which solves the question in a different way without extra storage or extra alphabet letters. – nir shahar Aug 10 '21 at 17:27
  • How does the Turing machine create the copy? I'm wondering about how you indicate where in the input you are while copying -- can this be done without introducing new symbols like @The_Sympathizer says? Maybe you overwrite the symbol you're copying with $\sqcup$? – Kyle Miller Aug 10 '21 at 22:04
  • I accepted the other answer because this answer seems to mask its elegance behind "creating a copy of the input." Like Kyle mentions, I think some kind of marking procedure or some more complicated state representation would be required to actually create a copy. – Frank Aug 10 '21 at 22:42
  • Hi everyone, I have been reading all the comments and I can't help feeling that others think I am trying to compete with the accepted answer. This was never my intention. I think @nirshahar's answer is clever and more detailed (especially his alternative answer), which is why it should really be the accepted answer. As for my answer, I just wanted to present a general solution so that there is no need to modify existing TM algorithms to force them not to alter the original input. – Russel Aug 11 '21 at 03:33
  • @KyleMiller, I admit I did not consider a detailed presentation of the copying process when I posted this answer, and I am not planning to modify my answer just to improve it, since the accepted answer, together with nirshahar's alternative answer, is the one worth considering. But if one really wants to perform the copying without introducing additional symbols, it is straightforward to apply nirshahar's second answer to achieve this. – Russel Aug 11 '21 at 03:37
  • @Russel this is by no means a competition :) people can vote for both my answer and your answer at the same time! I think what we did here is a bit more discussion about the specifics of implementing the idea you provided: what it means, and how to do it. – nir shahar Aug 11 '21 at 06:06
  • How do you copy without keeping track of which part you've already copied? How do you keep track of that without modifying the input string? – user253751 Aug 11 '21 at 09:28
  • One way to do this, based on nir shahar's alternative solution: after writing ##, the TM goes back to find the leftmost symbol. If this symbol is 0, it enters a state we call $x_0$, indicating that we are copying a 0; otherwise it enters state $x_1$. The TM then replaces the symbol it read with X and writes the copied symbol to the right of ##. After writing, it returns left to find the X, turns it back into its original value (which it can do thanks to the state from earlier), moves to the next symbol, and repeats the copy process. – Russel Aug 11 '21 at 11:49

Here is an alternative solution using the original binary alphabet $\{0,1\}$ (without adding extra letters, apart from $x$, which can also be replaced with $\sqcup$), which also works without allocating extra memory on the tape:

We keep only one "$x$" per string, moving it one cell to the right after each comparison. To know which letter was there before we replaced it with $x$, we can use the fact that it is possible to encode it in the Turing machine's internal state.
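A minimal Python model of this idea, assuming the input is $u\#v$ over $\{0,1\}$. It is a high-level simulation, not a transition table: the variable `held` (and `a` while walking from $u$ to $v$) stands in for the finite internal state, and `compare_one_marker` is a hypothetical name. Because both markers advance only after a successful match, they always hide the same symbol, so the state needs to remember just one bit (plus the symbol being carried):

```python
def compare_one_marker(tape):
    """Compare u#v while keeping at most one 'x' in each string.

    `tape` is a list u + ['#'] + v over {'0', '1'}.  Returns True iff u == v,
    and the tape holds its original contents again when the function returns.
    `held` models the finite control: the symbol hidden under both markers.
    """
    sep = tape.index('#')
    end = len(tape)
    held = None                       # nothing hidden before the first round

    def find_marker(lo, hi):
        # Scan cells lo..hi-1 for the single 'x'; lo - 1 means "no marker yet".
        for k in range(lo, hi):
            if tape[k] == 'x':
                return k
        return lo - 1

    while True:
        # In u: put the hidden symbol back and read the next one.
        mu = find_marker(0, sep)
        if mu >= 0:
            tape[mu] = held
        a = tape[mu + 1]              # next symbol of u, or '#' once u is used up
        # In v: the same, carrying `a` along in the (finite) state.
        mv = find_marker(sep + 1, end)
        if mv > sep:
            tape[mv] = held
        b = tape[mv + 1] if mv + 1 < end else None   # None models the blank after v
        if a == '#':                  # u used up: equal iff v is used up too
            return b is None
        if a != b:
            return False
        # Advance both markers one cell; both now hide the symbol a.
        tape[mu + 1] = 'x'
        tape[mv + 1] = 'x'
        held = a
```

Both markers are restored before every return, so the input is unchanged afterwards, while the only extra tape symbol is `x`.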

nir shahar

In general, it can't be done.

For example, if the strings are length-prefixed and writing to the cells containing the strings is not allowed, then there is no way that a machine with $n$ internal states can traverse a string longer than $n$ from end to end, and so there is no single program that can compare strings of arbitrary length (or do much of anything with them).

You have to be able to "bring information with you" on the tape, since there isn't enough memory in the internal state space to store anything of interest. Unless the input is encoded in a way that gives you some working space – leaving alternate cells blank, for example – you have no alternative but to rewrite the tape to create space for yourself.

You can always modify the strings in a reversible way, and restore them to their original positions after you've worked out and recorded the answer. But Turing machines aren't normally expected to do that.

benrg
  • What you are saying is interesting. But I also think it's an interesting concept to allow one to modify the data however they like, as long as at the end the modified data is exactly equal to the original data. This allows us to think of the TM as a "black box" whose behavior we know, and this line of thinking allows us to use such an unusual Turing machine without worrying too much. – nir shahar Aug 12 '21 at 12:46

You can get by without a copy and without more symbols than the strike-out solution uses, at the cost of increasing the number of states. That is, you only ever have one $x$ per string and, using the additional states, remember what was under it, which lets you restore that information in the next round.