I analyse log files, and very often there are strings that differ in only a few places and are identical everywhere else. I am trying to find the generic string that they most likely belong to, e.g.:
UHGUYGUYGUYGUY id = U1234 UYAG*&^T*#@G*(&G@ id2 = 8767 ib97y79yh0978
UHGUYGUYGUYGUY id = Z1D#34 UYAG*&^T*#@G*(&G@ id2 = 98h ib97y79yh0978
Sss3ug87g87g78ghs837g8 obj { 876t7g }937hs937hs973h97sh397 jh7897y98h
Sss3ug87g87g78ghs837g8 obj { 98u2 }937hs937hs973h97sh397 ZN7897y98h
To me, the only difference between the two strings in each pair is the value of the ids, so the generic forms/groupings would be:
UHGUYGUYGUYGUY id = * UYAG*&^T*#@G*(&G@ id2 = * ib97y79yh0978
Sss3ug87g87g78ghs837g8 obj { * }937hs937hs973h97sh397 *7897y98h
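To make the target concrete: for a single pair of lines that I already know belong together, the merge step on its own is simple. Below is a minimal sketch, assuming whitespace tokenisation and equal token counts in both lines; `merge_template` is a hypothetical helper name, not something from a library. It does not solve the real problem, which is deciding which lines belong together and coping with ids that change the token count.

```python
def merge_template(a: str, b: str, wildcard: str = "*") -> str:
    """Merge two log lines token by token, replacing differing tokens with a wildcard.

    Assumption: both lines split into the same number of whitespace-separated tokens.
    """
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("lines have different token counts; cannot merge token-wise")
    # Keep tokens that agree in both lines, generalise the ones that differ.
    return " ".join(x if x == y else wildcard for x, y in zip(tokens_a, tokens_b))


line1 = "UHGUYGUYGUYGUY id = U1234 UYAG*&^T*#@G*(&G@ id2 = 8767 ib97y79yh0978"
line2 = "UHGUYGUYGUYGUY id = Z1D#34 UYAG*&^T*#@G*(&G@ id2 = 98h ib97y79yh0978"
print(merge_template(line1, line2))
```

On the first pair this reproduces the generic form above. On the second pair it generalises the whole last token to `*` rather than `*7897y98h`, because it compares whole tokens and not characters, which is one of the reasons this kind of hand-written rule does not scale for me.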
I am not sure what I should be looking up in machine learning for this problem, or even whether this problem has a name.
Of course, this is a very simplified example. The number and location of the ids can vary between generic cases, which is why I cannot write old-fashioned code for this: too many things can change.
Is there anything in machine learning that helps find groups for such strings? If yes, what is it called?