I am trying to parse a large amount of text that other software cannot readily consume because it was laid out for human readers. However, every "section" of the text has the same format. By "section", I mean that lines 1-10 (section 1) will have the same format as lines 11-20 (section 2). My plan is to loop through the text, gather all the sections into a list, and then convert that list into a CSV.
Example:
####################################
123456789 first-name last-name 2/25/2018
------------------------------------
more-user-info1 12345
other-info1
123 user-name1
------------------------------------
even-more-data1
####################################
112233445 first-name last-name 1/1/2018
------------------------------------
more-user-info2 78900
other-info2
555 user-name2
------------------------------------
even-more-data2
####################################
<piece 3 here> ...
To give you an idea, a line of ###... marks the start of a new section.
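A minimal sketch of the plan described above (loop through the text, collect sections, write a CSV), assuming a line made only of # starts a section and a line made only of - separates blocks inside it. The names is_rule, split_sections, and sections_to_csv are hypothetical, and the one-column-per-block CSV layout is only an assumption for illustration:

```python
import csv
import io


def is_rule(line, char):
    """Return True if the line is non-empty and consists only of `char`."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {char}


def split_sections(text):
    """Yield each section as a list of blocks; each block is a list of lines."""
    blocks = None  # None until the first ### line has been seen
    for line in text.splitlines():
        if is_rule(line, "#"):
            # A ### line closes the previous section (if it had content)
            # and opens a new one.
            if blocks is not None and any(blocks):
                yield blocks
            blocks = [[]]
        elif blocks is None:
            continue  # ignore anything before the first ### line
        elif is_rule(line, "-"):
            blocks.append([])  # a --- line starts a new block
        elif line.strip():
            blocks[-1].append(line.strip())
    if blocks is not None and any(blocks):
        yield blocks


def sections_to_csv(sections):
    """Write one CSV row per section, one column per block (assumed layout)."""
    out = io.StringIO()
    writer = csv.writer(out)
    for blocks in sections:
        writer.writerow([" | ".join(block) for block in blocks])
    return out.getvalue()
```

Calling sections_to_csv(split_sections(text)) on the example above would give one row per section. Real sections with optional blocks would still need per-field handling on top of this.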
My question is: what is the best approach to parsing this data? This is just an example; a real section has much more data and many edge cases and optional fields. The data also comes from a very old program, so I am not aware of any blueprint or business rules for it.
I am currently using regex to find the data I need to store. Are there any options besides regex for parsing loosely structured text?
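For context, the regex approach looks roughly like the sketch below. The pattern is hypothetical, not the one actually in use: it assumes the first line of a section is always a numeric ID, two name tokens, and an M/D/YYYY date, which the real data may not guarantee.

```python
import re

# Hypothetical pattern for the first line of a section:
# numeric ID, first name, last name, M/D/YYYY date.
HEADER_RE = re.compile(
    r"^(?P<id>\d+)\s+(?P<first>\S+)\s+(?P<last>\S+)\s+"
    r"(?P<date>\d{1,2}/\d{1,2}/\d{4})$"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    return match.groupdict() if match else None
```

A line that does not fit the assumed shape returns None, which is where the edge cases and optional data start to hurt.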