Create your own text format and parse it

Last Update: 27.12.2010. By azarai in python | snipplet

For my CV I maintain a list of all the projects I have worked on, to give a bit more information than a typical German CV does. Each project entry contains a name and description, my job and role, the main technologies used, and the time period. I used OpenOffice for this task for several years, but was never quite happy with it, as I always had to fix the layout after adding a new entry… This time it bothered me once too often, so I had to code something :-)

My requirements

  • simple text format; should be readable and writable in a normal text editor, so no XML
  • should output PDF

My way:

I looked at various ways to produce PDFs and decided that I didn’t want to deal with PDF generation directly, especially the layout. But I can output HTML pretty quickly, and there are plenty of solutions out there that will “convert” HTML to PDF. So PDF generation is not covered here (I used wkhtmltopdf).
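For completeness, the conversion step could be scripted roughly like this. It is only a sketch: it assumes wkhtmltopdf is installed and on the PATH, and the function names and the output name projects.pdf are made up for illustration. The basic call is just "wkhtmltopdf <input> <output>".

```python
import subprocess


def build_pdf_command(html_path, pdf_path):
    """Return the basic wkhtmltopdf call: wkhtmltopdf <input> <output>."""
    return ["wkhtmltopdf", html_path, pdf_path]


def html_to_pdf(html_path="index.html", pdf_path="projects.pdf"):
    """Run wkhtmltopdf on html_path and write the result to pdf_path.

    Raises FileNotFoundError if wkhtmltopdf is not installed and
    subprocess.CalledProcessError if the conversion itself fails.
    """
    subprocess.run(build_pdf_command(html_path, pdf_path), check=True)
```

Calling html_to_pdf() after the HTML page has been written would then produce the PDF next to it.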

The text format
<starttime> - <endtime>
<name>
<description>
Job
<jobdescription>
Role
<roles in project>
Technology
<technologies used>

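At least the start time needs to be in the format MM.YYYY, because the parser recognises a new entry by a line that starts with this pattern. The keywords Job, Role and Technology are case-sensitive and must appear in this order.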
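One way to recognise such a time line is a regular expression matched at the start of the line. The sketch below is an illustration under that assumption; TIME_PATTERN and starts_entry are names I made up. Note that it only checks the shape (two digits, a dot, four digits), not whether the month is valid, and that any other line starting with this pattern would also count as the start of an entry.

```python
import re

# Two digits, a dot, four digits, e.g. "10.2009"
TIME_PATTERN = re.compile(r"[0-9]{2}\.[0-9]{4}")


def starts_entry(line):
    """Return True if the line begins with something shaped like MM.YYYY."""
    return TIME_PATTERN.match(line) is not None


for sample in ["10.2009 - 10.2010", "Job", "Java, Tomcat 5/5.5/6",
               "12.2010 was a busy month"]:
    print(repr(sample), "->", starts_entry(sample))
```

The last sample shows the pitfall: a description line that happens to start with a date would be taken for a new entry.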

Example entry
10.2009 - 10.2010
Ich bin der Projektname
Es ging um a
b
und c

Job
OOAD, Aufwandsschätzungen

Role
Architekt, Entwickler, Tester

Technology
Java, Tomcat 5/5.5/6

Parsing the format

I’ve implemented the parser as a simple state machine with no syntax tolerance, so formatting errors might break it. I also allowed Markdown inside the project and job descriptions. The code is commented and should be self-explanatory. Feedback is appreciated.

The Code:

# -*- coding: utf-8 -*-
# Requires Python 3, jinja2 and markdown.
import re
import sys

import markdown
from jinja2 import Environment, FileSystemLoader

# parser states: the part of a project entry that is expected next
state_time, state_name, state_desc, state_job, state_role, state_tech = range(6)

# set up jinja2; templates are loaded from the current directory
template_store = '.'
env = Environment(loader=FileSystemLoader(template_store))

projects = {}
state = state_time
current_project = None
key = 0

if len(sys.argv) != 2:
    sys.exit("No project list given")

print("Reading file %s" % sys.argv[1])

# read the project file line by line
with open(sys.argv[1], encoding='utf-8') as project_file:
    for line in project_file:
        if re.match(r"[0-9]{2}\.[0-9]{4}", line):
            # a line starting with MM.YYYY opens a new project entry
            current_project = {'time': line.strip(), 'desc': '',
                               'job': '', 'role': '', 'tech': ''}
            projects[key] = current_project
            state = state_name
            key += 1
        elif state == state_name:
            current_project['name'] = line.strip()
            state = state_desc
        elif state == state_desc:
            # collect description lines until the Job keyword
            if line.strip() != "Job":
                current_project['desc'] += line
            else:
                state = state_job
                current_project['desc'] = markdown.markdown(current_project['desc'])
        elif state == state_job:
            # collect job lines until the Role keyword
            if line.strip() != "Role":
                current_project['job'] += line
            else:
                state = state_role
                current_project['job'] = markdown.markdown(current_project['job'])
        elif state == state_role:
            # roles stay plain text; the template turns newlines into <br/>
            if line.strip() != "Technology":
                current_project['role'] += line
            else:
                state = state_tech
        elif state == state_tech:
            current_project['tech'] += line

print("Successfully built the project list")

print("Generating the html page")
template = env.get_template('template.html')
content = template.render(projects=projects)

with open("index.html", 'w', encoding='utf-8') as output_file:
    output_file.write(content)
print("Done")
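To play with the state machine without jinja2 or markdown installed, the parsing part can be pulled into a function that works on a string. This is only a sketch, not the script above: parse_projects, KEYWORDS and SAMPLE are names made up for illustration, the Markdown conversion is left out, and unlike the script it accepts the keywords in any order.

```python
import re

# Keyword line -> field that collects the lines following it.
KEYWORDS = {"Job": "job", "Role": "role", "Technology": "tech"}
TIME_LINE = re.compile(r"[0-9]{2}\.[0-9]{4}")

SAMPLE = """10.2009 - 10.2010
Ich bin der Projektname
Es ging um a
b
und c

Job
OOAD, Aufwandsschätzungen

Role
Architekt, Entwickler, Tester

Technology
Java, Tomcat 5/5.5/6
"""


def parse_projects(text):
    """Parse project entries from text into a list of dicts."""
    projects = []
    field = None  # field currently being filled; None before the first entry
    for line in text.splitlines():
        stripped = line.strip()
        if TIME_LINE.match(stripped):
            # A time line always opens a new entry.
            projects.append({"time": stripped, "name": "", "desc": "",
                             "job": "", "role": "", "tech": ""})
            field = "name"
        elif field is None:
            continue  # ignore anything before the first entry
        elif field == "name":
            projects[-1]["name"] = stripped
            field = "desc"
        elif stripped in KEYWORDS:
            # A keyword line switches to the next field.
            field = KEYWORDS[stripped]
        else:
            projects[-1][field] += line + "\n"
    return projects


for project in parse_projects(SAMPLE):
    print(project["time"], "|", project["name"], "|", project["tech"].strip())
```

Keeping the parser free of file and template handling also makes it easy to feed it broken input and see where the lack of syntax tolerance bites.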

Usage in the template:

{% for pkey in projects.keys() %}
<table class="event">
  <tr class="dummy">
    <td>&nbsp;</td>
  </tr>
  <tr>
    <td class="date"><h1>{{projects[pkey]['time']}}</h1></td>
    <td>
      <h1>{{projects[pkey]['name']}}</h1>
      {{projects[pkey]['desc']}}

      <h3>Tätigkeiten</h3>
      {{projects[pkey]['job']}}

      <h3>Rollen</h3>
      {{projects[pkey]['role'] |trim |replace("\n", "<br/>")}}

      <h3>Technologien</h3>
      {{projects[pkey]['tech'] |trim |replace("\n", "<br/>")}}
    </td>
  </tr>
</table>
{% endfor %}
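The role and technology fields are not run through Markdown, so the template cleans them up with the trim and replace filters. In plain Python the filter chain does roughly the following; as_html_lines is a made-up helper name, and the sketch assumes autoescaping is off, as in the default jinja2 Environment.

```python
def as_html_lines(text):
    # trim: strip leading and trailing whitespace, including the final newline
    # replace: turn each remaining newline into an HTML line break
    return text.strip().replace("\n", "<br/>")


print(as_html_lines("Architekt, Entwickler, Tester\n\n"))
# prints: Architekt, Entwickler, Tester
print(as_html_lines("Java\nTomcat 5/5.5/6\n"))
# prints: Java<br/>Tomcat 5/5.5/6
```

So a field with one item per line ends up as one item per visible line in the generated page.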