Create your own text format and parse it
27.12.2010 by azarai in python | snipplet
For my cv I maintain a list of all the projects I have worked on, to give a bit more information than a typical German cv does. Each project entry contains a name and description, what my job and role were, which technology was mainly used, and when it took place. I used OpenOffice for this for several years, but was never quite happy with it, as I always had to fix the layout after adding a new entry... This time it bothered me once too often, so I had to code something :-)
My requirements
- simple text format; should be readable and writable in a normal text editor, so no xml
- should output pdf
My way:
I looked at various ways to produce pdfs and decided that I didn't want to mess with pdf generation directly, especially the layout. But I can output html pretty fast, and there are plenty of solutions out there which will "convert" html to pdf. So pdf generation is not covered here (I used wkhtmltopdf).
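For completeness, the conversion step can be a single command. A minimal sketch, assuming wkhtmltopdf is installed and on the PATH and that the generated page is called index.html; the function names and the output name cv.pdf are made up for illustration and are not part of my script:

```python
import shutil
import subprocess


def build_pdf_command(html_path, pdf_path):
    """Return the wkhtmltopdf invocation: input page first, output file second."""
    return ["wkhtmltopdf", html_path, pdf_path]


def html_to_pdf(html_path="index.html", pdf_path="cv.pdf"):
    """Run wkhtmltopdf if it is available; return True when it was run."""
    if shutil.which("wkhtmltopdf") is None:
        # Tool not installed (or not on the PATH): nothing to do
        return False
    subprocess.run(build_pdf_command(html_path, pdf_path), check=True)
    return True
```

Calling html_to_pdf() after the html page has been written would then produce the pdf next to it.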
The text format
<starttime> - <endtime>
<name>
<description>
Job
<jobdescription>
Role
<roles in project>
Technology
<technologies used>
At least the starttime needs to be in the format MM.YYYY. The keywords Job, Role and Technology are case-sensitive, and each one goes on a line of its own.
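What "at least the starttime" means in practice can be checked with a few lines. This sketch uses the pattern from the parser (without its trailing .*, which does not change what matches); starts_entry is a hypothetical helper name:

```python
import re

# Pattern used to spot the first line of a project entry
ENTRY_START = re.compile(r"[0-9]{2}\.[0-9]{4}")


def starts_entry(line):
    """True if the line begins with something shaped like MM.YYYY."""
    return ENTRY_START.match(line) is not None


print(starts_entry("10.2009 - 10.2010"))  # True
print(starts_entry("10.2009 - today"))    # True, the end time is not checked
print(starts_entry("2009 - 2010"))        # False, the month is missing
print(starts_entry("99.2010 - 10.2010"))  # True, the month range is not validated
print(starts_entry("Job"))                # False
```

Note that any line beginning with two digits, a dot and four digits starts a new entry, so a description line should not begin that way.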
Example entry
10.2009 - 10.2010
Ich bin der Projektname
Es ging um a b und c
Job
OOAD, Aufwandsschätzungen
Role
Architekt, Entwickler, Tester
Technology
Java, Tomcat 5/5.5/6
Parsing the format
I've implemented the parser as a simple state machine with no syntax tolerance, so formatting errors might break it. I also allowed markdown inside the project and job descriptions. The code is commented and should be self-explanatory. Feedback is appreciated.
The Code:
# -*- coding: utf-8 -*-
import re
import sys

import markdown
from jinja2 import Environment, FileSystemLoader

# Define the parser states
state_time, state_name, state_desc, state_job, state_role, state_tech = range(6)

# Set up jinja2; templates are looked up in the current directory
template_store = '.'
env = Environment(loader=FileSystemLoader(template_store))

projects = {}
state = state_time
current_project = None
key = 0

if len(sys.argv) != 2:
    sys.exit("No project list given")

print("Reading file %s" % sys.argv[1])

# Read the project file line by line
with open(sys.argv[1], 'r', encoding='utf-8') as project_file:
    for line in project_file:
        if re.match(r"[0-9]{2}\.[0-9]{4}", line):
            # A line starting with MM.YYYY begins a new project
            projects[key] = {'time': line.strip(), 'desc': '', 'job': '',
                             'role': '', 'tech': ''}
            current_project = projects[key]
            state = state_name
            key += 1
        elif state == state_name:
            current_project['name'] = line.strip()
            state = state_desc
        elif state == state_desc:
            if not re.match("Job", line.strip()):
                current_project['desc'] += line
            else:
                # Description is complete: convert its markdown to html
                state = state_job
                current_project['desc'] = markdown.markdown(current_project['desc'])
        elif state == state_job:
            if not re.match("Role", line.strip()):
                current_project['job'] += line
            else:
                # Job description is complete: convert its markdown to html
                state = state_role
                current_project['job'] = markdown.markdown(current_project['job'])
        elif state == state_role:
            if not re.match("Technology", line.strip()):
                current_project['role'] += line
            else:
                state = state_tech
        elif state == state_tech:
            # Everything up to the next project belongs to the technology list
            current_project['tech'] += line

print("Successfully built the project list")
print("Generating the html page")

template = env.get_template('template.html')
content = template.render(projects=projects)

with open("index.html", 'w', encoding='utf-8') as output_file:
    output_file.write(content)

print("Done")
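To try the state machine without jinja2 or markdown installed, the same transitions can be wrapped in a function. This is only a sketch and not the script above: parse_projects and NEXT_FIELD are hypothetical names, the result is a list instead of a dict, and the markdown conversion is left out.

```python
import re

# For each multi-line field: the keyword line that ends it, and the field that follows
NEXT_FIELD = {
    "desc": ("Job", "job"),
    "job": ("Role", "role"),
    "role": ("Technology", "tech"),
}


def parse_projects(lines):
    """Parse project entries from an iterable of text lines."""
    projects = []
    field = None  # field currently being filled; None until the first entry starts
    for line in lines:
        if re.match(r"[0-9]{2}\.[0-9]{4}", line):
            # A line starting with MM.YYYY begins a new project
            projects.append({"time": line.strip(), "name": "", "desc": "",
                             "job": "", "role": "", "tech": ""})
            field = "name"
        elif field == "name":
            projects[-1]["name"] = line.strip()
            field = "desc"
        elif field in NEXT_FIELD and line.strip().startswith(NEXT_FIELD[field][0]):
            # Keyword line: switch to the next field, the keyword itself is dropped
            field = NEXT_FIELD[field][1]
        elif field is not None:
            projects[-1][field] += line
    return projects


entry = """10.2009 - 10.2010
Ich bin der Projektname
Es ging um a b und c
Job
OOAD, Aufwandsschätzungen
Role
Architekt, Entwickler, Tester
Technology
Java, Tomcat 5/5.5/6
"""
parsed = parse_projects(entry.splitlines(keepends=True))
print(parsed[0]["name"])          # Ich bin der Projektname
print(parsed[0]["role"].strip())  # Architekt, Entwickler, Tester
```

Lines before the first date line are ignored, just as in the script, because no state has been entered yet.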
Usage in the template:
{% for pkey in projects.keys() %}
<table class="event">
<tr class="dummy">
<td> </td>
</tr>
<tr>
<td class="date"><h1>{{projects[pkey]['time']}}</h1></td>
<td>
<h1>{{projects[pkey]['name']}}</h1>
{{projects[pkey]['desc']}}
<h3>Tätigkeiten</h3>
{{projects[pkey]['job']}}
<h3>Rollen</h3>
{{projects[pkey]['role'] |trim |replace("\n", "<br/>")}}
<h3>Technologien</h3>
{{projects[pkey]['tech'] |trim |replace("\n", "<br/>")}}
</td>
</tr>
</table>
{% endfor %}
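The trim and replace filters turn the raw, multi-line role and technology text into html line breaks. In plain Python the chain does roughly the following; render_lines is a hypothetical helper, and it mirrors the filters only for the setup used here, where autoescaping is off:

```python
def render_lines(value):
    """Mimic the template's trim and replace filters on a multi-line value."""
    # trim: drop leading and trailing whitespace, including the final newline
    # replace: turn each remaining newline into an html line break
    return value.strip().replace("\n", "<br/>")


print(render_lines("Architekt\nEntwickler\nTester\n"))
# Architekt<br/>Entwickler<br/>Tester
```

Without trim, the newline at the end of the last line would become one more line break.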
