August 5, 2018

Hugo & XML parsing

Sometimes fatigue strikes. And sometimes you can’t do what you wanted to. Like yesterday, when I wanted to concentrate on Regex in the FCC curriculum but could only do a handful of exercises before my eyes fell shut.

New try today, but first I looked at my newly created XSD and made some minor improvements, like allowing it to contain no data yet apart from a name tag, and moving an attribute to another tag where it makes more sense… There certainly is a structured way to get these things straight, but bear with me, I’m only learning. You might wonder why I don’t go into detail about what exactly I’m doing in this whole Java/XML project thing. Truth is, I don’t want to share it publicly, at least not yet, but as soon as I get to the next FCC challenges, I promise to be more verbose… Till then you might want to look at what I’ve already done on my GitHub and GitLab profiles.

I looked a bit into Hugo, which powers this blog. You create a new article in Hugo with hugo new posts/$article$.md, where $article$ stands for the filename and “.md” is the extension (Markdown, obviously). It creates the file, but nothing else. I thought it might be handy to have it open the file in an editor, so I can start writing right away. I thought I remembered a command line parameter for naming an editor, but I couldn’t find it in the documentation. What I did find is the newContentEditor key for config.toml. If you set it to your preferred editor, the hugo new command will open the newly created file in said editor. So newContentEditor = "atom" will make the command hugo new posts/ create the file with the corresponding front matter and open it in the Atom editor for you.

Speaking of atom, with Ctrl-Shift-m I can open a markdown preview alongside the source text, so I also see what will end up on the blog.

Speaking of front matter: this had me puzzled when I created the blog. I just couldn’t seem to get the articles published, because I always had to add “type=post” to the front matter of each article by hand, which took me quite a while to find out.

So after how many? three? articles I decided not to rely on remembering to add the type parameter every time. So I looked for the place where the front matter is defined. Turned out it’s the sub-directory “archetypes”. In that directory you have a number of files that define how content will be structured when created. Basically it’s all about setting the front matter right (not all, but that should be the most common case).

I only had one file in the “archetypes” sub-directory, which would be used for every kind of content. I don’t plan to add any content other than articles here, but who knows what the future brings, so I created a separate archetype file for posts (mind the plural: “posts”, while the type is “post”) which reads like the following:

---
title: "{{ replace .Name "-" " " | title }}"
date: {{ .Date }}
draft: false
type: post
---

Yes, I also set the draft value to false, because I usually do not write drafts. Once written this all goes online.

I did some reasoning about how to process the XML in my secret project (sounds important, I think I’m gonna refer to it as “secret project” from now on): DOM, SAX or StAX. I think DOM would be the most convenient to write, but a rough estimate of the data came out at around 160 MB in memory in the worst case, so now I’m hesitant. The first computer I played with (literally) was my father’s first PC (80286), which had a 40 MB hard disk and well below 1 MB of RAM (I don’t remember exactly). So maybe this is the reason why I consider 160 MB a huge amount of data.
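To make the memory concern concrete, here is a minimal sketch of the DOM route using the JDK's built-in javax.xml.parsers API. The element names (collection, name, entry) and the class and method names are purely hypothetical stand-ins, since the real schema is not shown here. The point is only that builder.parse reads the entire document into a tree before any query can run, which is where a worst-case figure like 160 MB would come from.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical example class; not part of the actual project.
class DomSketch {

    // Builds the complete DOM tree in memory, then counts elements by tag name.
    static int countElements(String xml, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() consumes the whole input before returning the Document.
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data standing in for the real file.
        String xml = "<collection><name>demo</name>"
                + "<entry id=\"1\"/><entry id=\"2\"/></collection>";
        System.out.println(countElements(xml, "entry")); // prints 2
    }
}
```

The convenience is that once the tree exists, you can jump around in it freely; the price is that the tree exists at all.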

I might start with DOM and implement SAX or StAX later on. The data will grow slowly over time, so earlier versions will work fine with DOM. This should also be a good exercise to get to know the different approaches. I also stumbled across something called TrAX, which might be another approach worth trying. Let’s see what the future holds.
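For comparison, a later StAX version could look roughly like the following sketch, which uses the JDK's javax.xml.stream API. Again, the element names and the class and method names are hypothetical placeholders, not the real project's. Instead of building a tree, the reader pulls one event at a time, so memory use stays small no matter how large the file grows.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of the actual project.
class StaxSketch {

    // Streams through the document and counts start tags with the given name.
    static int countElements(String xml, String tagName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances to the following event; only start tags matter here.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && tagName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample data standing in for the real file.
        String xml = "<collection><name>demo</name>"
                + "<entry id=\"1\"/><entry id=\"2\"/></collection>";
        System.out.println(countElements(xml, "entry")); // prints 2
    }
}
```

The trade-off is that the code has to keep track of where it is in the document by itself, because there is no tree to navigate back through.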

I also watched a few videos on Udemy, but didn’t feel like I learned anything new today. Well, maybe tomorrow. Good night.

© tonnerkiller
