K-Pop Songs, Translations, and Profiles

A Peek at Our Code

Here you can get a glimpse of all the code beneath the hood. If you want to explore all of our code and documents, please feel free to sift through our GitHub Repo. Use the navigation to quickly jump to a topic of interest.

XML and Document Structure

Both our group profiles and song pages begin with an About section dedicated to providing relevant background information. We did a lot of research to make sure this information was accurate and current. We also ensured that we are referencing relevant XML:IDs whenever possible to link between pages and profiles.

The "meaning" element within the About section of the song documents is meant to discuss the message, themes, and/or narrative of the song's lyrics (and video if applicable). We wrote them all ourselves, and when making them, we looked into the opinions of the general community through forum posts and comments, then into the reviews and interpretations from professional critics (typically from magazines or review sites). We also added our own personal interpretations into these as well. Overall, we wanted to portray a combination of many perspectives.

You may notice in the "lyricCredit" element, there are some writers with "artistRef" attributes. That reference attribute is pointing to XML:IDs that were established on the corresponding group profile -- those people are actually members of the group! This was something we wanted to include from the start, so that we could demonstrate that many K-Pop idols contribute to the writing, production, and composition of the music itself.

A screenshot of the XML editor oXygen, showing the beginning About section of the song page XML for BTS

Our group profiles and song pages had pretty unique structures from one another. Every song was broken up into both the original Korean lyrics and their English translations, broken down even further into sections (verse, chorus, bridge, etc.) and then into individual lines. Each section and line has a specific number attached to it, in case we ever wanted to link to a particular verse or line (which we do in some of the song meanings!) Adding the "lineRef" attribute was very important, as that was what allowed us to do the colored line toggle you see on the song pages.

The "backVocal" and "sharedLine" elements were also really essential -- some members are well-known for their ad-libs (like J-Hope in the example below) and it would feel inaccurate and misleading if we didn't include some way for those ad-libs to be represented in our markup. It was also important, because some members (as noted in the Line Distribution section of our Analysis page) have more ad-libs than full lines.

A closer look.

The profile code is less complicated, comprised mostly of these little sections that contain basic info about a given member. The reason for the "koreanName" element is that there are many people within these 10 groups that weren't born in South Korea and thus have a birth name that's not Korean (take Mark and Johnny from NCT 127 as examples, born in Canada and the US respectively). These idols, those without Korean birth names, are typically given or adopt a Korean name so that Korean-speaking people can more easily address them. This isn't always true, though -- look at Yuta, a Japanese idol without one.

The numbers within the "role" section were significant. Each member within a given group has a certain number of roles/positions that determine what they do or focus on -- some people are mainly singers, some dancers, some focus heavily on producing the music, etc. These roles aren't all the same though, they have an internal hierarchy that we wanted to show through the code. If we look at RM, he's a rapper, sure, and an important one at that, but he's BTS' leader first, then a rapper.

A closer look.


Due to having two very distinct types of documents, we had to make two RelaxNG schemas for the project. We tried to make our schemas as thorough and precise as we could, as we had a lot of text that needed to be arranged very specifically (especially the songs).

You can see that only a few elements are required in the "koreanLyrics" section (and similarly in the "engTrans" or English translation section). K-Pop is very formulaic, and we knew going in that every song will have some verses, choruses, pre choruses, and a bridge, but not all songs include an intro/outro, post chorus, refrain, etc. They can also be placed in any order (other than the intro/outro), because one song may start with a verse while another might start with the chorus.

As is (partially) explained in the comments below, there are lots of points of member information that can change from person to person. If we look back at the roles, some members might only fill one role -- maybe they only are a singer, and that's it. Other members, by contrast, might fill a ton of roles, be a jack-of-all-trades, type. We needed our schema to be able to fit both scenarios, and anything in between.

Another example of an optional piece of info is the "subunit" element. Subunits are smaller sub-group that's formed of a whole, larger group, like Stray Kids' 3RACHA or EXO's EXO-CBX. Not every member is a part of a subunit, and most groups don't even have them. But, some do, so we needed that flexibility in the schema.

We also had a Schematron file that was used to connect our group profiles and song pages. With this document, we were able to reference the XML:IDs established in a profile in the lines of a song page. We also have a few additional options when marking up the songs that can seen below: "All" and "Non-Member." "All" is pretty obviously, it's just when the entire group is singing together. "Non-Member" was used for every instance where the line being sung was not a member of the corresponding group -- this was usually a piece of sampled music or something to that effect.


We were able to convert all of that XML to HTML for this website with XSLT. Predictably, our XSLT documents were pretty long and complicated to put together, due to how many elements and sections we developed.

Thankfully, the XSLT for the song pages was pretty simple. Setting up the background information held in the About section was easy, and just required putting the specific pieces of information like album and release date into its own div.

Matching for the different song sections was similarly uncomplicated. We did include those aforementioned numbers on each section into the HTML through classes.

For the lines, we had to add in that "lineRef" attribute as a class, so we could isolate all the lines sung/rapped by each member with the line toggle JavaScript. We also have these two types of classes "korLineNumber" and "engLineNumber" followed by that "linenum" attribute from the XML. This was really useful to isolate certain lines in the song meanings.

The XSLT for the group profiles was a bit more complicated. We had to build these intricate nesting divs and lists that could accommodate all of the information we had on each idol. As said above, some elements like the "koreanName" and "nkBirthName" (non-Korean Birth Name) had to be made optional, while still fitting into the list of names when they were there.


Lastly, here's what it takes to make one of the graphs/charts you see on the Analysis page! This is most of the code that made the Comparing Roles chart. It involved essentially counting each member, then counting the roles that the members fill. This chart is counting every role a member fills, so if there's a member who is a rapper first, then a dancer, this chart is counting both. We did it that way so that idols who are prolific and well-known in multiple roles won't get miscounted -- take RM from BTS as an example again. He's the leader, but the "rap line" of the group would be incomplete without him, and it would be misleading to count him only as a leader, when he's also a very renowned rapper.