There’s been a lot of discussion about religion and violence in light of recent events, with some claiming that certain religions are inherently more violent than others. I thought it would be interesting to examine this assertion by looking at holy texts from six major religions (Buddhism, Christianity, Hinduism, Islam, Judaism, and the Church of Jesus Christ of Latter-day Saints) and “measuring” the violence in each.
Now obviously there’s no way this analysis is a complete treatment of the subject. This subject is incredibly nuanced; an (admittedly incomplete) examination of the texts is one very small facet of the overall discussion. Regardless, I think it’s well worth a look, and the results may surprise you. At the very least it’s quite an exercise in data cleaning - if you’re into that sort of thing.
The general strategy is simple: count the number of sentences containing violence-related words in each religious book. The first thing we’ll need (besides the data) is a way of obtaining the sentences from the text. It seems easier than it is; the variety of punctuation patterns, quotations, etc. makes this pretty nontrivial. Fortunately, a lot of work has already been done on this problem, and we’re going to use it. The OpenNLP (NLP = natural language processing) library has a model for sentence detection.
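To see why sentence splitting is harder than it looks, here is a minimal Python sketch. It is not the OpenNLP model used for the analysis; the `naive_sentences` helper, the `BOUNDARY` rule, and the sample text are all made up for illustration. The rule splits after `.`, `!`, or `?` when the next word starts with a capital letter, which handles some cases and badly misses others.

```python
import re

# Split after ., ! or ? when followed by whitespace and a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def naive_sentences(text):
    """Very rough splitter; a trained model copes with far more cases."""
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]

# Made-up sample: the "!" is followed by a lowercase word, so no split there.
sample = "Be happy then! among friends we live well. All is quiet."
for sentence in naive_sentences(sample):
    print(sentence)

# The same rule wrongly cuts after an abbreviation such as "Mr."
print(naive_sentences("Mr. Smith went home."))
```

Abbreviations, quotations, and verse numbering all break a rule this simple, which is the reason to lean on a trained sentence model instead.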
The books were pulled from Project Gutenberg (except the Jewish scriptures). They can be obtained at the following links:
- Buddhism, Dhammapada
- Christianity, The Holy Bible (World English Bible)
- Hinduism, The Vedas
- Islam, The Koran
- Judaism, The Jewish Scriptures
- Latter-day Saints, The Book of Mormon
The choice of Hindu and Buddhist texts is somewhat arbitrary; there isn’t any single centralized holy text for those religions, so I picked two of the more well-known (to me at least) texts. The edition of the Jewish Scriptures I pulled is perhaps not the best translation for this type of analysis. Unfortunately, Project Gutenberg’s collection is just links to the Bible on an individual book-by-book basis. This would be an immense amount of work to assemble and clean, so I opted for a collected edition.
Each book is presented as plain text, but a substantial (and I mean substantial) amount of cleaning is required. I will spare you the details, but suffice it to say there’s a solid amount of code required. Hit the code bar at your own risk.
How many sentences in each book?
| Book | Sentences |
|---|---|
| The Holy Bible | 36,316 |
| The Jewish Scriptures | 20,498 |
| The Book of Mormon | 7,605 |
Note that the Jewish Scriptures and the Bible both have by far the highest number of sentences. This will come into play when we start counting the violent sentences later.
Next, we need a list of words related to violence. This is where the arbitrary nature of the analysis rears its ugly head. Here are 24 words I picked kind of randomly with a little googling.
wound, hurt, fight, violate, destroy, slaughter, murder, kill, attack, break, crush, provoke, anger, hatred, bloodshed, rage, fear, suffer, violent, war, stab, shoot, strike, rape.
Clone and run the notebook if you want to pick your own list of violent words (or any words, actually).
Finally, we need a way to detect when a word is in a sentence. This is even harder than detecting sentences; we have to detect words within a sentence.
Suppose we want to search a book for instances of the word kill. We can search for the exact word and we’ll do okay, but we’d miss any that were at the end of a sentence (kill?, kill!, kill.) or really any instance followed by punctuation (kill,, kill;). We’d also miss words we may want to count: killing, killed, etc.
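A minimal Python sketch of the problem, assuming a simple prefix match on lowercase tokens (this is my own illustration, not what Lucene does internally; `contains_word` is a hypothetical helper):

```python
import re

def contains_word(sentence, stem):
    """True if any token in the sentence starts with `stem` (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token.startswith(stem) for token in tokens)

# Exact matching on whitespace-separated words misses "kill!" entirely.
print("kill" in "Do not kill!".split())

# Stripping punctuation and matching prefixes catches "kill!" and "killed".
print(contains_word("Do not kill!", "kill"))
print(contains_word("He was killed.", "kill"))

# The price is false positives: "warm" starts with "war".
print(contains_word("A warm day.", "war"))
```

Prefix matching is about the crudest thing that works, and the last line shows why it isn’t good enough on its own.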
Luckily, this is not an uncommon need; it’s called the search problem. There are a number of ways to tackle this, but since I’m feeling especially lazy I opted for Apache Lucene. Lucene is essentially a document store that lets you pull documents based on textual queries. It does this by analyzing the documents as they’re added and indexing them based on their contents, using natural language processing techniques with (probably) a sprinkle of magic. We can use it to index each sentence of each book, then perform a search on the words we want. At that point all we need to do is count the results of our search and we have our answer.
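The index-then-search idea can be sketched in a few lines of Python. This is a toy inverted index, not Lucene; the `normalize`, `build_index`, and `search` helpers are hypothetical, and the suffix stripping is a far cruder stand-in for a real analyzer.

```python
import re
from collections import defaultdict

def normalize(token):
    """Crude stand-in for an analyzer: lowercase and strip a common suffix."""
    token = token.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(sentences):
    """Map each normalized term to the set of sentence ids containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in re.findall(r"[A-Za-z]+", sentence):
            index[normalize(token)].add(sid)
    return index

def search(index, word):
    """Return the ids of sentences matching the word, in ascending order."""
    return sorted(index.get(normalize(word), set()))

# Made-up sentences, one "document" per sentence.
sentences = [
    "Do not kill, nor cause slaughter.",
    "He has killed father and mother.",
    "Let us live happily then.",
]
index = build_index(sentences)
print(search(index, "kill"))       # ids of the sentences mentioning kill/killed
print(len(search(index, "kill")))  # the number we actually care about
```

Counting the hits for each query word, per book, is all the rest of the analysis needs.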
Just to demonstrate Lucene’s capabilities, I’ll perform a query for the word “rape” and look at the sentences it retrieves.
- The Holy Bible: Their houses will be ransacked, and their wives raped.
- The Jewish Scriptures: In those days saw I in Judah some treading winepresses on the sabbath, and bringing in heaps of corn, and lading asses therewith; as also wine, ;rapes, and figs, and all manner of urdens, which they brought into Jerusalem on the sabbath day; and I forewarned them in the day where- in they sold victuals.
Notice that the query returned “raped” and “;rapes”; Lucene doesn’t necessarily need an exact match. It’s clear from the context that the match in the Jewish Scriptures is incorrect. Not only is it not referring to the correct word in the first place (should be “grapes”), but the word isn’t even spelled properly. This is important, as it reveals the limitations of the analysis technique:
- Unstructured text is almost never “all the way clean”.
- Obtaining context is extremely difficult.
There has been an enormous amount of progress in the NLP community on teasing out context, particularly with neural-network-based techniques. We’ll be using exactly none of that research and sticking with the simple stuff.
Without further delay, let’s take a look at what Lucene can get us.
The first thing we should look at is the raw counts of sentences containing violence.
The Bible is on top, followed closely by the Jewish Scriptures, and together the two lead the rest by a landslide. This isn’t unexpected for two reasons:
- The Bible and Jewish Scriptures have the most sentences.
- The Jewish Scriptures share a huge amount of material with the Bible.
The Vedas has no violent sentences at all.
A better way to look at the data is to compute the ratio of violent sentences to total sentences.
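Here is a small Python sketch of why the ratio matters. The two books and their counts are invented purely for illustration and are not results from this analysis; `violent_ratio` is a hypothetical helper.

```python
def violent_ratio(violent, total):
    """Fraction of a book's sentences that matched at least one violent word."""
    return violent / total if total else 0.0

# Made-up (violent, total) sentence counts for two imaginary books.
books = {"Long Book": (300, 30000), "Short Book": (40, 400)}

by_count = max(books, key=lambda name: books[name][0])
by_ratio = max(books, key=lambda name: violent_ratio(*books[name]))

print(by_count)  # the longer book wins on raw counts
print(by_ratio)  # the shorter book wins once length is accounted for
```

A long book can top the raw counts simply by being long, so the ranking can flip once you divide by the total.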
Now things are more interesting. Far ahead of the others we have the Dhammapada and the Book of Mormon. To tease out a little more context and chase down this surprising (at least to me) result, we need to look at which of the violent words in the list were in the most sentences. We could visualize this with just a series of bar charts, but since we’re specifically producing counts of words I think word clouds will be a lot more fun.
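The per-word tallies behind a word cloud can be sketched like this in Python. It assumes simple prefix matching rather than a Lucene query, `sentences_per_word` is a hypothetical helper, and the three sentences are made up; each sentence counts at most once per word.

```python
import re
from collections import Counter

def sentences_per_word(sentences, words):
    """Count, for each word, how many sentences contain a token starting with it."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for word in words:
            if any(token.startswith(word) for token in tokens):
                counts[word] += 1
    return counts

# Made-up sentences for illustration only.
sentences = [
    "All men fear death, so do not kill.",
    "Pleasures destroy the fool, and the fool destroys himself.",
    "There is no fear for the watchful.",
]
counts = sentences_per_word(sentences, ["fear", "kill", "destroy"])
print(counts.most_common())
```

The resulting counts are exactly the weights a word cloud needs: the bigger the count, the bigger the word.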
The Holy Bible
The Jewish Scriptures
The Book of Mormon
Now things are really interesting! Fear is a common theme amongst all of the texts (except the Vedas, which would have been a really boring word cloud).
The Jewish Scriptures, the Bible, and the Book of Mormon all have a large number of sentences containing “destroy”, something much less prominent in the Koran. The Bible also has a large number of sentences with “kill”, though this is likely a consequence of the translation as much as anything else. The other texts use much more formal language.
Another standout is the prominence of “suffer” in both the Dhammapada and the Book of Mormon - our two “most violent” texts. We can get an even better sense of context by just sampling the sentences that match our queries.
- A true Brahmana goes scatheless, though he have killed father and mother, and two valiant kings, though he has destroyed a kingdom with all its subjects.
- Not to blame, not to strike, to live restrained under the law, to be moderate in eating, to sleep and sit alone, and to dwell on the highest thoughts,--this is the teaching of the Awakened.
- If a man's thoughts are not dissipated, if his mind is not perplexed, if he has ceased to think of good or evil, then there is no fear for him while he is watchful.
- The evil done by oneself, self-begotten, self-bred, crushes the foolish, as a diamond breaks a precious stone.
- Let us live happily then, not hating those who hate us! among men who hate us let us dwell free from hatred!
- "He abused me, he beat me, he defeated me, he robbed me,"--in those who harbour such thoughts hatred will never cease.
- Or lightning-fire will burn his houses; and when his body is destroyed, the fool will go to hell.
- A true Brahmana goes scatheless, though he have killed father and mother, and two holy kings, and an eminent man besides.
- All men tremble at punishment, all men fear death; remember that you are like unto them, and do not kill, nor cause slaughter.
- Pleasures destroy the foolish, if they look not for the other shore; the foolish by his thirst for pleasures destroys himself, as if he were his own enemy.
It’s pretty clear from this sample that we’re looking at advice against violence here. This makes sense: the Dhammapada is a collection of sayings, not an actual story. Suffering is also a key theme in Buddhism, so it’s no surprise we see it heavily represented.
Let’s look at the Book of Mormon.
- Moroni: For behold, their wars are exceedingly fierce among themselves; and because of their hatred they put to death every Nephite that will not deny the Christ.
- Alma: But the law requireth the life of him who hath murdered; therefore there can be nothing which is short of an infinite atonement which will suffice for the sins of the world.
- Mormon: And it came to pass that I did speak unto my people, and did urge them with great energy, that they would stand boldly before the Lamanites and fight for their wives, and their children, and their houses, and their homes.
- Mosiah: Yea, they went again even the third time, and suffered in the like manner; and those that were not slain returned again to the city of Nephi.
- And behold it shall come to pass that after the Messiah hath risen from the dead, and hath manifested himself unto his people, unto as many as will believe on his name, behold, Jerusalem shall be destroyed again; for wo unto them that fight against God and the people of his church.
- Alma: Then, my brethren, ye shall reap the rewards of your faith, and your diligence, and patience, and long-suffering, waiting for the tree to bring forth fruit unto you.
- Ether: For so great had been the spreading of this wicked and secret society that it had corrupted the hearts of all the people; therefore Jared was murdered upon his throne, and Akish reigned in his stead.
- Alma: Therefore, whosoever suffered himself to be led away by the Lamanites was called under that head, and there was a mark set upon him.
- Helaman: And it came to pass that Helaman did send forth to take this band of robbers and secret murderers, that they might be executed according to the law.
- Jacob: And it came to pass that many means were devised to reclaim and restore the Lamanites to the knowledge of the truth; but it all was vain, for they delighted in wars and bloodshed, and they had an eternal hatred against us, their brethren.
The Book of Mormon, on the other hand, contains a mixture of sayings and warnings against violence with actual violence in the form of a story. This is very much in line with the style seen in the Jewish Scriptures and the Bible, which are presented as a mixture of narrative, dialogue, and proverb.
Going into this, I thought the Bible and Jewish Scriptures were going to have the most violence by a long shot. In some sense they did, but only because they’re the longest. When accounting for the relative lengths of the texts, the Dhammapada and the Book of Mormon came out way, way ahead of the others.
What really surprised me was the number of violent sentences in the Book of Mormon. Members of the LDS church have a well-deserved reputation for being nice. Actually, the kindest people I know belong to the LDS church.
In the end, I think this analysis told us exactly what we expected about violence and religion: pretty much nothing :). At least we got some morbid word clouds out of it.