Literary Analysis

The characters in the Persistence of Vision series invent a Hilbert space of literature characteristics, so it seemed appropriate to use that concept on my novels (and, not shown here, my scientific books).  Each dimension in this literary Hilbert space spans the values of a particular descriptive variable, such as Flesch-Kincaid Grade Level or action scenes per 5000 words of text.  The concept is that similar works should have similar values along each dimension, or directly explainable differences, such as the number of words in a short story versus the number in a novel.
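As a rough illustration of "similar works should have similar values," each work can be treated as a vector and compared by distance.  The sketch below is only one way to do that, not the method used in JMP: it standardizes each dimension (so that grade level and scene counts are on a comparable scale) and then takes the Euclidean distance.  The work names and numbers are invented for the example, and the function names are hypothetical.

```python
import math

def standardize(works):
    """Z-score each dimension across all works so no single scale dominates."""
    names = list(works)
    dims = len(works[names[0]])
    cols = [[works[n][d] for n in names] for d in range(dims)]
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 if a dimension is constant.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return {
        n: [(works[n][d] - means[d]) / stds[d] for d in range(dims)]
        for n in names
    }

def distance(a, b):
    """Euclidean distance between two works in the standardized space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented values: [reading grade level, action scenes per 5000 words]
works = {
    "Novel A": [6.3, 2.0],
    "Novel B": [6.4, 2.2],
    "Science book": [14.0, 0.0],
}
z = standardize(works)
novel_to_novel = distance(z["Novel A"], z["Novel B"])
novel_to_science = distance(z["Novel A"], z["Science book"])
```

With these made-up numbers, the two novels land much closer to each other than either does to the science book, which is the kind of clumping the concept is meant to reveal.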

The entire Hilbert space can be contained in a SAS JMP file, but visualizing dozens of dimensions is difficult.  Fortunately, JMP provides a 3-D tool that allows for rotation of the visualization.  I've provided three salient views that I rotated so that their 2-D renderings provide some information.


The size metrics include page count, word count, number of chapters, and number of figures.

Overall, the novels average 350 pages, with the Dark Energy series being slightly longer, averaging 360 pages, and the Sense of Gravity series being shorter at 340 pages.  The word count average is 113,500 and is fairly consistent across the series, with a standard deviation of 13,000 words.  The number of chapters ranges from 18 to 26, with an average of 23.  The number of figures has the largest variation, ranging from 1 to 20 per novel; however, most have about 7 figures.


The content variables are science, religion, sexy scenes, action, travel, business, and figures.  The occurrences and intensity of each of these are divided by the word count for each novel and converted to per-5000-word metrics.
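The per-5000-word conversion described above is simple arithmetic.  Here is a minimal sketch, assuming a raw count (or intensity-weighted score) and a total word count for one novel; the function name and the sample count of 45 are hypothetical, and only the 113,500-word figure is taken from the averages above.

```python
def per_5000_words(raw_value, word_count):
    """Scale a raw count or score to a rate per 5000 words of text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return raw_value * 5000 / word_count

# Hypothetical example: 45 action scenes in a novel of average length.
action_rate = per_5000_words(45, 113_500)
```

Normalizing this way lets a short story and a long novel be compared on the same content dimension without the longer work dominating simply because it has more words.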

The first scatterplot shows the values of science, religion, and sexy scenes per 5000 words for each of the novels.  The data are color-coded by series.  The Dark Energy series shows more spread in both the sexy scenes and the religion dimensions; however, the average values for each series are close in all three dimensions.

Content 1

The second scatterplot compares the values in the action, travel, and business dimensions.  There are major differences in all three dimensions; however, again, the averages for each series are close together.

Content 2

The final content dimension, figures per 5000 words, is not displayed.  The Dark Energy and Sense of Gravity series contain books with significantly more figures than their series averages and more than in the Persistence of Vision series.


The style variables are dialog per 5000 words, reading grade level, and passive sentence fraction.  As above, the data are coded by series in the scatterplot.  The Dark Energy novels are written at a slightly higher grade level, 7.5, than the other two series, 6.3 and 6.4.  But generally speaking, the dialog and passive sentence values are comparable among the series.



In some dimensions, there are differences among the series; however, they are generally similar.  The 3-D scatterplots have a reference in the legend to the Springer-published scientific books; however, those books aren't shown in the views, as they are widely separated from the science fiction data points.  Thus, the Hilbert space concept can produce the desired differentiation among literary works, clumping scientific works in one volume and science fiction in another.  Further differentiation within these novels would be possible.  For example, adding dimensions for the different sciences that are used would separate the series and provide some intra-series separation.  (Anthropology is strongly used in the last three novels of the Dark Energy series, but not in the first three, or in the other two series, for that matter.)  Similarly, identifying the science-fictional theme would allow differentiation and, if combined with analyses of other SF works, would allow for clumping along that set of dimensions.

This analysis serves no purpose other than being fun.  However, there are people who do literary scholarship in SF and they might find the technique to be useful.


