Writing has changed with digital technology, but much is the same. Pirsig’s slip-based writing system was inspired by information technology.

Writing has changed with digital technology, but much is the same. The Lila writing technology builds on both the dynamic and static features.

Writers traditionally spend considerable time reading individual works closely and carefully. The emergence of big data and analytic technologies causes a shift toward distant reading, the ability to analyze a large volume of text in terms of statistical patterns. Lila uses these technologies to select relevant content for deeper reading.

Writing, as always, occurs in many locations, from a car seat to a coffee shop to a desk. Digital technology makes it easier to aggregate text from these different locations. Existing technologies like Evernote and Google Drive can gather these pieces for Lila to perform its cognitive functions.

Writing is performed on a variety of media. In the past it might have been napkins, stickies and binder sheets. Today it includes a greater variety, from cell phone notes to email and word processor documents. Lila can only analyze digital media. It is understood that there is still much text in the world that is not digital. Going forward, text will likely always be digital.

Writing tends to be more fragmented today, occurring in smaller units of text. Letter length is replaced with cell phone texts, tweets, and short emails. The phrase “too long; didn’t read” is used on the internet for overly long statements. Digital books are shorter than print books. Lila is expressly designed around a “slip” length unit of text, from at least a tweet length for a subject line, up to a few paragraphs. It would be okay to call a slip a note. Unlike tweets, there will be no hard limit on the number of characters.

A work is written by one or many authors. Print magazines and newspapers are compilations of multiple authors; so too are many websites. Books still tend to be written by a single author, but Lila’s function of compiling content into views will make it easier for authors to collaborate on a work with the complexity and coherence of a book.

In the past, the act of writing was more isolated. There was a clear separation between authors and readers. Today, writing is more social. Authors blog their way through books and get immediate feedback. Readers talk with authors during their readings. Fans publish their own spin on book endings. Lila extends reading and writing capabilities. I have considered additional capabilities with regard to publishing drafts to the web for feedback and iteration. A WordPress integration perhaps.

Pirsig’s book, Lila, was published in 1991, not long after the advent of the personal computer and just at the dawn of the web. His slip-based writing system used print index cards, but he deliberately chose that unit of text over pages because it allowed for “more random access.” He also categorized some slips as “program” cards, instructions for organizing other slips. As cards about cards, they were powerful, he said, in the way that John von Neumann explained the power of computers, “the program is data and can be treated like any other data.” Pirsig’s slip-based writing system was no doubt inspired by the developments in information technology.

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. Lila is for writing non-fiction; poetry, not so much.

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. And style. Think clearly and the rest comes easy. Lila is designed to extend human writing capabilities by performing cognitive work:

  1. The work of reading, especially during the early research phase. Writers can simply drop unread digital content onto disk, and Lila will convert it into manageable chunks: slips. These slips are shorter than the full-length originals, making them quicker to evaluate. More important, these slips are embedded in the context of relevant content written by the author; context is meaning, so unread content will be easier to evaluate.
  2. The work of analyzing content and sorting it into the best view, using visualization. As Pirsig said, “Instead of asking ‘Where does this metaphysics of the universe begin?’ – which was a virtually impossible question – all he had to do was just hold up two slips and ask, ‘Which comes first?’” This work builds a table of contents, a hierarchical view of the content. Lila will show multiple views so the author can choose the best one.
  3. The ability to uncover bias and ensure completeness of thought. Author bias may filter out content when reading, but Lila will compel a writer to notice relevant content.

Lila’s cognitive abilities depend on the author’s engagement in a writing project, generating content that guides the above work. Lila is designed expressly for the writing of non-fiction; poetry, not so much. The cognitive work is performed in most kinds of writing, and so Lila will also aid with fiction. Both fiction and creative non-fiction still require substantial stylistic work after Lila has done her part.

The author slip selects and boosts words for questioning unread content

Lila uses author slips to “question” a collection of unread articles and books, suggesting “answers” or responses that extend the author’s material. The term, question, is appropriate because Lila uses natural language processing to enhance search. The application of natural language is shown here in three ways.

1. Distill the focus of the author slip

Perhaps the most important step is to decide which words are the most meaningful for questioning unread content. The design of the slip provides the necessary structures for making this decision, as shown in the figure:

[Figure: template-slip-weights]

An algorithm could use these design features to group keywords and calculate relative weights for use in searching, as shown in the table:

| Figure # | Field | Calculation | Word (weight) |
|---|---|---|---|
| 4 | Content | Weight of 1 for each uncommon word. Increase by 1 for each occurrence, so static and dynamic add up to 3. | static (3), dynamic (3), quality (1), scientific (1), knowledge (1), cave (1), political (1), institutions (1), centuries (1), king (1), constitution (1), destroying (1), government (1) |
| 3 | Subject Line | Weight of 2, twice that of Content. Stop words removed. | pencil (2), mightier (2), pen (2) |
| 2 | Tags | Weight of 4, twice that of Subject Line. | staticVsDynamic (4) |
| 1 | Categories | Weight of 8, twice that of Tags. “Quality” appears in both Content and Categories; the weight for this word could be their sum, 9. | quality (8+1=9) |

The words can be used as keywords in a natural language query. The weights would be included as boost factors, ranking search results higher if they contain those words.
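A minimal sketch of how the weights in the table might be turned into a boosted query. It assumes the keywords for each field have already been selected, and it renders them in the `term^boost` form used by Lucene-style query parsers. The names `FIELD_WEIGHTS`, `keyword_weights` and `boosted_query` are illustrative only; they are not part of Lila.

```python
from collections import Counter

# Assumed multipliers, read off the table: each field doubles the one before it.
FIELD_WEIGHTS = {"content": 1, "subject": 2, "tags": 4, "categories": 8}


def keyword_weights(slip):
    """Add the field weight once for every keyword occurrence in the slip.

    `slip` maps a field name to the list of keywords already chosen for it.
    Words are lowercased so that "Quality" and "quality" are summed together.
    """
    weights = Counter()
    for field, words in slip.items():
        for word in words:
            weights[word.lower()] += FIELD_WEIGHTS[field]
    return weights


def boosted_query(weights):
    """Render keywords as a Lucene-style query string, heaviest terms first."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(f"{word}^{weight}" for word, weight in ranked)


if __name__ == "__main__":
    # A shortened version of the slip in the table (hypothetical input shape).
    slip = {
        "content": ["static", "dynamic", "static", "dynamic", "static", "dynamic", "quality"],
        "subject": ["pencil", "mightier", "pen"],
        "tags": ["staticVsDynamic"],
        "categories": ["Quality"],
    }
    print(boosted_query(keyword_weights(slip)))
```

With this input, "quality" collects 1 from Content and 8 from Categories, matching the sum of 9 suggested in the table.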

2. Apply other natural language analysis, such as word frequency

In the above table, not all words were selected. In the Subject Line, stop words (e.g., “the”) are removed. This is a common practice in query construction, since stop words are too common to add value. Similarly, in Content, only uncommon words are kept. In this case, word frequency could be calculated using an established measure of word frequency, and words that the measure rates as too common could be skipped. Word frequency and other linguistic features, such as repetition and word concreteness, will be discussed in detail later on. These steps utilize knowledge of language to improve search relevance.
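The two filters described here could look something like the following sketch. The stop-word list, the frequency table and the cutoff are all made-up placeholders; a real system would load them from a reference corpus. Treating unknown words as uncommon is also an assumption.

```python
import re

# Hypothetical stop-word list and relative word frequencies (placeholders).
STOP_WORDS = {"the", "is", "than", "a", "of", "and"}
WORD_FREQ = {"people": 0.0009, "time": 0.0012, "static": 0.00001, "quality": 0.00004}
MAX_FREQ = 0.0005  # assumed cutoff: words more frequent than this are skipped


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def subject_keywords(subject):
    """Subject Line rule: keep every word that is not a stop word."""
    return [word for word in tokenize(subject) if word not in STOP_WORDS]


def content_keywords(content):
    """Content rule: also drop words that are too common.

    Words missing from the frequency table are treated as uncommon and kept.
    """
    return [
        word
        for word in tokenize(content)
        if word not in STOP_WORDS and WORD_FREQ.get(word, 0.0) <= MAX_FREQ
    ]


if __name__ == "__main__":
    # The subject line is a guess at the one behind the table's keywords.
    print(subject_keywords("The pencil is mightier than the pen"))
    print(content_keywords("Static quality and people"))
```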

3. Take advantage of natural language index configurations

Unread content will be crawled and organized in a natural language index, such as Apache Solr’s Lucene index. An index of this sort can be configured to apply other natural language processing, e.g., synonym matching between queries and documents.
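To illustrate what synonym matching adds, here is a small sketch that expands query terms before comparing them with a document. It is plain Python, not Solr configuration; in a real index this would be handled by the analysis settings. The synonym table and function names are hypothetical.

```python
# Hypothetical synonym table; a real index would load this from its configuration.
SYNONYMS = {
    "king": {"monarch", "ruler"},
    "government": {"state"},
}


def expand(terms):
    """Return one sorted group per query term: the term plus its synonyms."""
    return [sorted({term} | SYNONYMS.get(term, set())) for term in terms]


def matches(document_tokens, terms):
    """True if the document contains every term or one of its synonyms."""
    tokens = set(document_tokens)
    return all(tokens & set(group) for group in expand(terms))


if __name__ == "__main__":
    print(expand(["king"]))
    print(matches(["the", "monarch", "ruled"], ["king"]))
```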

Lila Slip Factory II: Answer the questions and find “semantic corners” to build new slips.

Lila is a cognitive system that extends reading abilities by automatically dividing unread content into “slips”, relatively small units of text for later visualization and analysis. This automatic operation is called the Slip Factory. It involves two processes. The first process takes slips manually written by an author and converts them into a collection of “questions;” see the earlier post. The second process, represented here, involves iterating through the unread content, answering the questions and applying a “semantic corners” algorithm that splits and groups text into new slips.

[Figure: slipfactory2]

  1. The process iterates through the collection of unread content. The collection includes content curated by the author, and additional content collected by the system. The content is a large volume of unstructured content in digital format.
  2. The content is tokenized in readiness for search. Search requirements may include an index.
  3. The process iterates through each question generated by the first process.
  4. The question is a natural language query on each item in the unread content collection. The query identifies and ranks a set of search results, documents that are considered to “answer” the question.
  5. The most interesting point in this process is the “semantic corners” algorithm that will analyze the answer passages, splitting and grouping them into new slips. The semantic corners algorithm is the heart of the Lila concept. It will be detailed in future posts.
  6. New slips are added to the collection of slips for later visualization and analysis.
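The six steps above can be sketched as a loop. The semantic corners algorithm has not been described yet, so `split_into_slips` below is only a stand-in that splits on blank lines. Scoring by summed keyword boosts is likewise an assumption, standing in for a query against a real search index. All names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Slip:
    subject: str
    content: str


def tokenize(text):
    """Step 2: lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(question, text):
    """Step 4 (simplified): sum the boost of each question keyword found in the text."""
    tokens = set(tokenize(text))
    return sum(boost for word, boost in question.items() if word in tokens)


def split_into_slips(text):
    """Placeholder for "semantic corners": one slip per blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [Slip(subject=p.split(".")[0][:80], content=p) for p in paragraphs]


def slip_factory(unread, questions, min_score=1):
    """Run every question against every unread document and collect new slips."""
    new_slips = []
    for question in questions:                       # step 3
        for document in unread:                      # step 1
            if score(question, document) < min_score:
                continue                             # document does not "answer"
            for slip in split_into_slips(document):  # step 5 (stand-in)
                if score(question, slip.content) >= min_score:
                    new_slips.append(slip)           # step 6
    return new_slips


if __name__ == "__main__":
    unread = ["Static patterns hold.\n\nCats sleep a lot."]
    questions = [{"static": 3, "dynamic": 3}]  # keyword -> boost
    for slip in slip_factory(unread, questions):
        print(slip)
```

A passage that answers several questions is added once per question here; a fuller version would probably deduplicate.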

Eliza, Turing, and Whatson vs Lila. Enlist the cooperation of the human rather than design around a fight.

Remember Eliza, the psychotherapist program? Eliza is a computer program written by Joseph Weizenbaum in the mid-sixties and circulated widely in the early days of personal computing. Eliza is modeled on non-directive Rogerian therapy, programmed with a few prompts and a simulation of human understanding by feeding back content from the user. It is an early example of natural language processing. Eliza appears smart as long as the user plays along, but it is not hard to confuse the program. And it has bad grammar. People delight in teasing Eliza.

I am looking forward to seeing the new movie about Alan Turing, The Imitation Game, with Benedict Cumberbatch. Turing is regarded as the father of the computer, and he introduced the Turing Test, a natural language test of machine intelligence. In short, a human asks questions to determine if the hidden respondent is a machine or not. The human is trying to mess with the machine, focused on tripping it up.

‘Whatson’ was my first run at designing a cognitive system. It was designed to be a Question-Answer system for literature. I sensed that a big challenge would be the same one as Eliza or any program faced with the Turing Test. People like Question and Answer systems because they make life easy. Ask a question, get an answer. The system does the work for them. Do a little more than everyone expects and everyone expects a little more. The expectations increase. The questions get trickier. Even if the questioner was trying to help, the clues for finding the answer would often be missing. I would have to design a dialog mechanism for collecting more information. But often the questioner would be deliberately trying to test the intelligence and limits of the system. It’s what we humans do, push systems with the Turing Test. I needed a way to enlist the cooperation of the human user, so that I would not design around a fight.

In 2012 I patented a search technology, “Silent Tagging” (US 8,250,066). The technology solves a problem with social tagging. In the heyday of Web 2.0 people actively tagged content on the open web, as an aid to findability. It works on the open web, but in smaller closed contexts like company intranets, workers are much less likely to tag content. In a small population there are fewer adopters of emerging technology and workers are focused on immediate tasks. Was there a way to benefit from tagging without interrupting an employee’s workflow? I introduced the idea of Silent Tagging. The method associates two things in an employee’s normal workflow: keyword searches and clicks on search results. Keywords are like tags, intelligently selected by a searcher for findability. Clicks on search results follow a small cognitive act, deciding that one search result is better than another. The keyword-click association can be silently captured to adjust rankings of content and benefit other users. The key point here is that human cooperation can be implicitly enlisted in the design.
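As a rough illustration of the keyword-click association only, and not of the patented method's actual details, the sketch below logs which result a searcher clicks for each keyword and uses those counts to nudge later rankings. The class name, the additive scoring and the `click_weight` value are all assumptions.

```python
from collections import defaultdict


class ClickLog:
    """Silently records (keyword, clicked document) pairs from normal searching."""

    def __init__(self):
        self._clicks = defaultdict(int)  # (keyword, doc_id) -> click count

    def record(self, query, doc_id):
        """Call when a searcher clicks `doc_id` in the results for `query`."""
        for keyword in query.lower().split():
            self._clicks[(keyword, doc_id)] += 1

    def rerank(self, query, results, click_weight=0.1):
        """Reorder (doc_id, base_score) pairs, boosting previously clicked documents."""
        keywords = query.lower().split()

        def boosted(item):
            doc_id, base_score = item
            clicks = sum(self._clicks.get((k, doc_id), 0) for k in keywords)
            return base_score + click_weight * clicks

        ranked = sorted(results, key=boosted, reverse=True)
        return [doc_id for doc_id, _ in ranked]


if __name__ == "__main__":
    log = ClickLog()
    results = [("doc-a", 1.0), ("doc-b", 0.9)]
    print(log.rerank("vacation policy", results))  # no clicks recorded yet
    for _ in range(3):
        log.record("vacation policy", "doc-b")     # three searchers prefer doc-b
    print(log.rerank("vacation policy", results))
```

No one is asked to tag anything; the ranking changes as a side effect of ordinary searching.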

In January of this year I switched gears from Whatson to Lila. Lila is also a cognitive system but its design implicitly enlists human cooperation in natural language processing tasks. Lila is a cognitive writing system, designed to extend human reading, thinking and writing capabilities. The human user is involved in a writing project. In Lila, the author creates content in short units of text called slips. As I have described lately, in Lila, author slips are questions asked of unread content, just like questions in a Question-Answer system. The difference in Lila is that the author’s intent and work are implicitly intelligent, generating slips or questions with high signal and little noise. In one way, Lila is like Eliza, in that both depend on the intelligence of the user. The difference is that in Lila the purpose of the user and the system are implicitly (silently) aligned. No design work is required to convince or negotiate with a user.

Lila Slip Factory I: “Question” rather than “query” for natural language processing

The Lila cognitive writing system extends your reading abilities by converting unread content into slips, units of text for later visualization and analysis. How does Lila convert content into slips? The Lila Slip Factory has two processes. The first process, represented here, involves converting slips written manually by the author into questions to be asked of the unread content. I use the word “question” rather than “query” because I am using natural language processing in addition to more structured query methods. I want to create the association between author slips and natural language questions, such as one might find in a Question-Answer system.
[Figure: slipfactory1]

  1. The first process begins with a stack of slips generated manually by the author. Each slip is processed.
  2. Natural language processing is applied to convert the slip into tokens and analyze parts of speech.
  3. The keyword analysis is an algorithm that converts the outputs of step two into keywords. The selection of keywords will depend on their placement in the author slip, i.e., in the subject line, content, categories and tags. It will depend on other text analytics such as word frequency and word concreteness. Weighting factors may be applied to the keywords. This algorithm will be explained more later.
  4. Once the keywords have been selected, a structured question can be formed.
  5. Each question is added to a collection that will be used in the second process, to be represented in a following post.

Slips written by the author are questions asked of the unread content. Just like the ‘Whatson’ Question-Answer system.

Question. What is a slip?
The Lila cognitive system converts all text into units called slips. A slip is a short length of text. It ranges in length from a single sentence to a few paragraphs. Minimally it has a subject line. Optionally it has suggested categories and tags. This is a natural way for authors to collect their thoughts using existing writing software. Lila also automatically converts unread content into slips. The manual and automatic processes are called the slipstream.

Question. Why does Lila use slips?
A slip is text in the length range of a sentence to a few paragraphs, long enough to extract meaning for natural language processing. One or a few words is not enough text to extract meaning. At the same time, a slip is shorter than a page and a document. A slip is small enough to treat as an atom or unit of meaning. Slips can be clustered and reconfigured in many different views, to simplify complex analysis, to allow serendipity, to discover the needles in the haystack, to find missing angles and answers to questions.

Update. Another reason to use slips is that they are easy to evaluate visually. One can look at a slip and decide where it belongs in a sequence, more easily than a whole page of text. This feature will come in handy when I get to the visualization and views component of Lila.

Question. How will Lila automatically convert unread content into slips and make them useful?
The stack of unread content is very large in size. It includes not only articles and books that the author has curated, but also undiscovered content on the web. It is a big data problem and Lila will use cognitive technology. Lila is a cognitive system, working on the same Question and Answer principles I previously discussed about ‘Whatson’. In a cognitive system, the question itself provides the frame of reference for processing big data and responding with an answer. In the Lila system, the slips written by the author are like questions asked of the unread content. The slips are a manageable size for natural language processing, as described in the previous question. Lila will use the author slips to filter signal from noise and machine-generate new slips. The association between the author slips and the unread content slips can be used to generate visualizations in an analytical tool.

What happened to ‘Whatson’?

Question. What happened to Whatson/Wilson/Whatever you called it?

Answer. ‘Whatson’ was my first go at building a cognitive system in my basement, a Q&A system for literature, using open source code, open access articles, open web knowledge, and public domain content. It went through two iterations, sketching out architecture in broad strokes, and documenting deep dives into key code pieces. My third ‘Wilson’ iteration was already in progress when I took a strategic turn this January. You see, the main point of Whatson was personal research into the depths of cognitive computing. Having covered some of those bases, my next iteration became clear. I had to pick a smaller project with a sharper focus. Lila is that project. Lila is a cognitive computing project, designed to extend reading, thinking and writing capabilities. I will be proceeding with Lila much like I did with Whatson, only the focus will be sharper, and, I believe, more interesting. Lots of cognitive stuff coming down the pipe. I hope you join me.

I have received inquiries about the disappearance of Whatson. If you would like a copy of that material, here is a PDF of my Physika blog; the Whatson posts are in the 2014 entries toward the end. The code samples are also still online at GitHub Gist. I’m happy to answer any questions you have about Whatson or Lila.

Best to you.

Lila Slipstream II: Extend reading capabilities by processing content into slips

Lila is a cognitive computing system that extends writing capabilities. It also extends reading capabilities.

[Figure: slipstream2]

  1. In a previous post I outlined how an author uses existing writing software to generate “slips” of content. A slip is the unit of text for the Lila cognitive system. The slip has just a few required properties: a subject line, a bit of content, and suggestions for tags and categories. The author generates many slips, hence a “slipstream.” In this post, I show part two of the slipstream for other kinds of content.
  2. In the writing process, an author collects and curates related content generated by dialog with other people, e.g., email and blog comments, or written by other people, e.g., articles and books. This content is usually filtered and managed by the author, but the volume piles up well beyond the author’s ability to read. (Notice the icon in the lower right of item two looks like both a book and a scanner. It is assumed that all content will be digital text.)
  3. Existing technologies such as Google Alerts allow authors to monitor the web for undiscovered but related content generated by anyone in the world. This content abides on the open web, growing daily. The volume easily exceeds an author’s ability to curate let alone read. A Lila curation process will be described later.
  4. The second part of the Lila slipstream is a process that will automatically convert the curated and undiscovered content into slips. The common slip unit format will enable Lila to generate visualizations of the content, enabling the author to read and analyze a high volume of content. The visualization tool will be described later.

Lila Slipstream. Content is written naturally using existing digital writing software.

The Lila cognitive writing process begins with four steps:

[Figure: slipstream]

  1. Content is written naturally using existing digital writing software, on a mobile app, a laptop, or other device. This is a good time to emphasize that Lila is not just another writing studio, e.g., MS Word, Scrivener. It is a cognitive solution to extend reading, thinking and writing capabilities. There is much to follow.
  2. Content needs to be converted to a standard “slip” format. The slip format requirements are minimal: a subject line, content, and markup for suggested categories and tags. Various writing tools could easily export to this format.
  3. A “stream” of slips will be generated over time, hence the term, “slipstream.” This slipstream involves creating and collecting slips into a repository. Existing technologies like Evernote and Google Drive can be used to collect content into a repository. Lila adds to that. Natural language processing will be performed upon the content in the repository; this processing will be described later.
  4. A graphical user interface is used to visualize the content in an organized way. A default view can be generated at an early stage using the categories suggested by the writer. Analytics will be performed to generate other views; this processing will be described later.
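Steps 2 and 4 can be made concrete with a small sketch. The slip markup has not been specified, so the format below is invented for illustration: the first line is the subject, and optional lines beginning with "Categories:" or "Tags:" carry comma-separated suggestions. The default view simply groups subject lines by suggested category.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Slip:
    subject: str
    content: str = ""
    categories: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def _items(line):
    """Split the comma-separated values that follow the first colon."""
    return [item.strip() for item in line.split(":", 1)[1].split(",") if item.strip()]


def parse_slip(text):
    """Convert exported text (hypothetical markup) into a Slip."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0]:
        raise ValueError("a slip needs at least a subject line")
    slip = Slip(subject=lines[0])
    body = []
    for line in lines[1:]:
        if line.lower().startswith("categories:"):
            slip.categories = _items(line)
        elif line.lower().startswith("tags:"):
            slip.tags = _items(line)
        else:
            body.append(line)
    slip.content = "\n".join(body).strip()
    return slip


def default_view(slips):
    """Group slip subjects under the categories suggested by the writer."""
    view = defaultdict(list)
    for slip in slips:
        for category in slip.categories or ["Uncategorized"]:
            view[category].append(slip.subject)
    return dict(view)


if __name__ == "__main__":
    raw = (
        "The pencil is mightier than the pen\n"
        "Static and dynamic quality.\n"
        "Categories: Quality\n"
        "Tags: staticVsDynamic\n"
    )
    print(default_view([parse_slip(raw)]))
```

Any writing tool that can export plain text could produce input like this, which is the point of keeping the format minimal.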