Lila’s four cognitive extensions to the writing process

Lila is cognitive writing technology. It uses natural language processing to extend the cognitive abilities of a writer engaged in a project. In the previous post I described the seven root categories used to organize a non-fiction writing project and to optimize Lila’s analytics. These categories are considered a natural fit with the writing process and can be visualized as folders that contain notes. In this post I present a diagram that maps Lila’s four cognitive extensions through the folders to the writing process.

[Figure: Lila’s four extensions to writing]

A “slip” is the unit of text in Lila. A slip is equivalent to a “note”: usually one or a few sentences, though there is no hard limit on length.

  1. The early stages of the writing process focus on thinking and research. An author sends slips to an Inbox and begins filing them in a Work folder. Documents and books that have not been read are sent to the TLDR folder. Lila processes the unread content, generating slips that are also filed in the Work folder.
  2. As the slips build up, the author analyzes them. Using Lila, an author can visualize the connections between slips. The author can “pin” interesting connections and discard others.
  3. Connections are made between author slips, and from author slips to unread content slips. Where a connection is made to unread content, a link is provided to the original document or book. Authors can read both slips and original material in the context of their own content. This is called “embedded reading,” and it allows for swifter analysis of new material.
  4. Analysis leads to organizing and writing drafts. An author will organize content in a particular hierarchical view, a table of contents. The author can get new insight by viewing the content in alternate hierarchical views generated by Lila.

In practice, these steps are not strictly sequential: thinking, research, analysis, and drafting recur throughout the writing process. Lila can perform its cognitive extensions at any step, e.g., integrating a new unread document late in the process. As the writing process continues, slips will be edited and integrated into a longer work for publication. Lila maintains a sense of “slips” in the background even when the author is working on a long integrated unit of text.

Seven root categories for organizing non-fiction writing and optimizing Lila’s analytics

Lila technology collaborates with an author engaged in a writing project. A model of the writing process is assumed, one that is considered natural for writing non-fiction, at least, and compliant with existing writing software. In this model, an author writes notes and organizes them into categories. Seven root categories are assumed to be fundamental to a writing project: folders that contain the written material. The categories are presented here not so much as Lila system requirements but as a best practice, structures that optimize the writing process and Lila’s analytics. If you do not use these categories to organize your non-fiction writing project, you might consider doing so, whether or not you intend to use Lila.

| Step in the Writing Process | Structural Category/Folder | Category/Folder Description | Comparison with Pirsig’s categories |
|---|---|---|---|
| 1. The author begins a project. A root Project folder is created, a repository for everything else. | Project | A single root folder. Contains all other folders and slips. The root folder may contain high-level instructions regarding project plans, to-do lists, etc., but these are not content for Lila’s analysis. | Like Pirsig’s PROGRAM slips, the Project folder may contain “instructions for what to do with the rest of the slips,” but this information will not operate as a “program.” All programming functions will be handled by Lila code. |
| 2. The author takes notes on ideas using various software programs on different devices. Many notes will require further thought before filing into the project. These notes get sent to an inbox, a temporary queue, a point for later conscious attention and classification. | Project > Inbox | The Inbox may be an email inbox or an Evernote notebook dedicated to an inbox function. There can be multiple inboxes. Notes in the inbox may be tentatively assigned categories and/or tags, but these will be reviewed. | Inbox corresponds to Pirsig’s UNASSIMILATED category, “new ideas that interrupted what he was doing.” |
| 3. Notes are filed into a main folder, a workspace for all the active content. | Project > Work | Notes in the Work folder are organized by categories and classified by subject using tags. These notes are the target of Lila’s analytics. See the upcoming post on subject classification for more information. | The Work folder contains all the topic categories Pirsig developed as he was working. |
| 4. Some ideas are considered worth noting, but are either not sufficiently relevant or too disruptive to file into the main work. These notes should not be trashed, but parked for later evaluation. | Project > Park | Parked notes are excluded from Lila’s analytics, but can be brought back into play later. | Park corresponds to Pirsig’s CRIT and TOUGH categories. I see these two categories as the positive and negative versions of the same thing, i.e., disruptive ideas. Don’t let them take over, but don’t ignore them either. Let them hang out in the Park for a while. |
| 5. A primary function of Lila is to assist with the large volume of content that an author does not have time to read. On the web, the acronym TLDR is used: “Too Long; Didn’t Read.” | Project > TLDR | TLDR is not a flippant term. Content management systems typically have special handling for large files. Lila will generate notes (slips) from this unread content and present them in context for embedded reading. | Pirsig provided no special classification for unread content. Likely it just went in a pile, perhaps left unread. |
| 6. Some notes, and chains of notes, seem important at one time but are later considered irrelevant or out of scope. This typically happens as the project matures and editing is undertaken. These notes are not trashed but archived for possible reuse later. | Project > Archive | Archived notes are excluded from Lila’s analytics. Perhaps a switch will allow them to be included. The archive could tie into version control for successive drafts. | Pirsig filed these notes in JUNK, “slips that seemed of high value when he wrote them down but which now seemed awful.” |
| 7. Other notes are just plain trash: duplicates, dead lines of thought. To avoid noise in the archive, it’s best to trash them. | Project > Trash | Trashed notes are excluded from Lila’s analytics. These notes may be purged on occasion. | Pirsig filed these notes in JUNK, but maintained them indefinitely. |
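To make the table concrete, here is a minimal Python sketch of the folder model and of which slips would reach Lila’s analytics. The `Folder`, `Slip`, and `analytics_targets` names are hypothetical, not Lila code. Park, Archive, and Trash are excluded per the table. Treating Inbox notes and raw TLDR files as excluded until they are filed (or turned into slips) in Work is an assumption of this sketch. The `include_archive` flag stands in for the possible archive “switch.”

```python
from enum import Enum
from typing import NamedTuple


class Folder(Enum):
    """The seven root categories, named by their paths in the table."""
    PROJECT = "Project"
    INBOX = "Project > Inbox"
    WORK = "Project > Work"
    PARK = "Project > Park"
    TLDR = "Project > TLDR"
    ARCHIVE = "Project > Archive"
    TRASH = "Project > Trash"


class Slip(NamedTuple):
    folder: Folder
    text: str


# Assumption: only filed Work slips are analyzed. Inbox notes are not yet
# reviewed, and raw TLDR files are first turned into Work slips by Lila.
ANALYZED_FOLDERS = frozenset({Folder.WORK})


def analytics_targets(slips, include_archive=False):
    """Return the slips Lila would analyze; include_archive models the 'switch'."""
    folders = ANALYZED_FOLDERS | (frozenset({Folder.ARCHIVE}) if include_archive else frozenset())
    return [s for s in slips if s.folder in folders]


slips = [
    Slip(Folder.INBOX, "Idea jotted on the phone"),
    Slip(Folder.WORK, "Static and dynamic quality"),
    Slip(Folder.PARK, "A disruptive tangent"),
    Slip(Folder.ARCHIVE, "An abandoned chapter outline"),
    Slip(Folder.TRASH, "Duplicate note"),
]
print([s.text for s in analytics_targets(slips)])
print([s.text for s in analytics_targets(slips, include_archive=True)])
```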

 

“Actually, these last two piles, JUNK and TOUGH, were the piles that gave him the most concern.”

Phaedrus is the philosopher-protagonist in the well-known book Zen and the Art of Motorcycle Maintenance by Robert Pirsig. Phaedrus is Robert Pirsig, the author, and his books represent a serious metaphysical inquiry. Lila is the lesser-known sequel, in which Phaedrus refines and organizes his thought. It is the organizational elements that inspired my current software project. In the following quote, Phaedrus describes the information architecture of his project. It is elegant and complete, the kind of structure found in well-organized folder systems, and it reflects the natural development of thought.

In addition to the topic categories, five other categories had emerged. Phaedrus felt these were of great importance:

The first was UNASSIMILATED. This contained new ideas that interrupted what he was doing. They came in on the spur of the moment while he was organizing the other slips or sailing or working on the boat or doing something else that didn’t want to be disturbed. Normally your mind says to these ideas, ‘Go away, I’m busy,’ but that attitude is deadly to Quality. The UNASSIMILATED pile helped solve the problem. He just stuck the slips there on hold until he had the time and desire to get to them.

The next non-topical category was called PROGRAM. PROGRAM slips were instructions for what to do with the rest of the slips. They kept track of the forest while he was busy thinking about individual trees. With more than ten-thousand trees that kept wanting to expand to one-hundred thousand, the PROGRAM slips were absolutely necessary to keep from getting lost.

What made them so powerful was that they too were on slips, one slip for each instruction. This meant the PROGRAM slips were random access too and could be changed and resequenced as the need arose without any difficulty. He remembered reading that John Von Neumann, an inventor of the computer, had said the single thing that makes a computer so powerful is that the program is data and can be treated like any other data. That seemed a little obscure when Phaedrus had read it but now it was making sense.

The next slips were the CRIT slips. These were for days when he woke up in a foul mood and could find nothing but fault everywhere. He knew from experience that if he threw stuff away on these days he would regret it later, so instead he satisfied his anger by just describing all the stuff he wanted to destroy and the reasons for destroying it. The CRIT slips would then wait for days or sometimes months for a calmer period when he could make a more dispassionate judgment.

The next to the last group was the TOUGH category. This contained slips that seemed to say something of importance but didn’t fit into any topic he could think of. It prevented getting stuck on some slip whose place might become obvious later on.

The final category was JUNK. These were slips that seemed of high value when he wrote them down but which now seemed awful. Sometimes it included duplicates of slips he had forgotten he’d written. These duplicates were thrown away but nothing else was discarded. He’d found over and over again that the junk pile is a working category. Most slips died there but some reincarnated, and some of these reincarnated slips were the most important ones he had.

Actually, these last two piles, JUNK and TOUGH, were the piles that gave him the most concern. The whole thrust of the organizing effort was to have as few of these as possible. When they appeared he had to fight the tendency to slight them, shove them under the carpet, throw them out the window, belittle them, and forget them. These were the underdogs, the outsiders, the pariahs, the sinners of his system. But the reason he was so concerned about them was that he felt the quality and strength of his entire system of organization depended on how he treated them. If he treated the pariahs well he would have a good system. If he treated them badly he would have a weak one. They could not be allowed to destroy all efforts at organization but he couldn’t allow himself to forget them either. They just stood there, accusing, and he had to listen.

Pirsig, Robert M. (1991). Lila: An Inquiry into Morals. Pg. 25-26.

What is the difference between a Question and an Answer? Focus and Context Slips in Lila.

What is the difference between a Question and an Answer? Both are bunches of text. One could say that the Question is missing something that the Answer provides, but Questions and Answers are not often shaped as neatly as puzzles, with the missing part plugging easily into the whole. In natural language processing (NLP), the distinction between Question and Answer is understood in the cognitive terms of Focus and Context. Focus refers to attention, the text that is currently being analyzed. In Lila, each slip written by the author is analyzed as a Focus point, asking a Question of the large corpus of unread content. The Focus slip provides the particulars of the Question, but the point is always to find relevant Context that will shed new light on the Focus or expand it with new information. The Question queries the corpus and Lila responds with zero, one or many Context slips. Focus is joined to Context through the Question, as shown in this first figure.

[Figure: Focus and Context slips]

The Question is implemented as an NLP query. Search results are ranked by relevance. The rankings can be expressed as a correlation between Focus and Context slips. At this point the system has completed a portion of the cognitive work that used to be manual, i.e., reading and filtering a large volume of material. That work is now completed automatically, and the author can instead focus on higher-order cognitive work: thinking about the Context and integrating it into the Focus.
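As a rough illustration of a Question whose results are ranked by relevance, the sketch below scores candidate Context slips against a Focus slip using TF-IDF weighting and cosine similarity. This is a standard technique substituted here for whatever Lila’s NLP query actually does. All names are hypothetical. The code uses only the Python standard library and can return zero, one, or many Context slips.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF: rarer terms weigh more, and no term gets a zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_context(focus, candidates, min_score=0.0):
    """Return (score, slip) pairs above min_score, best match first."""
    vectors = tfidf_vectors([focus] + candidates)
    focus_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(cosine(focus_vec, v), c) for v, c in zip(cand_vecs, candidates)]
    return sorted((s for s in scored if s[0] > min_score), key=lambda s: s[0], reverse=True)


focus = "static and dynamic quality in government"
candidates = [
    "Dynamic quality drives change.",
    "The king wrote a constitution for the government.",
    "Recipes for bread.",
]
for score, slip in rank_context(focus, candidates):
    print(f"{score:.3f}  {slip}")
```

The bread slip shares no words with the Focus slip and is dropped. A production system would more likely delegate this ranking to a search index than recompute vectors for every query.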

An author might choose to modify a Focus slip, or “pin” a Context slip to it, creating an association for later work. In a traditional programmed system, these associations would be maintained with unique identifiers, shown in the second figure as “slip1,” “slip42,” and so on.

[Figure: Pin association]

In cognitive systems, there is a shift away from using unique identifiers. It requires a change in thinking about how information is organized. In a traditional programmed system, data is stored in structured tables and related through unique identifiers. This kind of system is normally planned carefully in advance because changes in the information architecture require costly database work. Cognitive systems like Lila are being designed to be more fluid, allowing for quickly shifting views and analysis, essentially changing the information architecture on the fly. How is this possible? Consider that it is really the Question and its query that make the association. The query is the embodied link between the full text of a Focus slip and the full text of the Context slip. One can imagine a powerful cognitive system in which an author edits a Focus slip and the system responds dynamically with new queries and Context slips.

Consider that idea again. It is really the Question and its query that make the association. The query is the embodied link between the full text of a Focus slip and the full text of the Context slip. We think of metadata as being “shorter” than content, e.g., “slip1” is an identifier for a content record, a short item that stands in place of the longer content. This “shortness” is traditionally what makes metadata useful in organizing and finding content. Things change with cognitive systems. Unlike a unique identifier, a query is built from the full text of a Focus slip and maintains its meaning. The difference between metadata and content breaks down. (When surveillance agencies tell you they are only looking at metadata, this means they are also looking at content. Think about it.)
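Here is a minimal sketch of the idea that the query itself is the link. A crude word-overlap score stands in for a real NLP query. `FocusSlip`, `terms`, and the tiny stop list are hypothetical. No association is stored. Editing the Focus slip’s text changes what it retrieves, whereas an identifier-based pin such as `pins` stays fixed regardless of edits.

```python
import re
from dataclasses import dataclass

STOP_WORDS = {"the", "a", "an", "in", "of", "and", "who", "is"}


def terms(text):
    """Lowercased content words of a text, minus a tiny illustrative stop list."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


@dataclass
class FocusSlip:
    text: str

    def context(self, corpus, min_overlap=1):
        """Re-run the 'question' built from the slip's full text.

        The association is whatever the current text retrieves, so an edit
        to the slip immediately changes its Context slips.
        """
        query = terms(self.text)
        scored = [(len(query & terms(doc)), doc) for doc in corpus]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored if score >= min_overlap]


corpus = ["Dynamic quality drives change.", "The king wrote a constitution."]

# Identifier-based pin: a fixed pointer that ignores later edits to slip1.
pins = {"slip1": "slip42"}

focus = FocusSlip("Dynamic quality in writing")
print(focus.context(corpus))  # ['Dynamic quality drives change.']
focus.text = "Who wrote the constitution?"
print(focus.context(corpus))  # ['The king wrote a constitution.']
```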

In Lila it will be practical to use unique identifiers to store temporary pinned associations between Focus and Context slips. As the work progresses, the pins and their associations will disappear, because the author will rework the Focus slips into a longer integrated stream of text for publication.

Lila is cognitive writing technology built on top of software like Evernote. Key differences.

Writers everywhere benefit from content management software like Evernote. Evernote can collect data from multiple devices and locations and organize it into a single writing repository. Evernote is beautiful software. For the last few years, I have been using Google Drive to collect notes. Recently I tried Evernote again, and I am impressed enough to switch. Notebooks, tags, collaboration, web clipping, related searches. All very nice.

Lila is cognitive writing technology built on top of software like Evernote. Here are some key differences between the products:

1. Evernote users read long-form content manually, decide if it is relevant, and then write notes to integrate it into their project. Lila will pre-read content for users and embed relevant notes (slips) in the context of the user’s writing. This will save the writer lots of reading and evaluation time.

2. Evernote users get “related searches” from a very limited number of web sources. Lila will perform open web searches for related content.

3. Evernote users can visualize a limited number of connections between notes. I have yet to get any utility out of this. Lila will use natural language search to generate a vast number of connections between notes, allowing a user to quickly understand complex relationships between notes.

4. Evernote users can use tags to construct a hierarchical organization of content. Notebooks can only have one sub-level of categorization, essentially chapters, but many writers need additional levels of classification. Tags can be ordered hierarchically and if you prefix them with a number they will sort in a linear order. You can use tags for hierarchical classification but it creates problems.

  • If you want both categories and tags, you will have to use a naming convention to split tags into two types.
  • Numbering tags causes them to lose type-ahead look-up functionality, i.e., you have to start by typing the number. This is a problem because the numbers can be expected to change often.
  • If you decide to insert a category between two tags, you have to manually re-number all the tags below.
  • Tags are shared between Notebooks. Maybe that works for tags? Not for hierarchical sectioning of a single work.
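One way to avoid manual renumbering is to store only the order of sections and derive the numbers when the outline is displayed. This is sketched here as a hypothetical data structure, not as something Evernote or Lila provides. Inserting a category then renumbers everything below it automatically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    children: list[Section] = field(default_factory=list)


def numbered_outline(sections, prefix=""):
    """Derive section numbers from position, so nothing is renumbered by hand."""
    lines = []
    for i, section in enumerate(sections, start=1):
        number = f"{prefix}{i}"
        lines.append(f"{number} {section.title}")
        lines.extend(numbered_outline(section.children, prefix=number + "."))
    return lines


book = [
    Section("Introduction"),
    Section("Static quality", [Section("Laws"), Section("Customs")]),
    Section("Dynamic quality"),
]
print(numbered_outline(book))
book.insert(1, Section("Metaphysics"))  # insert a category between two others
print(numbered_outline(book))           # later sections shift to 3, 3.1, 3.2, 4
```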

None of these problems are technically insurmountable. I hope Evernote comes out with enhancements soon. I would like to build Lila on top of Evernote. Lila has something to add. To be cognitive means an inherent ability to automate hierarchical classification. Lila will be able to suggest hierarchical views, different ways of understanding the data, different choices for what could be a table of contents.

Embedded Reading in Lila Cognitive Writing Technology [Video]

Lila is a cognitive technology that extends reading and analysis capabilities for a writing project. Author content is used to generate “slips,” short units of text drawn from unread content. Slips are visualized to allow embedded reading. Embedded means “to fix firmly and deeply in a surrounding mass.” Embedded reading is reading content in the context of other closely related content. Context is meaning. Embedded reading gives new insight and ensures completeness. It is visualized as a web of associated, clickable slips in Lila. View the video.

Writing has changed with digital technology, but much is the same. Pirsig’s slip-based writing system was inspired by information technology.

Writing has changed with digital technology, but much is the same. The Lila writing technology builds on both the dynamic and the static features of writing.

Writers traditionally spend considerable time reading individual works closely and carefully. The emergence of big data and analytic technologies causes a shift toward distant reading, the ability to analyze a large volume of text in terms of statistical patterns. Lila uses these technologies to select relevant content for deeper reading.

Writing, as always, occurs in many locations, from a car seat to a coffee shop to a desk. Digital technology makes it easier to aggregate text from these different locations. Existing technologies like Evernote and Google Drive can gather these pieces for Lila to perform its cognitive functions.

Writing is performed on a variety of media. In the past it might have been napkins, stickies, and binder sheets. Today it includes a greater variety, from cell phone notes to email and word processor documents. Lila can only analyze digital media. It is understood that much of the world’s text is still not digital. Going forward, nearly all new text will likely be digital.

Writing tends to be more fragmented today, occurring in smaller units of text. Letter-length correspondence has been replaced by cell phone texts, tweets, and short emails. The phrase “too long; didn’t read” is used on the internet for overly long statements. Digital books tend to be shorter than print books. Lila is expressly designed around a “slip”-length unit of text, ranging from a tweet-length subject line up to a few paragraphs. It would be okay to call a slip a note. Unlike tweets, slips will have no hard limit on the number of characters.

A work is written by one or many authors. Print magazines and newspapers are compilations of work by multiple authors, and so are many websites. Books still tend to be written by a single author, but Lila’s function of compiling content into views will make it easier for authors to collaborate on a work with the complexity and coherence of a book.

In the past, the act of writing was more isolated. There was a clear separation between authors and readers. Today, writing is more social. Authors blog their way through books and get immediate feedback. Readers talk with authors during their readings. Fans publish their own spin on book endings. Lila extends reading and writing capabilities. I have considered additional capabilities with regard to publishing drafts to the web for feedback and iteration. A WordPress integration perhaps.

Pirsig’s book, Lila, was published in 1991, not long after the advent of the personal computer and just at the dawn of the web. His slip-based writing system used print index cards, but he deliberately chose that unit of text over pages because it allowed for “more random access.” He also categorized some slips as “program” cards, instructions for organizing other slips. As cards about cards, they were powerful, he said, in the way that John Von Neumann explained the power of computers: “the program is data and can be treated like any other data.” Pirsig’s slip-based writing system was no doubt inspired by the developments in information technology.

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. Lila is for writing non-fiction; poetry, not so much.

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. And style. Think clearly and the rest comes easy. Lila is designed to extend human writing capabilities by performing cognitive work:

  1. The work of reading, especially during the early research phase. Writers can simply drop unread digital content onto disk, and Lila will convert it into manageable chunks, called slips. These slips are shorter than the full-length originals, making them quicker to evaluate. More important, these slips are embedded in the context of relevant content written by the author; context is meaning, so unread content will be easier to evaluate.
  2. The work of analyzing content and sorting it into the best view, using visualization. As Pirsig said, “Instead of asking ‘Where does this metaphysics of the universe begin?’ – which was a virtually impossible question – all he had to do was just hold up two slips and ask, ‘Which comes first?’” This work builds a table of contents, a hierarchical view of the content. Lila will show multiple views so the author can choose the best one.
  3. The ability to uncover bias and ensure completeness of thought. Author bias may filter out content when reading, but Lila will compel a writer to notice relevant content.
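The “drop content, get slips” step in item 1 can be sketched naively as follows: split a document on blank lines and group up to a few paragraphs per slip. Each slip carries a link back to its source for embedded reading. This fixed-size chunking is an illustrative stand-in, not Lila’s own splitting method. `Slip`, `split_into_slips`, `max_paragraphs`, and the example URL are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Slip:
    text: str
    source: str    # link back to the original document, for embedded reading
    position: int  # order of the slip within its source


def split_into_slips(document, source, max_paragraphs=3):
    """Naively cut a document into slips of at most max_paragraphs paragraphs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return [
        Slip("\n\n".join(paragraphs[i:i + max_paragraphs]), source, n)
        for n, i in enumerate(range(0, len(paragraphs), max_paragraphs))
    ]


doc = "First paragraph.\n\nSecond paragraph.\n\nThird.\n\nFourth.\n\nFifth."
slips = split_into_slips(doc, "https://example.org/unread-article", max_paragraphs=2)
for s in slips:
    print(s.position, s.source, repr(s.text))
```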

Lila’s cognitive abilities depend on the author’s engagement in a writing project, generating content that guides the above work. Lila is designed expressly for the writing of non-fiction; poetry, not so much. The same cognitive work is performed in most kinds of writing, so Lila can also aid with other kinds of writing, including fiction. Both fiction and creative non-fiction still require substantial stylistic work after Lila has done her part.

The author slip selects and boosts words for questioning unread content

Lila uses author slips to “question” a collection of unread articles and books, suggesting “answers” or responses that extend the author’s material. The term “question” is appropriate because Lila uses natural language processing to enhance search. Natural language processing is applied here in three ways.

1. Distill the focus of the author slip

Perhaps the most important step is to decide which words are the most meaningful for questioning unread content. The design of the slip provides the necessary structures for making this decision, as shown in the figure:

[Figure: Template slip with field weights]

An algorithm could use these design features to group keywords and calculate relative weights for use in searching, as shown in the table:

| Figure # | Field | Calculation | Word (weight) |
|---|---|---|---|
| 4 | Content | Weight of 1 for each uncommon word, increased by 1 for each additional occurrence, so static and dynamic each add up to 3. | static (3), dynamic (3), quality (1), scientific (1), knowledge (1), cave (1), political (1), institutions (1), centuries (1), king (1), constitution (1), destroying (1), government (1) |
| 3 | Subject Line | Weight of 2, twice that of Content. Stop words removed. | pencil (2), mightier (2), pen (2) |
| 2 | Tags | Weight of 4, twice that of Subject Line. | staticVsDynamic (4) |
| 1 | Categories | Weight of 8, twice that of Tags. “Quality” appears in both Content and Categories; the weight for this word could be their sum, 9. | quality (8+1=9) |
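The calculation in the table can be sketched in Python as shown below. The doubling weights (1, 2, 4, 8) and the summing of a word that appears in more than one field come from the table. The following are assumptions for illustration only: the tiny stop list, counting each distinct subject-line word once, lowercasing categories so “Quality” merges with content words, and the example slip text, which is hypothetical rather than the slip in the figure.

```python
import re
from collections import Counter

# Illustrative stop list; a frequency measure (step 2) could replace it.
STOP_WORDS = {"the", "is", "than", "a", "an", "and", "of", "to", "in", "yet"}

# Each field weighs twice as much as the one before it, as in the table.
CONTENT, SUBJECT, TAG, CATEGORY = 1, 2, 4, 8


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def slip_weights(content, subject, tags, categories):
    weights = Counter()
    for w in words(content):           # +1 per occurrence of an uncommon word
        if w not in STOP_WORDS:
            weights[w] += CONTENT
    for w in set(words(subject)):      # flat 2 per distinct subject-line word
        if w not in STOP_WORDS:
            weights[w] += SUBJECT
    for tag in tags:                   # tags kept verbatim, e.g. staticVsDynamic
        weights[tag] += TAG
    for category in categories:        # lowercased so it merges with content words
        weights[category.lower()] += CATEGORY
    return weights


w = slip_weights(
    content="Static laws and dynamic quality: static institutions resist "
            "dynamic change, yet static and dynamic forces need each other.",
    subject="The pencil is mightier than the pen",
    tags=["staticVsDynamic"],
    categories=["Quality"],
)
print(w["quality"], w["static"], w["pencil"], w["staticVsDynamic"])  # 9 3 2 4
```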

The words can be used as keywords in a natural language query. The weights would be included as boost factors, ranking search results higher if they contain those words.
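For a Lucene-based index, boost factors can be written directly into a query string using the standard `term^boost` syntax. The sketch assumes a hypothetical field named `text` and reuses a subset of the weights from the table. Clauses are joined with spaces, so under the default OR operator a document matching any term is returned, and boosted terms raise its score. Depending on the field’s analyzer, a mixed-case tag such as staticVsDynamic may need lowercasing to match.

```python
def to_boosted_query(weights, field="text"):
    """Render term weights as a Lucene query string using term^boost syntax."""
    # Highest boost first; ties broken alphabetically for a stable string.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(f"{field}:{term}^{boost}" for term, boost in ranked)


weights = {"quality": 9, "staticVsDynamic": 4, "static": 3, "dynamic": 3,
           "pencil": 2, "mightier": 2, "pen": 2, "king": 1}
print(to_boosted_query(weights))
```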

2. Apply other natural language analysis, such as word frequency

In the above table, not all words were selected. In the Subject Line, stop words (e.g., “the”) are removed. This is a common practice in query construction, since stop words are too common to add value. Similarly, in Content, only uncommon words are kept. In this case, word frequency could be calculated using a scientific measure, and words that are too common, i.e., whose frequency exceeds a threshold, could be skipped. Word frequency and other linguistic features, such as repetition and word concreteness, will be discussed in detail later on. These steps use knowledge of language to improve search relevance.
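Here is a minimal sketch of such a frequency filter. It measures document frequency against a reference corpus, which is one common choice; inverse document frequency is a close relative. The 0.5 threshold, the reference corpus, and the function names are hypothetical. Words not seen in the reference corpus count as uncommon and are kept.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(corpus):
    """Fraction of documents in the reference corpus that contain each word."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokens(doc)))
    return {word: count / len(corpus) for word, count in df.items()}


def uncommon_words(text, corpus, max_ratio=0.5):
    """Keep words that appear in at most max_ratio of the corpus documents."""
    df = document_frequency(corpus)
    kept = []
    for word in tokens(text):
        if df.get(word, 0.0) <= max_ratio and word not in kept:
            kept.append(word)
    return kept


reference = ["the cat and the dog", "the king and the queen",
             "the dog ran", "a dynamic state and a static one"]
print(uncommon_words("The king and the dynamic state", reference))  # ['king', 'dynamic', 'state']
```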

3. Take advantage of natural language index configurations

Unread content will be crawled and organized in a natural language index, such as Apache Solr’s Lucene index. An index of this sort can be configured to apply other natural language processing, e.g., synonym matching between queries and documents.


Update. February 20, 2015. Alternative approach, simplest: a “bag of words” model, which simplifies matching in NLP by disregarding grammar and word order while keeping multiplicity.
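A bag-of-words model can be sketched with Python’s `collections.Counter`. Word order and grammar are dropped, counts are kept, and the `&` operator gives the multiset intersection (the minimum count of each shared word) as a simple overlap score. The function names are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Multiset of lowercased words: grammar and order dropped, counts kept."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Size of the multiset intersection (min count per shared word)."""
    return sum((bag_of_words(a) & bag_of_words(b)).values())


print(bag_of_words("Dog bites man") == bag_of_words("man bites dog"))  # True
print(overlap("static static dynamic", "static static static quality"))  # 2
```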

Update. January 31, 2015. Alternative approach, more complex: convert natural language to SPARQL, then query the slip repository, converted to RDF through part-of-speech analysis. This might produce better matches.

Lila Slip Factory II: Answer the questions and find “semantic corners” to build new slips.

Lila is a cognitive system that extends reading abilities by automatically dividing unread content into “slips,” relatively small units of text for later visualization and analysis. This automatic operation is called the Slip Factory. It involves two processes. The first process takes slips manually written by an author and converts them into a collection of “questions”; see the earlier post. The second process, represented here, involves iterating through the unread content, answering the questions and applying a “semantic corners” algorithm that splits and groups text into new slips.

[Figure: Slip Factory II]

  1. The process iterates through the collection of unread content. The collection includes content curated by the author and additional content collected by the system. It is a large volume of unstructured text in digital format.
  2. The content is tokenized in readiness for search. Search requirements may include an index.
  3. The process iterates through each question generated by the first process.
  4. The question is run as a natural language query against each item in the unread content collection. The query identifies and ranks a set of search results, documents that are considered to “answer” the question.
  5. The most interesting point in this process is the “semantic corners” algorithm that will analyze the answer passages, splitting and grouping them into new slips. The semantic corners algorithm is the heart of the Lila concept. It will be detailed in future posts.
  6. New slips are added to the collection of slips for later visualization and analysis.
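The six steps can be sketched as a single loop. The semantic corners algorithm is not described in this post, so `semantic_corners` below is a clearly labeled placeholder that just keeps the sentences sharing a term with the question. Scoring documents by shared terms likewise stands in for a real natural language query. `Slip`, `slip_factory`, `top_n`, and the stop list are hypothetical.

```python
import re
from dataclasses import dataclass

STOP_WORDS = {"the", "a", "an", "is", "are", "what", "how", "of", "and", "in", "to"}


@dataclass
class Slip:
    text: str
    source: str
    question: str


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def semantic_corners(passage, question):
    """PLACEHOLDER for the unpublished semantic corners algorithm.

    This stand-in only keeps the sentences that share a term with the question.
    """
    q = set(tokenize(question))
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [s for s in sentences if q & set(tokenize(s))]


def slip_factory(collection, questions, top_n=3):
    # Steps 1-2: iterate the unread collection and tokenize it for search.
    index = {source: set(tokenize(text)) for source, text in collection.items()}
    new_slips = []
    for question in questions:  # step 3
        q = set(tokenize(question))
        # Step 4: score each document by shared terms, keep the best answers.
        scored = [(len(q & terms), source) for source, terms in index.items()]
        answers = [src for score, src in sorted(scored, reverse=True) if score > 0][:top_n]
        for source in answers:
            for text in semantic_corners(collection[source], question):  # step 5
                new_slips.append(Slip(text, source, question))  # step 6
    return new_slips


collection = {
    "doc1": "Dynamic quality drives change. Bread needs yeast.",
    "doc2": "The king wrote a constitution. Laws are static.",
}
for slip in slip_factory(collection, ["What drives dynamic change?"]):
    print(slip)
```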