Metadata Visualization

There are at least two ways of interpreting a table of data.

Date       Temperature   Humidity
June 18    92            57
June 19    95            NULL
June 20    84            51

The first interpretation treats the table as a collection of facts about the world. For example, on June 18 the temperature was 92 degrees and the humidity was 57%. On June 19, the temperature was 95 degrees and humidity was unknown.

The second interpretation treats the table as a literal list of data points. For example, on June 18, someone recorded the temperature at 92 degrees and the humidity at 57%. On June 19, the humidity sensor was broken. The data is stored in a table with three columns. Before June 18, the data was being recorded in a different table.
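As a minimal sketch of this literal reading (the Reading type and its field names are hypothetical, invented just for illustration), each row could be stored as a record in which the broken sensor shows up as an explicitly missing value rather than being silently dropped:

    // Hypothetical Swift model of the table above: three columns,
    // with humidity optional because the June 19 reading is missing.
    struct Reading {
        let date: String        // e.g. "June 18"
        let temperature: Int    // degrees
        let humidity: Int?      // percent; nil models the NULL cell
    }

    let readings: [Reading] = [
        Reading(date: "June 18", temperature: 92, humidity: 57),
        Reading(date: "June 19", temperature: 95, humidity: nil),
        Reading(date: "June 20", temperature: 84, humidity: 51),
    ]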

In other words, we can focus on what the data says about the world, or we can focus on the data itself.

We can think of the data ephemerally as information, or we can think of it as a physical thing that exists in and of itself.

This is analogous to written language: a sentence or paragraph generally means something, but it also exists as physical letters and punctuation on the page.

The second interpretation focuses on what is often called metadata: data about the data. How was it collected, by whom, for what purpose, and where and how is it stored? How accurate is it likely to be?

If we are very confident about the accuracy and relevance of the data, we can summarize and visualize it cleanly. We could show a line chart of temperature over time and start to draw conclusions about what the temperature trend means.

But if the accuracy and relevance are unknown, we need to take steps to better understand the metadata. How much data is there? Which parts are missing, or appear to be duplicated? Where did it come from? What metrics are most relevant?
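As a rough sketch of that first step (reusing the hypothetical Reading records from the earlier example), even a couple of lines can surface basic metadata before any chart is drawn:

    // Simple metadata questions: how many rows, and which values are missing?
    let rowCount = readings.count
    let missingHumidity = readings.filter { $0.humidity == nil }.map { $0.date }
    print("rows: \(rowCount), missing humidity on: \(missingHumidity)")
    // rows: 3, missing humidity on: ["June 19"]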

Suppose the default behavior of a data analysis tool is to ingest your data and take you directly to a clean line chart. Is that convenient or misleading? Does that clean line chart imply that you are looking at truth, when in fact you may just be looking at data?

Can we assume that the line chart is about temperature, or should we emphasize that it shows data about temperature? What is the best way to communicate that distinction?

Swift

Apple announced a new programming language called Swift earlier this week at WWDC 2014. The focus during the keynote was ease of use, and indeed the language is incredibly exciting as a learning tool. But this is not a simplistic language. It is extremely powerful, extremely well crafted, and designed to replace Objective-C for professional software development. In many ways it feels like the next evolution in the line of C, C++, Obj-C, C#… but they ran out of “C” modifiers and instead called it Swift.

Swift can be easily adopted by software companies because it is interoperable with most existing code written in C, C++, and Objective-C. You don’t have to rewrite your app from scratch just to get started.

The developer tools team is also shipping a live coding environment inspired by Bret Victor. This is truly exciting to see, and I suspect they are only just getting started. This environment is not only useful for beginners; it will also change the way professional programming is done: instead of building and debugging entire apps, developers can prototype, explore, and debug individual modules interactively in the “playground”. The documentation also lives in this environment, so you can play with example code and see the results in real time.
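For example, here is the kind of snippet you might type into a playground and watch evaluate as you go (this is just illustrative code, not taken from Apple's documentation):

    // Each line is evaluated as you type, with results shown inline.
    let greetings = ["Hello", "Bonjour", "Hola"]
    for greeting in greetings {
        print("\(greeting), Swift!")
    }
    let doubled = (1...5).map { $0 * 2 }   // [2, 4, 6, 8, 10]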

I have a lot more to learn about Swift, but my initial impressions are that it has achieved the high praise of “obvious only in retrospect.” I suspect it will significantly influence the software community.

The World is Continuous, but the Mind is Categorical

I’m going to contend that the physical world, at least at human scale, is continuous. For example, when objects move through space, they visit every perceivable intermediate position along the way. When you heat a room, the temperature passes through all intermediate temperatures. Colors, sounds, materials, emotions… even objects that we see as discrete entities, such as dogs and cats, have all sorts of continuous dimensions like size and weight, ear length and paw length, hairs per square centimeter, etc.

Yet we humans are constantly categorizing everything into discrete buckets. Hot, cold, warm, lukewarm… dog, cat, fish, zebra… Democrat, Republican, Whig, Tory… introvert, extrovert, intuitive, logical… Using labels, names, groupings, and sub-groupings, essentially all of human language is an exercise in chopping the world up into manageable chunks.

This categorization allows us to communicate with each other, remember things, and reason logically. In fact, there is evidence that the part of the brain that most distinguishes humans from other species (our large neocortex) is structured specifically to support the storage of hierarchical, discrete concepts.

Cognitive psychology has shown that people typically define categories by using one or more canonical examples of each. For example, to determine whether a red-orange swatch is red or orange, we mentally compare it to our memories of canonical red and canonical orange and decide which is closer. Similarly, to decide whether something is a cat or a dog, we compare the specimen to our mental representation of a canonical cat and a canonical dog.
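As a rough sketch of that prototype-based comparison (the RGB values chosen for the "canonical" colors are made up purely for illustration):

    // Classify a swatch by its distance to canonical examples.
    struct Color { let r, g, b: Double }

    func distance(_ a: Color, _ b: Color) -> Double {
        let (dr, dg, db) = (a.r - b.r, a.g - b.g, a.b - b.b)
        return (dr * dr + dg * dg + db * db).squareRoot()
    }

    let canonicalRed    = Color(r: 1.0, g: 0.0, b: 0.0)   // assumed prototype
    let canonicalOrange = Color(r: 1.0, g: 0.5, b: 0.0)   // assumed prototype
    let swatch          = Color(r: 1.0, g: 0.2, b: 0.0)

    let label = distance(swatch, canonicalRed) < distance(swatch, canonicalOrange)
        ? "red" : "orange"   // "red": the swatch is closer to the red prototype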

The less common, alternative method of distinguishing between categories is to define their boundaries (instead of their centroids). For example, some jurisdictions define a blood alcohol level of .08 as the boundary above which you are not allowed to drive. Notice that the precision is arbitrary: the cutoff could just as well have been .085 or .08222 repeating. The precision is there to make legal decisions easier. But in the real world, the boundary between “safe driver” and “unsafe driver” is fuzzy.
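The boundary-based method, by contrast, comes down to a single threshold comparison (the .08 cutoff is from the example above; the function name is just illustrative):

    // Boundary-based categorization: one cutoff defines the category.
    let legalLimit = 0.08
    func mayDrive(bloodAlcohol: Double) -> Bool {
        return bloodAlcohol < legalLimit
    }
    mayDrive(bloodAlcohol: 0.079)   // true:  just inside the boundary
    mayDrive(bloodAlcohol: 0.081)   // false: just outside, however fuzzy the real distinction is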

And indeed, the pesky, continuous real world interferes with even the most sophisticated attempts to categorize it. For example, biologists might agree that the way to distinguish between two species of fish is that one has a dorsal fin and the other doesn’t. It sounds black and white. But then you find a specimen that has sort of a partially formed dorsal-fin-like appendage that might just be a bump. The biologists get squirmy, and end up categorizing the specimen the intuitive way, based on similarity to their mental canonical examples of each species.

Even the distinction between “continuous” and “discrete” is itself fuzzy! Consider that human perception is inherently discrete because it operates using individual nerve cells; similarly, computers deal only with ones and zeroes so are discrete by definition. Yet the high resolution of both perception and computer displays gives the illusion of a continuous process, and indeed it is often most useful to think of these systems as continuous.

So the great advantage of using categories is that they allow us to convert the infinitely complex world into finite pieces that we can gain familiarity with and reason about. The great disadvantage is that categories are fuzzy, subjective. They form a simplified model of reality that is subject to interpretation, especially around the edges.

When you think about it, it’s astonishing how smoothly humans can navigate this very rough interface between models and reality. Every time we do almost anything, we have to first perceive the continuous world, translate it into categorical thinking, make a discrete decision, and then translate that back into a continuous motor action. All of this happens innately, below the threshold of consciousness.

I started pursuing this Interesting Thought because data analysis systems distinguish between continuous (numerical) fields that can be summed and averaged, and categorical (string) fields that can only be compared or filtered. But the deeper I got, the more it started to feel like a fundamental underpinning of Life, The Universe, and Everything.
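In code, that distinction shows up as which operations a field supports: numeric fields can be summed and averaged, while string fields can only be grouped, compared, or filtered. Here is a hedged sketch (this Field enum is a made-up model, not any particular tool's API):

    // A made-up model of the continuous/categorical split in analysis tools.
    enum Field {
        case continuous([Double])   // can be summed, averaged, plotted on an axis
        case categorical([String])  // can only be compared, grouped, filtered

        var summary: String {
            switch self {
            case .continuous(let values):
                let mean = values.reduce(0, +) / Double(values.count)
                return "mean = \(mean)"
            case .categorical(let values):
                let counts = Dictionary(grouping: values, by: { $0 }).mapValues { $0.count }
                return "counts = \(counts)"
            }
        }
    }

    Field.continuous([92, 95, 84]).summary            // mean of the three temperatures
    Field.categorical(["cat", "dog", "cat"]).summary  // a count for each label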

For example, cultures get dragged down by divisive categorizations that form stereotypes. Religions suffer from rigid definitions of good and bad. Lawyers make their case by arguing that their client’s actions are best seen as an example of some discrete law or case history. And scientists and other professionals are often limited — or inspired — by arbitrary boundaries between companies, departments, and fields of study.

At its core, the search for a unified theory of physics can be seen as an effort to eliminate all distinctions when describing the physical world. Now I wonder: would we even be able to comprehend such a theory, given that our minds fundamentally think in categories?

Are there ways to get around this limitation? Zen? Math? Computation? How have people coped with this through history? Is this the next step of evolution? Or just a philosophical insight? It’s probably somewhere in between.

Why Platforms Must Be Simple

“From a practical (and historical) standpoint, we can assume that no complex specification will be implemented exactly. […] A platform consisting of the union of all possible implementations is thus arbitrarily unreliable.”

-Bret Victor, Magic Ink

Impediments to talent

“We start from the presumption that our people are talented and want to contribute. We accept that, without meaning to, our company is stifling that talent in myriad unseen ways. Finally, we try to identify those impediments and fix them.”

-Ed Catmull, Creativity, Inc. (p. 22)

Untangling the profession of teaching

The Hour of Code website offers this tip for teachers: “It’s okay to respond: ‘I don’t know. Let’s figure this out together.'”

Read that again. “I don’t know. Let’s figure this out together.” How many times did your high school teachers say that?

There is a slowly growing recognition that technology and culture are changing much too quickly for teachers to have all the answers. The answers are typically found on the internet. In contrast, the teacher is there to provide the many other intangibles that are a prerequisite for learning, such as connecting with students on a personal level; nurturing curiosity and integrity; creating a space where students can fail safely along the way to mastery; and many others.

So many aspects of teaching have been intertwined for decades in the single profession of teacher. But software technology is starting to allow a much greater degree of specialization, and the various strands of the profession are gradually being untangled and examined individually.

I know many teachers will miss their wide-ranging traditional roles. But specialization is also the best route I can fathom to cope with the increasingly urgent need to update the curriculum to keep up with the times. Code.org is “basically training existing math and science teachers […] to become computer science teachers.” [link] Computer science didn’t exist a few decades ago. Social media studies didn’t exist five years ago. The next world-changing technology is being developed right now. How can teachers keep up?

By not trying to do everything themselves — such as developing their own lesson plans, or knowing all the answers. Fantastic curriculum is increasingly available for free online (often with built-in quizzes and other feedback) — developed by teachers who are specializing in those subject areas. Can classroom teachers take advantage of those resources to focus on other strands like student engagement and motivation?

From another perspective, the teacher without all the answers is practicing a type of “growth mindset.” It’s hard to expect a student to understand the value of lifelong learning if their teachers do not model it. From this perspective, the accelerating pace of technological change has the byproduct of reinforcing the need for lifelong learning across all walks of life. Teachers have the dual challenge of preparing students for this changing world and coping with it themselves.

Computer science teachers

“We’re basically training existing math and science teachers […] to become computer science teachers. The beauty of training those existing teachers is that the school doesn’t need to hire anybody new; there’s no budget change for the school; the existing staff can offer this course.”

-Hadi Partovi, Code.org [video ~ 33:00 mark]

Skills gap

“In Washington State, each graduate in computer science is met with 27 open computer science jobs. [Nationally, the ratio is 3 jobs per graduate.]”

-Hadi Partovi, Code.org [video ~20:30 mark]

Mobile apps are the new Internet

“To [the younger] generation, it seems slow, purposeless even to go from website to website in a single, sub-par Web browser environment when they can get rich app experiences right from their [mobile] home screen.”

-Owen Williams, TNW

Both introverted and extroverted

“The way these creative individuals confront life suggests that it is possible to be both extroverted and introverted at the same time. In fact, expressing the full range from inner- to outer-directedness might be the normal way of being human. What is abnormal is to get boxed in at one of the ends of this continuum, and experience life only as a gregarious, or only as a solitary being.”

-Mihaly Csikszentmihalyi, Finding Flow