Archive for the ‘Author: Abhijit Mahabal’ category

The State of Seqsee

March 17, 2008

I am relieved to have reached a stage where Seqsee sees all the sequences that I wanted it to see in the initial release. This does not mean that the work is done. It is still a long way home.

So what sequences can it see? If you allow me to include sequences that it sometimes sees, it is a long list. Many of these it can reliably extend, and making Seqsee reliable on the other sequences is the main work left.

The sequences:

  • 1, 2, 3, 4…
  • 1, 1, 2, 2, 3, 3, 4, 4…
  • 1, 2, 2, 3, 3, 3, 4, 4, 4, 4…
  • 1, 7, 2, 8, 3, 9…
  • 1, 7, 1, 2, 8, 1, 2, 3, 9…
  • 1, 1, 2, 1, 2, 3, 1, 2, 3, 4…
  • 1, 1, 1, 2, 1, 1, 2, 1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 2, 3, 4…
  • 1, 1, 1, 2, 1, 3, 1, 4…
  • 2, 1, 2, 2, 2, 2, 2, 3, 2, 2, 4, 2…
  • 1, 2, 3, 2, 3, 4, 5, 3, 4,5, 6, 7…
  • 1, 1, 2, 1, 1, 2, 3, 2, 1…
  • 1, 17, 17, 1, 1, 1, 17, 17, 17, 17…
  • 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6
  • 2, 3, 5, 7, 11… (when primes are allowed in the domain via a switch)
  • 2, 3, 2, 3, 4, 5, 4, 3, 5, 6, 7, 6, 5, 7, 8, 9, 10, 11, 10, 9, 8, 7…
  • 1, 2, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7…
  • 1, 2, 2, 3, 3, 5, 4, 7, 5, 11, 6, 13, 7, 17…
  • 1, 2, 3, 17, 4, 5, 6, 17, 7, 8, 9, 17…
  • 1, 2, 3, 17, 3, 4, 5, 17, 5, 6, 7, 17…
  • 1, 1, 1, 2, 3, 2, 3, 2, 3, 4, 5, 6, 4, 5, 6, 4, 5, 6…
  • 1, 1, 2, 3, 1, 2, 2, 3, 4, 1, 2, 3, 3, 4, 5…
  • 1, 2, 3, 1, 2, 2, 3, 1, 2, 2, 2, 3…
  • 1, 1, 2, 3, 1, 2, 2, 2, 3, 4, 1, 2, 3, 3, 3, 3, 4, 5…

Now that Seqsee has reached a feature-freeze, I will begin in earnest the task of fine-tuning. I have spent a long part of my work building tools, and these had better serve me well here.

I hope that this story does not end with me likened to George W. Bush when he said (five years ago!) “My fellow Americans: Major combat operations in Iraq have ended”. There are likely to be many surprises as I fine tune (and redo substantial chunks of various codelet families, remove accumulated cruft and so forth). That fine-tuning is a story unto itself, and it will get its own post.

  

Advertisements

Interesting Numbers

March 7, 2008

About an year ago, my Ph.D. advisor (Douglas Hofstadter) showed me a graph he had been drawing by hand. The x-axis consisted of the natural numbers: 1, 2, 3… There was a stack of dots above each, forming the y-axis. There was a red dot above all the squares (1, 4, 9…), a blue dot over the triangular numbers (1, 3, 6 …), a yellow dot over powers of “2” (1, 2, 4, 8, 16…) and so forth, the idea being that the more interesting a number is, the higher the stack of dots above it.

I decided to help and automate the process. I downloaded the Encyclopedia of Integer Sequences, and created a different graph, based on the same idea. The x-axis was the same. Instead of dots of various colors, I just drew the regular, everyday graph, with the y value corresponding to an integer being how many distinct sequences in the Encyclopedia it was present in.

I had access to a whopping 128000 sequences. Two graphs are shown below. One shows numbers upto two hundred, the other between 4000 and 4200.

 I can also provide a file containing the “interestingness values” of numbers upto 50,000, if you like.
The first five numbers that are “boring” are, incidentally, 8795, 9734, 9935, 10017, 10418. 

 

An extensible FARG implementation, Part II

February 28, 2008

[Microsoft Word seems to be messing up my fonts. Apologies.]

The question I explore in this post is how much and what can be shared between CRCC projects. I will try to stay as concrete as I can. In this post, I have managed to talk about a single “simple” aspect.

One of the most obviously sharable aspects of FARGitecture is the codelet system. I am not referring to the types of codelets, but rather to the infrastructure for dealing with codelets. If a good, solid, and reusable framework had been available to me, it would indeed have saved time. My final implementation is only a couple of hundred lines of code, but the effort required and mistakes repeated make it quite costly. Needlessly costly.

What are the petty things needed of this infrastructure? Obviously, we need a simple interface to add codelets and select codelets. That much is obvious. What may become more apparent only by implementing a system and struggling to grapple with its complexity are other features and tools, like the following (and I have not counted these “optional” features in the 200 lines I mentioned):

  • We need the ability to remove stale codelets. When a Codelet is added to the Coderack, it may refer to some structure in the workspace. While the codelet is awaiting its turn to run, this workspace structure may be destroyed. At the very least, we need code to recognize stale codelets to prevent them from running.
  • Writing new codelets should be easy. Most codelets, when written in the programming language being used, share boilerplate code. Tools that allow us to write closer to the domain (and then compile our code down to the programming language) have proved very useful to me. They improve code readability, lower the barrier to experimenting with new codelets, even improve error messages. Just to illustrate, here is the code for one codelet type:

    CodeletFamily flipReln( $reln ! ) does {
    RUN: {
    my $new_reln = $reln->FlippedVersion() or return;
    $reln->uninsert;
    $new_reln->insert;
    }
    }

    Mind you, I am not claiming this to be hard or undoable by appropriately setting up classes so that the “compiler” is not required. It is not difficult, but it takes careful thought and time. I am just pointing out that something needs to be done to ease codelet writing.

  • Seqsee has the notion of scripts. Some tasks are by their very nature somewhat serial. Like makings tea. Or describing a solution. How would a FARGitecture do such tasks? Certainly not by using a single codelet. We do not need a “make tea from scratch” codelet, or a “write dissertation” codelet. Seqsee “solves” this (ha ha) by splitting the task into several codelets that call each other. Painful to write? Yes. Repetitive to code? Yes. Extend the programming language to reduce busy work? Yes. So here is Seqsee’s code for one task (which is a subtask of a bigger task, but we don’t have to care).

    CodeletFamily DescribeRelationCompound( $reln !, $ruleapp ! ) does scripted {
    STEP: {
    my $category = $reln->get_category();
    SCRIPT DescribeRelnCategory, { cat => $category };
    }
    STEP: {
    my $meto_mode = $reln->get_meto_mode();
    my $meto_reln = $reln->get_metonymy_reln();
    SCRIPT DescribeRelnMetoMode,
    {
    meto_mode => $meto_mode,
    meto_reln => $meto_reln,
    ruleapp => $ruleapp,
    };
    }
    }

    The SCRIPT calls launch other codelets, which may call more scripts, and return. All along, all the laws of codelets hold: they can be removed from the Coderack without being run, several “scripts” may run in parallel, and so forth. Again, being able to focus at the task on hand is a big win.

  • At any stage, several codelets are on the Coderack. Several avenues are being explored in parallel. In order to understand what’s going on, or even to demonstrate what’s happening, we can build tools. These are only related to the codelets, and can be completely domain independent. Here are some such visualizations from Seqsee.
    • Codelets launch other codelets. By logging what codelet added what other codelet when, we can obtain trees of codelet launching. Very useful for debugging. This image is from a tool that allows exploration of such trees, allows searching for specific types of codelets, and also lets us see how long codelets sit on the Coderack.

    • At any stage, there is pressure to do several things. Each codelet is a tiny bit of pressure. The log mentioned above also allows us to create graphs that display the change in pressure to do a particular thing over time.

      The image below shows the pressure to extend groups during a single run.

All tools mentioned here are domain independent. Just writing the system that manages the infra takes little time, but writing tools takes longer. And deciding what tools to build takes longer still, and yet, in the long run, they save time. Without these tools, I may not have been able to write Seqsee.

Imagine writing a FARGitecture to be like programming a task. Well, don’t imagine, it is programming a task. Except that we design it from scratch: we build our own virtual machine, write our own tools. It is high time we had a shareable toolset, at the very least.

I will end by pointing out that the images I have added above were generated from a log file. This file could have been dumped by a COBOL program, if somebody wanted to write a FARGitecture in that wonderful language. We don’t need to agree on a programming language to share tools.

An extensible FARG implementation

February 28, 2008

I have been meaning to write this post for a while, now. Over the last few years, the idea and urgent discussion regarding a hypothetical library that would allow a new FARG project to be achieved in a few keystrokes has surfaced, repeatedly, and has not led anywhere thus far. The latest outcropping was last week, and this post is an attempt to delineate what I believe needs to be done.

These days, I am working on my dissertation. I thought about what I want to say here, and what part of this I can use in the dissertation. I think I can use some parts of this post there.

The Current Problem: Starting from Scratch

Several FARG projects have been created thus far, and are listed in the following table. (FARG Historians and Lore Masters! Your help in filling and correcting this table will be much appreciated.) Programs that have not been completed are marked with an asterisk.

Program

Author

Language

Year

Domain

Jumbo

Douglas Hofstadter

Scheme?

1983?

Anagrams

Seek-Whence

Marsha Meredith

?

1986

Integer pattern sequences

Numbo

Daniel Defays

?

1986

 

Copycat

Melanie Mitchell

Franz Lisp

1990

Letter analogy puzzles

Tabletop

Bob French

?

1992

 

Letter Spirit

Gary McGraw

Scheme?

1995

Grid fonts

Metacat

Jim Marshall

Chez Scheme

1999

Letter analogy puzzles

Letter Spirit II

John Rehling

Scheme?

2001

Grid fonts

Phaeaco

Harry Foundalis

C++

2006

Bongard Problems

*Seqsee

Abhijit Mahabal

Perl

2008+

Integer pattern sequences

*Musicat

Eric Nichols

C#

2008+

 

*?

Francisco Lara-Dammer

Java

2008+

Geometry

Capyblanca

Alexander Linhares

Delphi?

2007?

Chess

 

Column 3 (“language”) is very diverse. For implementing a complex program, using a powerful language that the lone programmer knows inside out makes sense. Especially if there is no code to reuse, anyway, as has been the case so far. Other reasons that influence language choice (such as availability of people to hire who are skilled in that language) are simply non-existent in a PhD. Most PhD projects are prototypes, proofs of concepts.

Each project has consequently started from scratch (except for the obvious exceptions of Metacat and Letter Spirit II, both of which extended prior projects). Each project takes years to complete, and it is natural to wonder how much effort, if any, could have been saved if it had been possible to reuse some of the code. Michael Roberts, Alexander Linhares, Harry Foundalis, Eric Nichols, and I have again been discussing these issues recently.

My next post will examine (my guess about) what kinds of things may be shared between projects. A subset of what can be shared is the core of FARG architectures, the crux without which the program ceases to be a FARG implementation. I will try to spell out what I think is in the core.

[Edited: In his comment, A. Joseph Hagar pointed out several more projects which were not done here at the CRCC. I have listed these in the table below.]

Program

Author

Language

Year

Domain

Fluid Analogies Engine

Scott Bolland

Java

2005

Potentially General

*
B-Cat

Payel Ghosh, Ralf Juengling, Lanfranco Muzi, Mick Thomure

?

2007+

Bongard Problems

Starcat

Joseph Lewis et al.

?

2004

Potentially General