Changes from starlib1 to starlib2 that require changes in the way the library is used.

PLEASE NOTE: PLEASE CONSULT THE PAGE ABOUT THE NEW CACHE IN ADDITION TO THIS PAGE. The major change in this revision is that a cache is now used for all looped DataValueNodes. Because C++ has no garbage collection, the caller of this library must tell the library when it is done using any DataValueNode pointers that may have been cached.

Substantial changes were made in this release of the star library, so some minor alterations will be needed in user code that calls it. We tried to avoid this where we could, but in some cases changes to the library inevitably exposed changes to its users. We have our own code that relies on this library, so rest assured that we are aware of the hassle that a change to the interface creates. We would not have made these changes if we had not deemed them necessary.


ofstream *os has become ostream *os

The change:

The external 'os' object was previously declared as being of type ofstream *. It is now of type ostream *. For example, code needs to be changed as follows:
    // old code was:
    //   ofstream *os; // in the global variable section.
    //   ...
    //   os = new ofstream(filedes(stdout));
    //   ...
    // NEW CODE:
    ostream *os;  // in the global variable section.
    ...
    os = &cout;
    
Also, in the past the os pointer was deleted at the end of most programs. When os is initialized to point at cout (as in the example above), it must never be deleted; flush it with os->flush() instead. cout was not allocated with new, so deleting it (and thus closing standard output) is undefined behavior and yields unpredictable results.
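As a minimal sketch of the ownership rule above (assuming a C++11 or later compiler; writeHeader(), writeToStdout() and captureOutput() are hypothetical helpers, not part of starlib2), the global os can be aimed at cout or at any other ostream, and a stream the program did not create with new is only ever flushed, never deleted:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Global output stream, declared as ostream * rather than ofstream *, so it
// can point at cout, an ofstream, an ostringstream, or any other ostream.
std::ostream *os = nullptr;

// Hypothetical helper standing in for code that writes through os.
void writeHeader() {
    assert(os != nullptr);
    *os << "data_example\n";
}

// Sends output to standard output. cout was not created with new, so the
// stream is flushed when we are done and is never deleted.
void writeToStdout() {
    os = &std::cout;
    writeHeader();
    os->flush();
}

// Sends the same output to an in-memory stream and returns the text.
// os only borrows the stream; the ostringstream cleans itself up.
std::string captureOutput() {
    std::ostringstream buffer;
    std::ostream *saved = os;
    os = &buffer;
    writeHeader();
    os->flush();
    os = saved;  // restore before buffer goes out of scope
    return buffer.str();
}
```

The same pointer could equally be aimed at an ofstream opened on a real file; only a stream that was itself allocated with new should ever be deleted through os.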

Justification for the change

Recent changes to the compiler exposed that ofstream is meant to be used only for files that physically exist in a filesystem and have a filename, not for things like pipes or standard output. Because users of this library are supposed to be able to point os at any kind of output stream, including pipes or standard output, this had to change. The constructor that created an ofstream from a file descriptor is not part of the ISO C++ standard library, and the compiler no longer provides it.

iterators are no longer interchangeable with pointers

The change:

In the past one could use pointers to objects and iterators on those objects interchangeably, so the following code would work:
    LoopTableNode *table = ... ;
    ...
    table->erase( &((*table)[N]) );
    
This no longer compiles, because it depended on being able to pass a pointer to the object at (*table)[N] to a routine that expects an iterator on table. It only worked because the table object was implemented behind the scenes as an array, where a pointer is all one needs to know one's position in the array. You can replace any code that uses array-subscript notation as in the following example:
    LoopTableNode *table = ... ;
    ...
    table->erase( table->begin() + N );  //where N = the index into the table
    
This uses iterator arithmetic to achieve the same effect as an array subscript, but does it with iterators instead of pointers.
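The same iterator arithmetic works on any random-access container. The sketch below uses a plain std::vector<std::string> as a stand-in for a LoopTableNode (an assumption for illustration only; the real class holds nodes, not strings, and eraseAt() is a hypothetical helper) and erases by index with begin() + n instead of &table[n]:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// std::vector stands in for LoopTableNode in this illustration.
using Table = std::vector<std::string>;

// Erases the element at index n. On current standard libraries,
// table.erase(&table[n]) does not compile, because a vector iterator
// is no longer a raw pointer; begin() + n yields a real iterator.
void eraseAt(Table &table, std::size_t n) {
    assert(n < table.size());
    table.erase(table.begin() + static_cast<Table::difference_type>(n));
}
```

Locating the element this way is constant time, which is why it is the preferred replacement for the old pointer form.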

Justification for the change

There is no guarantee that the implementation behind the scenes will remain an array, and so no guarantee that passing a pointer to an item in that array will keep working as a substitute for an iterator. The STL data structure on which this library is based, vector, recently made this change, and we had no choice but to expose it to users of the library. The only way to restore the ability to use pointers as iterators would have turned the constant-time conversion from pointer to iterator into a linear-time search, and that is not the sort of change that should be made silently, without the users of the library knowing.


Where it is not easy to switch to array notation, there is a workaround, explained below, that still lets you operate by pointer. Be warned, however, that it runs in linear O(N) time, where N is the number of nodes in the list the node is being deleted from. (For example, to delete a DataNameNode from a loop with 5 columns, N = 5, but to delete a row from a loop of 5000 rows, N = 5000.)

Essentially, this technique scans the list of nodes sequentially to find the iterator that corresponds to the pointer passed in.

This workaround is as follows:

	// (someNode is either a LoopTableNode, or a LoopRowNode, or a
	// LoopNameListNode, or a DataLoopNameListNode, and ptr is a 
	// pointer to the type of data node stored inside someNode.)

	// If it used to say this:
	someNode->erase( &ptr  );
	// Then change it to this:
	someNode->erase( someNode->iteratorFor(ptr) );

	// If it used to say this:
	someNode->insert( &ptr , newPtr );
	// Then change it to this:
	someNode->insert( someNode->iteratorFor( ptr ), newPtr );
    
Again, it cannot be emphasised enough that this is a slow method, not to be used repeatedly on large lists (such as large loops). It could have been implemented silently, with a conversion constructor that turns a pointer into an iterator, but I deliberately chose not to do that. People using the new starlib2 should be alerted to this change and decide for themselves whether to use this slower method or to rewrite their code so that it does not use pointers.
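The linear search behind a pointer-to-iterator lookup can be sketched as follows. This is not starlib2's iteratorFor(); it assumes a list stored as std::vector<Row *>, where Row is a hypothetical node type, and simply uses std::find, which visits elements one at a time until the stored pointer matches:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type; the list stores pointers to these.
struct Row {
    int id;
};

// Returns the iterator whose element equals p, or list.end() if p is not
// in the list. std::find walks from the front, so the cost is O(N).
std::vector<Row *>::iterator iteratorFor(std::vector<Row *> &list, Row *p) {
    return std::find(list.begin(), list.end(), p);
}
```

One call is cheap, but calling it inside a loop over a large table makes the total work quadratic, which is why the index form (begin() + N) is preferred when it is available.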

Using the ValCache

The change:

DataValueNodes are now held in a cache that should be cleared periodically.

Justification for the change

The amount of memory wasted by keeping a separate DataValueNode alive for every value in a loop was immense. This change allows the values inside loops to be stored internally in a more compact form and converted into DataValueNodes only when the caller actually uses them. To make this work in a language like C++, which has no garbage collection, the caller of the library must tell the library when it is no longer using any of the live DataValueNode pointers it has been given. This change is detailed on a separate page.
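The idea behind the cache can be illustrated with a small, hypothetical C++14 sketch. CompactLoop, ValueNode, nodeAt() and releaseAll() are invented names for this example and are not the starlib2 interface, which is described on the separate page. Values are stored as plain strings; a node object is built only on first request and then reused, and every node stays alive until the caller explicitly releases them all:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type standing in for DataValueNode.
struct ValueNode {
    std::string value;
    explicit ValueNode(const std::string &v) : value(v) {}
};

// Hypothetical sketch of the cache idea, not the starlib2 API.
class CompactLoop {
public:
    void addValue(const std::string &v) { values_.push_back(v); }

    // Returns a node for value i, building it on first use and reusing it
    // afterwards. The pointer stays valid until releaseAll() is called.
    ValueNode *nodeAt(std::size_t i) {
        const std::string &v = values_.at(i);  // throws if i is out of range
        std::unique_ptr<ValueNode> &slot = live_[i];
        if (!slot) slot = std::make_unique<ValueNode>(v);
        return slot.get();
    }

    std::size_t liveNodeCount() const { return live_.size(); }

    // The caller promises it holds no node pointers any more; every
    // pointer previously returned by nodeAt() is now dangling.
    void releaseAll() { live_.clear(); }

private:
    std::vector<std::string> values_;                         // compact storage
    std::map<std::size_t, std::unique_ptr<ValueNode>> live_;  // nodes handed out
};
```

The release step is the part C++ cannot do automatically: only the caller knows when it has stopped using every node pointer, so it must say so before the memory is reclaimed.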

Ability to skip parsing

The change:

You can now insert directives into the input file to mark one or more sections to be "skipped" during parsing. A skipped section is stored in memory as a plain string, without being parsed into nodes in the tree. When the file is written out with Unparse(), the section is dumped as-is to the output file at the same point. In this way one can preserve most of a file intact without parsing it, while still parsing the portion of the file that contains the data of interest.

Justification for the change

Some files are rather large, and some software is concerned with only a very small portion of the file. If the entire file is parsed into the node tree, time and RAM are wasted filling the tree with data the program does not use.

How it works:

In the file, the following directives may be inserted:
#<START-SKIP>
#<END-SKIP>
    
These directives must be typed exactly as shown and must appear at the very start of a line, much like a C preprocessor directive. The section of the file between the #<START-SKIP> and #<END-SKIP> directives is stored as an inaccessible string, to be dumped out later when Unparse() is called. There are no methods for getting at these strings; they are hidden from the application program that uses starlib2.
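A hypothetical sketch of how such directives could be handled is shown below. Segment, splitSkips() and unparse() are illustrative names, not starlib2 functions, and a real parser would build a node tree from the non-skipped text instead of keeping it as a string. In this sketch the input is split at directive lines that begin at column 0, each skipped section (directives included) is kept verbatim, and writing the segments back out in order puts every skipped section at the same point it came from:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Segment {
    bool skipped;      // true if this text lay between the directives
    std::string text;  // raw text of the segment, newlines included
};

std::vector<Segment> splitSkips(const std::string &input) {
    std::vector<Segment> segments;
    std::istringstream in(input);
    std::string line, current;
    bool inSkip = false;
    auto flush = [&](bool skipped) {
        if (!current.empty()) segments.push_back({skipped, current});
        current.clear();
    };
    while (std::getline(in, line)) {
        // Directives count only at the very start of a line (rfind at 0).
        if (!inSkip && line.rfind("#<START-SKIP>", 0) == 0) {
            flush(false);  // close the parsed text before the directive
            inSkip = true;
            current += line + "\n";
        } else if (inSkip && line.rfind("#<END-SKIP>", 0) == 0) {
            current += line + "\n";
            flush(true);   // the skipped section ends with its directive
            inSkip = false;
        } else {
            current += line + "\n";
        }
    }
    flush(inSkip);
    return segments;
}

// Writes the segments back out in their original order.
std::string unparse(const std::vector<Segment> &segments) {
    std::string out;
    for (const Segment &s : segments) out += s.text;
    return out;
}
```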

To be legal, the remainder of the file after the skipped section has been "removed" must still form valid star syntax. For example, the following would be illegal:

    data_ex
    save_example1
        _tag1  value1
	loop_
	    _looptag1
	    _looptag2

	    lval1  lval2
#<START-SKIP>
	    lval3  lval4
	    lval5  lval6
	stop_

    save_
#<END-SKIP>
    
The above example is illegal because it cuts out the trailing save_ from the saveframe called save_example1, leaving behind invalid STAR syntax.

This, however, would be perfectly legal, and is how this feature is intended to be used. Imagine that you had 5 saveframes in the file and were interested only in the one called "save_data4":

    data_ex
#<START-SKIP>
    save_data1
        ...assume there is a lot of data here...
    save_
    save_data2
        ...assume there is a lot of data here...
    save_
    save_data3
        ...assume there is a lot of data here...
    save_
#<END-SKIP>
    save_data4
        ...assume there is data here that the program is interested in...
    save_
#<START-SKIP>
    save_data5
        ...assume there is a lot of data here...
    save_
#<END-SKIP>
    

simple string get/set interface for loops

The change:

You can now get at the string values of a loop directly from the loop itself, instead of getting the loop row and then getting the value from the row. The methods that have been added to make this work are:

The documentation in the doc/html directory contains further information on the use of these methods, although their use should be fairly intuitive. The example program "simple_string.cc" in the examples directory should give a good idea of how they work. These methods bypass the value cache entirely, so when the extra complexity of LoopRowNodes and DataValueNodes is not needed, they are faster than the older methods.
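To illustrate why loop-level string access can avoid creating nodes, here is a hypothetical sketch. StringLoop, get() and set() are invented names and are not the starlib2 methods; the sketch simply assumes a loop that keeps its values as a flat row-major array of strings, so reading or writing one value is index arithmetic with no node objects and no cache involved:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the starlib2 API.
class StringLoop {
public:
    explicit StringLoop(std::size_t columns) : columns_(columns) {
        assert(columns > 0);
    }

    // Appends one row; it must have exactly one value per column.
    void addRow(const std::vector<std::string> &row) {
        assert(row.size() == columns_);
        cells_.insert(cells_.end(), row.begin(), row.end());
    }

    std::size_t rowCount() const { return cells_.size() / columns_; }

    const std::string &get(std::size_t row, std::size_t col) const {
        return cells_.at(index(row, col));
    }

    void set(std::size_t row, std::size_t col, const std::string &v) {
        cells_.at(index(row, col)) = v;
    }

private:
    // Row-major layout: all values of row 0, then row 1, and so on.
    std::size_t index(std::size_t row, std::size_t col) const {
        assert(col < columns_);
        return row * columns_ + col;
    }

    std::size_t columns_;
    std::vector<std::string> cells_;
};
```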