This document covers the changes from the old starlib (starlib1) to the new starlib (starlib2). The changes were made chiefly so that starlib can handle much larger files than it could before.
These changes WILL require some small alterations to code that uses the library. Any programmer moving from starlib1 to starlib2 must read this page before making the switch. The changes affect C++ programs that use starlib, not Java programs that use starlibj.
In summary, code written for starlib2 must follow these rules:

1. Call ValCache::flushValCache() to clear out the cache of any DataValueNodes it may hold.
2. Never use a DataValueNode pointer obtained before a flush in code that is executed after the call to flushValCache().
3. Do not rely on the myParallelCopy() method at the level of a DataValueNode. It might still work, but not reliably. DataValueNode objects no longer persist for the whole run of the program, so parallel references have nowhere permanent to point to at this level.

Each of these is explained in detail below.
starlib2 provides the method void ValCache::flushValCache(void). Whenever your program is finished using any DataValueNode pointers it may be holding on to, it should call this method. It is safe to call the method as many times as you wish, as long as you never use a DataValueNode pointer obtained before a flush once that flush has occurred. Here are examples of correct and incorrect usage:
GOOD:

```cpp
LoopRowNode *lrn;
int idx, numVals;
DataValueNode *dvn;

// ...For this example, assume lrn is already set to something valid...

numVals = lrn->size();
for( idx = 0 ; idx < numVals ; idx++ )
{
    dvn = (*lrn)[idx];
    cout << "val " << idx << " is " << dvn->myValue();
    // ...perhaps do other things with dvn as well...
}
ValCache::flushValCache();
```
ALSO WORKS, BUT MAYBE FLUSHING TOO OFTEN:

```cpp
LoopRowNode *lrn;
int idx, numVals;
DataValueNode *dvn;

// ...For this example, assume lrn is already set to something valid...

numVals = lrn->size();
for( idx = 0 ; idx < numVals ; idx++ )
{
    dvn = (*lrn)[idx];
    cout << "val " << idx << " is " << dvn->myValue();
    // ...perhaps do other things with dvn as well...
    ValCache::flushValCache();
}
```
BAD, WILL CAUSE CRASHES:

```cpp
LoopRowNode *lrn;
int idx, numVals;
DataValueNode *dvn;

// ...For this example, assume lrn is already set to something valid...

numVals = lrn->size();
for( idx = 0 ; idx < numVals ; idx++ )
{
    dvn = (*lrn)[idx];
    ValCache::flushValCache();
    cout << "val " << idx << " is " << dvn->myValue();  // dvn is stale here
    // ...perhaps do other things with dvn as well...
}
```
OKAY:

```cpp
LoopRowNode *lrn;
int idx, numVals;
DataValueNode *dvn;

// ...For this example, assume lrn is already set to something valid...

numVals = lrn->size();
for( idx = 0 ; idx < numVals ; idx++ )
{
    dvn = (*lrn)[idx];
    // ...perhaps do other things with dvn as well...
    ValCache::flushValCache();
    cout << "val " << idx << " is " << (*lrn)[idx]->myValue();
    // NOTE THE DIFFERENCE BETWEEN THIS AND THE
    // BAD EXAMPLE ABOVE. Here, the dvn pointer
    // is not being used after the flush. Instead,
    // a new pointer is being generated with the
    // syntax: (*lrn)[idx]
}
```
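A toy model can show why the BAD example crashes while the OKAY example works. The sketch below is hypothetical and self-contained. ToyValCache, Handle, get(), flush() and the generation counter are inventions for illustration, not starlib2 classes or methods. The model captures only the rule this page states: a node obtained before a flush is invalid afterwards, while fetching the value again after the flush (as the OKAY example does with (*lrn)[idx]) yields a fresh, usable node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only: NOT starlib2's implementation. It mimics the rule that a
// value node handed out before a flush must not be used after the flush.
class ToyValCache {
public:
    // Stands in for a DataValueNode pointer. It records the cache
    // "generation" it was issued in, so stale use can be detected.
    class Handle {
    public:
        Handle(const ToyValCache* cache, std::size_t idx, unsigned gen)
            : cache_(cache), idx_(idx), gen_(gen) {}

        // True only if no flush has happened since this handle was issued.
        bool valid() const { return gen_ == cache_->generation_; }

        const std::string& value() const {
            assert(valid() && "node used after flush");
            return cache_->nodes_[idx_];
        }

    private:
        const ToyValCache* cache_;
        std::size_t idx_;
        unsigned gen_;
    };

    explicit ToyValCache(std::vector<std::string> column)
        : column_(std::move(column)) {}

    // Materializes a node on demand, loosely like (*lrn)[idx] in starlib2.
    Handle get(std::size_t idx) {
        nodes_.push_back(column_.at(idx));
        return Handle(this, nodes_.size() - 1, generation_);
    }

    // Frees every cached node and invalidates all outstanding handles,
    // loosely like ValCache::flushValCache().
    void flush() {
        nodes_.clear();
        ++generation_;
    }

    std::size_t cachedCount() const { return nodes_.size(); }

private:
    std::vector<std::string> column_;  // the values as stored in the file
    std::vector<std::string> nodes_;   // nodes materialized since last flush
    unsigned generation_ = 0;
};
```

The generation check is only a teaching device. This page does not describe any such safety net in starlib2 itself. As the BAD example above shows, using a stale DataValueNode pointer can simply crash the program.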
In general, memory usage and CPU usage trade off against each other. If you flush too often, you waste CPU time rebuilding values that were just discarded. If you flush too rarely, your program may use far too much memory on larger files. As a rule of thumb, flush at the bottom of major code loops, especially if you know the loop will iterate over a column in a very large STAR loop.
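A rough cost model makes the tradeoff concrete. This sketch assumes, purely for illustration, that each value fetched adds one node to the cache and that a flush empties the cache. FlushStats and simulate() are hypothetical names. The model uses the number of flushes as a proxy for CPU cost and the peak number of cached nodes as a proxy for memory. It does not measure starlib2 itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy cost model (an assumption, not measured starlib2 behavior):
// every fetched value adds one cached node, and a flush empties the cache.
struct FlushStats {
    std::size_t flushes = 0;     // proxy for CPU spent flushing/refetching
    std::size_t peakCached = 0;  // proxy for peak memory use
};

// Walks a STAR loop of numRows x numCols values, flushing after every
// `rowsPerFlush` rows and once more at the bottom of the loop if needed.
FlushStats simulate(std::size_t numRows, std::size_t numCols,
                    std::size_t rowsPerFlush) {
    assert(rowsPerFlush > 0);
    FlushStats stats;
    std::size_t cached = 0;
    for (std::size_t row = 0; row < numRows; ++row) {
        cached += numCols;  // fetch every value in this row
        stats.peakCached = std::max(stats.peakCached, cached);
        if ((row + 1) % rowsPerFlush == 0) {
            cached = 0;
            ++stats.flushes;
        }
    }
    if (cached > 0) {
        ++stats.flushes;  // final flush at the bottom of the loop
    }
    return stats;
}
```

Consider a 1000-row, 4-column loop. simulate(1000, 4, 1), which flushes after every row, performs 1000 flushes with at most 4 cached nodes. simulate(1000, 4, 1000), which flushes once at the bottom of the loop, performs 1 flush but holds 4000 nodes at its peak.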
There is also a flushValCache() method at the LoopTableNode level. You can use it to flush just the values in that one loop, rather than all the values cached in the entire program.
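The difference between the two flush levels can be sketched with another toy model. ToyLoopCaches and its methods are hypothetical. flushLoop() plays the role of the LoopTableNode-level flushValCache(), and flushAll() plays the role of ValCache::flushValCache(). The model tracks only how many nodes each loop currently has cached.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model (not starlib2 code): cached node counts per loop, keyed by a
// hypothetical loop name.
class ToyLoopCaches {
public:
    // Records that `n` more value nodes were materialized for `loop`.
    void fetch(const std::string& loop, std::size_t n = 1) { cached_[loop] += n; }

    // Like the LoopTableNode-level flush: only this loop's nodes are freed.
    void flushLoop(const std::string& loop) {
        cached_.erase(loop);
        assert(cached(loop) == 0);
    }

    // Like ValCache::flushValCache(): every loop's nodes are freed.
    void flushAll() { cached_.clear(); }

    std::size_t cached(const std::string& loop) const {
        auto it = cached_.find(loop);
        return it == cached_.end() ? 0 : it->second;
    }

    std::size_t total() const {
        std::size_t sum = 0;
        for (const auto& entry : cached_) sum += entry.second;
        return sum;
    }

private:
    std::map<std::string, std::size_t> cached_;
};
```

A per-loop flush suits code that is still holding DataValueNode pointers from other loops. Under the same rules this page gives for the global flush, those pointers presumably stay usable, while the global flush would invalidate them all.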