A few years ago I joined a data discovery project at work that is developing software tools that make it easier for analysts to examine scientific datasets and extract higher-level features about what happened during a simulation. These analysis operations are especially important in parameter studies where we run a number of simulations with different input parameters, and want to see how a particular resulting effect varies with each parameter. For example, in finite element codes, we might ram one object into another at different angles in order to get an idea of how well each object could survive the impact. While the simulations produce mesh result datasets that can be manually inspected with visualization tools, we want command line tools that quantify specific effects. How big of a rip occurred? How many bolts have been strained beyond a specific tolerance? How far in did a dent venture?
Our project has developed a software package called the Feature Characterization Library (or FCLib) to make it easier for analysts to examine a mesh dataset and quantify custom features. FCLib is written in C and hides a number of painful data manipulation details from the user through simple abstractions. A dataset is composed of meshes, a mesh is composed of elements, and an element is composed of points. Multidimensional data values can be associated with each item in this hierarchy, and commands are provided for retrieving neighbors, parents, and children. With these mechanisms, users can easily walk through a dataset and implement their own algorithms for quantifying a higher-order feature. For reference, we provide several feature characterization point tool examples that were developed to help our analysts with real problems.
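To make the hierarchy concrete, here is a minimal sketch of how a dataset, mesh, element, and point could nest, and how a tool might walk them to answer a question like "how many elements are strained beyond a tolerance?" Every type and function name below (Point, Element, Mesh, Dataset, count_overstrained) is hypothetical and invented for illustration. None of this is FCLib's actual API, which provides its own handles and accessor calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; these are NOT FCLib's API. */
typedef struct {
    double x, y, z;
} Point;

typedef struct {
    const size_t *point_ids;  /* indices into the owning mesh's point array */
    size_t num_points;
    double strain;            /* one scalar value attached to this element */
} Element;

typedef struct {
    const Point *points;
    size_t num_points;
    const Element *elements;
    size_t num_elements;
} Mesh;

typedef struct {
    const Mesh *meshes;
    size_t num_meshes;
} Dataset;

/* Walk dataset -> mesh -> element and count the elements whose strain
 * value is strictly greater than the given tolerance. */
size_t count_overstrained(const Dataset *ds, double tolerance)
{
    assert(ds != NULL);
    size_t count = 0;
    for (size_t m = 0; m < ds->num_meshes; m++) {
        const Mesh *mesh = &ds->meshes[m];
        for (size_t e = 0; e < mesh->num_elements; e++) {
            if (mesh->elements[e].strain > tolerance)
                count++;
        }
    }
    return count;
}
```

The point of the sketch is only the shape of the traversal: once the containment relationships are explicit, a custom feature query is a pair of nested loops plus a test on the attached data.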
One of the analysis tasks I worked on in the FCLib project was to find a way to make it easier to compare structure data in different simulations. This task is challenging because our structures are incredibly complex, and we generally want to know higher-level properties (how far did the wing bend from the fuselage) instead of local properties (how far did this point move relative to its neighbor). What I realized was that we needed a way to grossly simplify the structures into representations that would be easier to analyze.
My approach to simplifying the structure was to try to extract a skeleton for the object. The algorithm found a center for the item and then radiated shortest paths through elements to reach each element in the structure. I then recursively merged the paths less traveled into their neighbors until I was left with a reduced structural representation. It was computationally expensive, but I found a good bit of the work could be done in parallel using OpenMP constructs.
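A rough sketch of the idea, under heavy assumptions: treat the elements as nodes in an unweighted adjacency graph, run a breadth-first search from the center to get one shortest path per element, count how many of those paths run through each element, and keep only the well-traveled ones. The ElementGraph layout and the function names are hypothetical, and the simple threshold used here is a stand-in for the recursive merging described above, not a reconstruction of it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical compressed adjacency list: the neighbors of element i are
 * adj[offset[i]] .. adj[offset[i + 1] - 1]. Not FCLib's representation. */
typedef struct {
    int n;              /* number of elements */
    const int *offset;  /* n + 1 entries */
    const int *adj;
} ElementGraph;

/* Breadth-first search from `center`, then count how many shortest paths
 * in the resulting BFS tree pass through each element.
 *   parent[i]  = previous element on the path to i (-1 for the center
 *                and for unreachable elements)
 *   traffic[i] = number of elements whose path runs through i, counting
 *                i itself (0 if i is unreachable)
 * Returns 0 on success, -1 if memory allocation fails. */
int skeleton_traffic(const ElementGraph *g, int center, int *parent, int *traffic)
{
    assert(g != NULL && center >= 0 && center < g->n);
    int *order = malloc((size_t)g->n * sizeof *order);  /* BFS visit order */
    if (order == NULL)
        return -1;
    for (int i = 0; i < g->n; i++) {
        parent[i] = -1;
        traffic[i] = 0;
    }

    int head = 0, tail = 0;
    order[tail++] = center;
    traffic[center] = 1;  /* a nonzero count doubles as the visited flag */
    while (head < tail) {
        int u = order[head++];
        for (int k = g->offset[u]; k < g->offset[u + 1]; k++) {
            int v = g->adj[k];
            if (traffic[v] == 0) {
                traffic[v] = 1;
                parent[v] = u;
                order[tail++] = v;
            }
        }
    }

    /* Reverse visit order reaches children before parents, so each
     * element's count is final before it is added to its parent. */
    for (int i = tail - 1; i > 0; i--) {
        int v = order[i];
        traffic[parent[v]] += traffic[v];
    }
    free(order);
    return 0;
}

/* Flag elements carrying at least `min_traffic` paths as skeleton members.
 * Returns the number of elements kept. */
int mark_skeleton(int n, const int *traffic, int min_traffic, unsigned char *keep)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        keep[i] = (unsigned char)(traffic[i] >= min_traffic);
        kept += keep[i];
    }
    return kept;
}
```

On a five-element chain 0-1-2-3 with a side branch 1-4 and the center at element 0, the traffic counts come out as 5, 4, 2, 1, 1, so a threshold of 2 keeps the trunk 0-1-2 and drops the two tips. Loops with independent iterations, such as the marking pass, are the kind of work an OpenMP parallel for could split, though which parts of the real algorithm were parallelized is not something this sketch tries to guess.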
Most of the rest of my work in FCLib involved a lot of behind-the-scenes software engineering. I put a lot of time into adding unit and component tests for the library, doing code profiling for testing coverage, and scripting together tools that made it easier to download and build all our necessary dependencies. The lab was undergoing a big software engineering witch hunt, so we received a great deal of praise for proactively using quality control procedures that others were skipping.
End of FCLib
In the end, my project lead got sick of dealing with the politics of the funding source and decided to move on to other work. While I had to do a lot of nasty grunt work in FCLib, I did get a lot out of the project. It was valuable for me to go out and talk to actual scientific computing users to find out what they were doing and see just how bad their tools were. I picked up a lot of good software engineering practices, and realized that for every developer hour you spend putting in new features, you probably need an hour or two more developing good QA code. Finally, this project got me thinking about a lot of the difficulties in analyzing large datasets. FCLib and most Viz codes are written in an in-core memory style, where data is parsed out of data files and stored in easy-to-process structures in system memory. The more I worked with analysts, the more I realized that this practice just doesn't work for large scientific datasets that outgrow system memory, and that what we need is better out-of-core processing techniques and frameworks.
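As a small illustration of the in-core versus out-of-core distinction, the sketch below computes the maximum of a stream of raw double values by reading fixed-size chunks, so memory use stays constant no matter how large the file is. The function name stream_max, the chunk size, and the flat binary layout are all assumptions made for the example. Real mesh formats are far more structured, and this is not how FCLib reads data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CHUNK_VALUES 4096  /* values held in memory at once (arbitrary) */

/* Read raw doubles from `fp` one chunk at a time and track the maximum.
 * Returns the number of values read; *max_out is only written if at
 * least one value was read. */
size_t stream_max(FILE *fp, double *max_out)
{
    assert(fp != NULL && max_out != NULL);
    double buf[CHUNK_VALUES];
    size_t total = 0;
    size_t got;
    while ((got = fread(buf, sizeof buf[0], CHUNK_VALUES, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (total == 0 || buf[i] > *max_out)
                *max_out = buf[i];
            total++;
        }
    }
    return total;
}
```

A reduction like this is the easy case, since each chunk can be discarded once it has been scanned. Operations that need neighbors or parents across chunk boundaries are where out-of-core processing gets genuinely hard.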
SAND Report: Wendy Doyle, Ann Gentile, Philip Kegelmeyer, and Craig Ulmer, "FCLib: The Feature Characterization Library," Sandia Report SAND2008-7687.
FCLib-1.7.0.tar.gz: The 1.7 release of FCLib
FCLib-1.6.1.tar.gz: The 1.6.1 release of FCLib