Discussion QA/QC
Summary of recent QA/QC discussion from the OOSTethys email list.
Discussion on the email list raised three issues regarding QA/QC:
- Do the SML (SOS, SWE, OM, ...) schemas provide elements for encoding QA/QC values?
- Should OOSTethys add these elements to its schemas and templates, and if so, where?
- What values should be used as best-practice examples?
Tony Cook's answer to the first issue
FOR GetObservation: The <swe:Quantity> element has an optional child element called <swe:quality>. Until a couple of weeks ago, this element had as its children <swe:Accuracy>, <swe:ConfidenceLevel>, <swe:Precision>, and <swe:Tolerance>.
In the most recent version of the SWE Common schemas, the <swe:quality> element is still there, but it now uses soft-typed (generic) components to define the measurement quality. The <swe:quality> element takes a <swe:Quantity> as its child, which can be defined like any other quantity we have been using. I need to come up with some concrete examples of exactly how this will look. If we can define some rough guidelines about how the OOS community handles QA, I will try to encode that into <swe:quality> entries which can be added to our templates. We will also need to add some corresponding terms to our CV documentation.
The decision was made to go with more generic <Quantity> defs for <quality> because it was felt that using hard-typed parameters (<Accuracy><Precision>...) would not be sufficiently broad for all potential applications.
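As a rough illustration of the soft-typed approach described above, the following Python sketch builds a <swe:Quantity> whose <swe:quality> child is itself a <swe:Quantity>. It is not one of the promised templates. The namespace URI, the element order, the two definition URNs, and the helper name quantity_with_quality are assumptions made for illustration and should be checked against the schema version actually in use.

```python
import xml.etree.ElementTree as ET

# Assumption: SWE Common namespace URI; adjust to the schema version in use.
SWE_NS = "http://www.opengis.net/swe/1.0"
ET.register_namespace("swe", SWE_NS)


def tag(name):
    """Return the namespace-qualified tag name for a SWE Common element."""
    return f"{{{SWE_NS}}}{name}"


def quantity_with_quality(definition, uom, value, quality_definition, quality_value):
    """Build a <swe:Quantity> carrying a soft-typed <swe:quality> child."""
    quantity = ET.Element(tag("Quantity"), {"definition": definition})
    ET.SubElement(quantity, tag("uom"), {"code": uom})

    # The quality is described by a generic Quantity, not by a hard-typed
    # element such as <swe:Accuracy> or <swe:Precision>.
    quality = ET.SubElement(quantity, tag("quality"))
    flag = ET.SubElement(quality, tag("Quantity"), {"definition": quality_definition})
    ET.SubElement(flag, tag("value")).text = str(quality_value)

    ET.SubElement(quantity, tag("value")).text = str(value)
    return quantity


element = quantity_with_quality(
    definition="urn:example:def:property:sea_water_temperature",  # hypothetical URN
    uom="Cel",
    value=12.3,
    quality_definition="urn:example:def:quality:quality_flag",  # hypothetical URN
    quality_value=3,
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The meaning of the quality value comes entirely from the definition attribute, which is why corresponding terms would need to be added to the CV documentation.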
FOR DescribeSensor, one can define accuracy or error in the <Detector> element. The <Detector> is a description of the physical component taking the measurements. The error can be a curve, with one axis being a measured property of the detector (e.g. temperature). I will try to formulate and post an example of how this looks. You can also see an example of a WeatherStation which uses the <Detector> and <error> elements here: http://vast.uah.edu/SensorML/instances/sensors/SimpleWeatherStation.xml
I think this is an older example that has not been updated for the latest schemas.
OOSTethys Should Deal with QA/QC
At the very least, OOSTethys should add QA/QC elements to its GetObservation response templates and schemas, using the <swe:quality> and <swe:Quantity> elements described by Tony above. We should also begin investigating the work already started by the oceanographic community on the actual values to use, in particular the work done by QARTOD and the NDBC, and any examples that have been implemented.
Examples
- QARTOD (Quality Assurance of Real-Time Oceanographic Data) http://nautilus.baruch.sc.edu/twiki/bin/view
GoMOOS Example
GoMOOS has attended all the QARTOD workshops held thus far, starting with the first in the winter of 2003. As a result, the GoMOOS data management team has begun to implement QA/QC standards based on QARTOD recommendations. While they are not definitive, we present here the values and standards that GoMOOS has implemented in all our observation databases, including the SOS database implemented for displaying SOS platforms at http://www.openioos.org/testbed/sos.
Each observation database has three master tables dealing with QA/QC.
Quality Values
This table describes the data quality values used in the main data table. The values are simple and are intended to identify data acceptable for release. The actual numbers and their meanings are selected to be the same as those used in the GoMOOS NetCDF files. Note that the last entry is not in the NetCDF files. It could correspond to NaN where we might want to make our time series non-sparse.
- 0 NONE Quality unknown or not yet determined
- 1 BAD Data is bad
- 2 SUSPECT Quality might be bad: needs review before release
- 3 GOOD Data is acceptable for distribution
- 4 BEST Data has passed rigorous checks
- 8 REPLACED With data of a higher quality control level
- -9 MISSING The data has been reported as missing
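The quality values above map naturally onto an enumeration. The sketch below copies the codes from the list. The release rule (only GOOD and BEST count as releasable) and the names QualityValue, RELEASABLE, and is_releasable are assumptions made for illustration, not GoMOOS policy.

```python
from enum import IntEnum


class QualityValue(IntEnum):
    """Data quality values, copied from the list above."""
    MISSING = -9
    NONE = 0
    BAD = 1
    SUSPECT = 2
    GOOD = 3
    BEST = 4
    REPLACED = 8


# Assumption: only GOOD and BEST are treated as acceptable for release.
RELEASABLE = frozenset({QualityValue.GOOD, QualityValue.BEST})


def is_releasable(flag: int) -> bool:
    """Return True when a stored quality value marks data acceptable for release."""
    try:
        return QualityValue(flag) in RELEASABLE
    except ValueError:
        # Codes not in the table are treated conservatively as not releasable.
        return False
```

A client filtering a time series could call is_releasable on each stored flag before publishing the corresponding value.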
The entries in the following table indicate the primary reason behind the actual quality value. Entries may be added so that we can use the same ones as found in the NetCDF files. However, the table will begin life filled with some generic entries.
- 1 'OK' 'valid' 'Data is acceptable'
- 2 'BAD' 'invalid' 'Data has been manually marked as bad for some reason'
- 3 'INV' 'data marked invalid by source' 'Data has been marked as invalid by the provider'
- 4 'SUSPECT' 'data presumed invalid' 'This data has been marked as suspect and is being investigated'
- 5 'CORRUPT' 'corrupt transmission or file' 'The data is presumed bad because of corruption in transmission or storage.'
- 6 'UNK' 'unknown problem' 'data is bad due to a programming error'
- 7 'NULL' 'null data' 'Data missing or null'
- 8 'NAN' 'data matches a Not A Number value' 'Data matches a Not-A-Number value from the provider (or error code from an instrument)'
- 9 'REACQ' 'reacquire' 'Data needs to be reacquired from source'
- 10 'FAULTY' 'instrument failure' 'Instrument failure has been indicated for this data'
- 11 'OOER' 'out of engineering range' 'Data exceeds range specified for instrument'
- 12 'OODTR' 'out of range for datatype' 'Data is out of valid range for this datatype; for example 105 C seawater'
- 13 'OOCR' 'out of calculation range' 'Data exceeds range that its calculation should allow'
- 14 'CLIM' 'climatology' 'Data outside of reasonable range based on climatology'
- 14 'CLIM' 'climatology' 'Data outside of reasonable range based on climatology'
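The reason entries above can be held in a simple lookup table keyed by code. The sketch below copies the codes, short names, and summaries from the list and omits the long descriptions. The names QualityReason, REASONS, and describe_reason are hypothetical.

```python
from typing import NamedTuple


class QualityReason(NamedTuple):
    code: int
    name: str
    summary: str


# Codes, names, and summaries copied from the list above.
_ENTRIES = [
    (1, "OK", "valid"),
    (2, "BAD", "invalid"),
    (3, "INV", "data marked invalid by source"),
    (4, "SUSPECT", "data presumed invalid"),
    (5, "CORRUPT", "corrupt transmission or file"),
    (6, "UNK", "unknown problem"),
    (7, "NULL", "null data"),
    (8, "NAN", "data matches a Not A Number value"),
    (9, "REACQ", "reacquire"),
    (10, "FAULTY", "instrument failure"),
    (11, "OOER", "out of engineering range"),
    (12, "OODTR", "out of range for datatype"),
    (13, "OOCR", "out of calculation range"),
    (14, "CLIM", "climatology"),
]
REASONS = {code: QualityReason(code, name, summary) for code, name, summary in _ENTRIES}


def describe_reason(code: int) -> str:
    """Return a short human-readable label for a reason code."""
    reason = REASONS.get(code)
    if reason is None:
        # The table may grow, so unknown codes are reported, not rejected.
        return f"{code}: unrecognized reason code"
    return f"{reason.code} {reason.name}: {reason.summary}"
```

Because entries may be added over time, the lookup reports unknown codes and does not raise an error.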
Quality Control
This table represents the level of processing and quality control to which the data has been subjected. This is useful when the same data point is re-entered in the database; the better quality data should be preferred. See the data addition procedures on how to set policies about how one piece of data replaces another.
- 0 NONE No quality control done
- 1 PROCESSED Automated quality control done and initial value assigned
- 2 REVIEWED Manually reviewed suspect data marked as good or bad.
- 3 POST-PROCESSED Additional review performed after instrument recovery.
- 4 FINAL Post-processed, post-calibrated, complete review.
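The preference for better-quality data when the same data point is re-entered can be sketched as a comparison of quality control levels. The record layout (Observation), the function name preferred, and the tie-breaking rule are assumptions made for illustration. The actual policies are set in the data addition procedures.

```python
from dataclasses import dataclass
from enum import IntEnum


class QCLevel(IntEnum):
    """Quality control levels, copied from the list above."""
    NONE = 0
    PROCESSED = 1
    REVIEWED = 2
    POST_PROCESSED = 3
    FINAL = 4


@dataclass(frozen=True)
class Observation:
    # Hypothetical record layout; real tables will have more columns.
    time: str
    value: float
    qc_level: QCLevel


def preferred(existing: Observation, incoming: Observation) -> Observation:
    """Keep whichever record has the higher quality control level.

    Assumption: on a tie, the incoming (more recently entered) record wins.
    """
    if incoming.qc_level >= existing.qc_level:
        return incoming
    return existing


stored = Observation("2007-03-01T12:00:00Z", 4.1, QCLevel.REVIEWED)
raw = Observation("2007-03-01T12:00:00Z", 4.3, QCLevel.PROCESSED)
final = Observation("2007-03-01T12:00:00Z", 4.2, QCLevel.FINAL)
```

Here preferred(stored, raw) keeps the stored record, while preferred(stored, final) replaces it with the FINAL one.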


QA/QC as data providers' attributes
To echo what I said on the call, some of the data providers (Janet Fredericks with the Martha's Vineyard Observing System, for example) want to include these QA/QC flags with their data as a sort of inseparable disclaimer, to show that people have not checked the data and that the data are not warranted to be accurate, and perhaps to limit their exposure to liability.