Best Practices-Get Latest Data (v2)

by John Graybeal last modified 2008-04-26 22:49

Recommendations and discussions of best practices for creating a getLatest capability, and similar methods allowing queries for data by time window.

This is version 2 of this document. An archival version is at best-practices-getobslatesttime-v1.

This document contains both the discussions, and the agreed recommendations, for best practices when trying to get data by time constraint using SWE standards. It started as an OOSTethys discussion and email thread (posted below for reference). After discussion at the 2008.04.11 OOSTethys telecon, and more recent exchanges in the SWE WG mail list, our best understanding is now summarized here.

This topic will continue to be discussed in the OGC SWE Working Group; you'll need to join OGC and then that email list to keep up. Any conclusions from that group will be made available on this page.

Although the OOSTethys team initially identified these issues in OOSTethys use cases, they should in most cases apply beyond OOSTethys, to many or even most applications using the technology.

Each practice is marked with '[proposed]' until it is finalized and agreed by the team. Then the marker is removed.

Best Practice Topic List

Use Case 1: Get Latest Value Before [Now | Time T]

Background

How do we get the value with the latest time in SOS/SWE specifications? Or with the latest time in an interval, or the latest before a given time T?

A trip through the probably relevant schemas [1]:

  • sos/1.0.0/sosGetObservation.xsd
        (GetObservation/eventTime)
  • gml/3.1.1/base/temporal.xsd
       (RelatedTimeType/relativePosition=[Before|After|Begins|Ends|During|Equals|Contains|Overlaps|Meets|OverlappedBy|MetBy|BegunBy|EndedBy])
       (TimeInstant, TimePeriod (with begin, end, and a duration), TimeInterval, TimePosition)
       (TimeIndeterminateValueType=[after|before|now|unknown])
  • sos/1.0.0/ogc4sos.xsd
       (TemporalOpsType=[TM_Before|TM_After|...|TM_EndedBy])
    {note typo in line 29, 'TM_Overalps' spelling}
does not reveal a best or proposed practice.

Based on previous constructs (see for example [2] and [3] ), initial options include:

  1. a time before an instant (note that finding a value that is during a time instant is an unlikely proposition, and so TM_During isn't the best choice).
     <sos:eventTime>
        <ogc:TM_Before>
          <gml:TimeInstant>
            <gml:timePosition>2007-02-04T12:24:00</gml:timePosition>
          </gml:TimeInstant>
        </ogc:TM_Before>
      </sos:eventTime>
    
  2. a time in a period
       <sos:eventTime>
        <ogc:TM_During>
          <gml:TimePeriod>
            <gml:beginPosition>2007-02-04T12:24:00</gml:beginPosition>
            <gml:endPosition>2007-02-04T15:24:00</gml:endPosition>
          </gml:TimePeriod>
        </ogc:TM_During>
       </sos:eventTime>
    
    and
  3. a time before now.
     <sos:eventTime>
        <ogc:TM_Before>
          <gml:TimeInstant>
            <gml:timePosition indeterminatePosition="now"/>
          </gml:TimeInstant>
        </ogc:TM_Before>
      </sos:eventTime>
    

{Note these have not been validated as appropriate constructions using the current schema.} Other formulations are possible. For example, a TM_During with a beginPosition in the TimePeriod, but no endPosition, clearly requests all data after the beginPosition time, as should a TM_After with a TimeInstant.
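
As a rough illustration, the three fragments above could be generated programmatically. The following Python sketch uses only the standard library. The helper names (qname, event_time_before, event_time_during) are hypothetical, the namespace URIs are the ones assumed here for SOS 1.0, OGC Filter, and GML 3.1.1, and, like the hand-written fragments, the output has not been validated against the schemas.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace URIs assumed for SOS 1.0, OGC Filter, and GML 3.1.1.
NAMESPACES = {
    "sos": "http://www.opengis.net/sos/1.0",
    "ogc": "http://www.opengis.net/ogc",
    "gml": "http://www.opengis.net/gml",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def qname(prefix, tag):
    """Return the namespace-qualified tag name that ElementTree expects."""
    return "{%s}%s" % (NAMESPACES[prefix], tag)


def event_time_before(instant=None):
    """Build option 1 (a fixed instant) or option 3 (instant=None means 'now')."""
    event_time = ET.Element(qname("sos", "eventTime"))
    operator = ET.SubElement(event_time, qname("ogc", "TM_Before"))
    time_instant = ET.SubElement(operator, qname("gml", "TimeInstant"))
    position = ET.SubElement(time_instant, qname("gml", "timePosition"))
    if instant is None:
        position.set("indeterminatePosition", "now")
    else:
        position.text = instant.strftime(TIME_FORMAT)
    return event_time


def event_time_during(begin, end):
    """Build option 2: TM_During a TimePeriod running from begin to end."""
    event_time = ET.Element(qname("sos", "eventTime"))
    operator = ET.SubElement(event_time, qname("ogc", "TM_During"))
    period = ET.SubElement(operator, qname("gml", "TimePeriod"))
    ET.SubElement(period, qname("gml", "beginPosition")).text = begin.strftime(TIME_FORMAT)
    ET.SubElement(period, qname("gml", "endPosition")).text = end.strftime(TIME_FORMAT)
    return event_time


if __name__ == "__main__":
    print(ET.tostring(event_time_before(), encoding="unicode"))
    print(ET.tostring(
        event_time_during(datetime(2007, 2, 4, 12, 24), datetime(2007, 2, 4, 15, 24)),
        encoding="unicode"))
```

Building the fragment from one helper per temporal operator keeps the choice of operator separate from the choice of time object, which is the part of the request that is still under discussion.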

Presumably these can be read as follows in the context of a getObservation request: "Get me observations that have an event time that matches the (ogc:) temporal operator evaluation on the (gml:) time object specification." (For a more detailed analysis, see the note on the Filter Encoding Implementation Standard, or FES [4].)
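
That reading can be sketched as a plain filter over event times. The Python below is only an illustration: the function names are hypothetical, observations are modeled as (event_time, value) pairs, and TM_During is assumed to mean strictly inside the period. It shows why a temporal operator alone does not answer the getLatest question: every matching observation comes back, not just the most recent one.

```python
from datetime import datetime


def tm_before(event_time, instant):
    """True when the event time is earlier than the instant."""
    return event_time < instant


def tm_after(event_time, instant):
    """True when the event time is later than the instant."""
    return event_time > instant


def tm_during(event_time, begin, end):
    """True when the event time is strictly inside the period (an assumption)."""
    return begin < event_time < end


def get_observations(observations, matches):
    """Return every (event_time, value) pair whose event time satisfies `matches`."""
    return [obs for obs in observations if matches(obs[0])]


OBSERVATIONS = [
    (datetime(2007, 2, 4, 12, 0), 10.1),
    (datetime(2007, 2, 4, 13, 0), 10.4),
    (datetime(2007, 2, 4, 14, 0), 10.9),
]

# TM_Before 13:30 matches both earlier observations; nothing singles out the latest.
before_1330 = get_observations(
    OBSERVATIONS, lambda t: tm_before(t, datetime(2007, 2, 4, 13, 30)))

# TM_During 12:30 to 14:30 matches the two observations inside the period.
during_period = get_observations(
    OBSERVATIONS,
    lambda t: tm_during(t, datetime(2007, 2, 4, 12, 30), datetime(2007, 2, 4, 14, 30)))
```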

But in any case, the fundamental question of "how do you get the latest value" appears to be unanswered. We looked at a few potential answers to this puzzle that would have made it unambiguous:

  • If an enhanced SOS operation like getResult had some capability to limit and order the responses. It does not; it was designed for bandwidth-limited use, but it relies on the same time concepts.
  • If getObservation were intended by default to return only one observation that fits the criteria, namely the most recent one. This is nowhere suggested.
  • If the timePosition instances were meant to refer not to the time of the event but to the time for which the data is valid, then 'now' would get the most recent data, and any TimeInstant would get the data valid at that instant (the last data at or before that instant). This would be confusing for time objects of type TimePeriod, however ('give me all the data that is valid during this TimePeriod' is not the interpretation most expect).
  • If a mechanism existed to specify the number and order of observations ('I want 1 observation sorted by time descending'). Neither can be specified with the current standard.

Discussion in the SWE Working Group mail list confirms that no mechanism to do this exists. No one expressed satisfaction with the mechanisms previously proposed, which in fact seemed to conflict with interpretations in [4].

It became clear via the discussion that the desired results are actually not about time constraints, but about ordering and limiting the result set. The desired getLatest capabilities can be achieved with a specification that describes the ordering of results, and how many to return (e.g., 'descending time', and '1'). To get the nearest time would require introducing the concept of 'nearest', or an absolute distance from the specified time period or instant. (Similar capabilities would be needed to find the spatially nearest sample along a given dimension.)
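
A minimal Python sketch of that idea follows. It is not part of any SOS specification: the function names and parameters (select, select_nearest, before, order, limit) are hypothetical, and observations are modeled as (event_time, label) pairs. The time constraint is applied first, then the ordering, then the limit, which matches the way SQL LIMIT / TOP K clauses behave.

```python
from datetime import datetime, timedelta


def select(observations, before=None, order="descending", limit=None):
    """Filter by time, then order by event time, then keep at most `limit` results."""
    if order not in ("ascending", "descending"):
        raise ValueError("order must be 'ascending' or 'descending'")
    matching = [obs for obs in observations if before is None or obs[0] < before]
    matching.sort(key=lambda obs: obs[0], reverse=(order == "descending"))
    return matching if limit is None else matching[:limit]


def select_nearest(observations, target, limit=1):
    """Order by absolute distance from `target` rather than by direction in time."""
    return sorted(observations, key=lambda obs: abs(obs[0] - target))[:limit]


# Ten hourly samples labeled t0..t9.
START = datetime(2008, 3, 11, 0, 0)
SERIES = [(START + timedelta(hours=i), "t%d" % i) for i in range(10)]

latest = select(SERIES, limit=1)                                  # getLatest
last_two_before_t5 = select(SERIES, before=SERIES[5][0], limit=2)  # time < t5, 2 samples
earliest = select(SERIES, order="ascending", limit=1)             # getEarliest
nearest = select_nearest(SERIES, START + timedelta(hours=2, minutes=40))
```

With these definitions, getLatest is 'descending' with a limit of 1, getEarliest is 'ascending' with a limit of 1, and getNearest needs the separate absolute-distance ordering.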

The topic will be discussed further in the SWE WG, hopefully leading to resolution in the next version of the standard.

Proposals

Since there is nothing in the standard, the OOSTethys team must come up with an agreed practice. Two concepts were originally discussed (if no time is specified, either return n values, or return the most recent value), with the most support for returning the most recent value. In the telecon of 2008.04.25, participants reiterated the importance of being able to exclude data older than a user-selected time -- but that involves specifying a time.

So we have these options for getLatest (all of them except the first are clearly temporary workarounds, to be abandoned when the specification is fixed):

  1. Client finds the most recent result from the results that are returned.
  2. Specifying no time means you get the most recent observation.
  3. Choose an arbitrary, otherwise unusable combination of OGC and GML temporal terms to mean getLatest. (My candidate: ogc:TM_Equals + gml:TimeInstant(gml:timePosition indeterminatePosition="now").)
  4. Go with the previous proposals.
  5. Make up our own variant of the specification by adding a few temporal terms (TM_Latest).
  6. Make up our own variant of the specification by figuring out the right way to add what we want to the spec.

The associated problems:

  1. Requires additional programming on every client.
  2. Contradicts expected behavior of queries; precludes excluding old data.
  3. Not likely to be a community practice; precludes excluding old data.
  4. Likely at odds with the actual interpretation of those parameters.
  5. Won't validate against spec; it is bad to mingle sorting, quantity terms with temporal terms.
  6. Won't validate against spec; takes too much time and energy for what we have.
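
For comparison, option 1 (the only one that is not a workaround) amounts to a few lines on each client. The Python sketch below is an assumption about how a client might do it, not an agreed implementation: latest_from_response and oldest_acceptable are hypothetical names, and the response is assumed to be already parsed into (event_time, value) pairs.

```python
from datetime import datetime


def latest_from_response(observations, oldest_acceptable=None):
    """Pick the newest observation from an already-parsed response.

    Observations older than `oldest_acceptable` are ignored, so the caller can
    exclude stale data. Returns None when nothing qualifies.
    """
    candidates = [
        obs for obs in observations
        if oldest_acceptable is None or obs[0] >= oldest_acceptable
    ]
    return max(candidates, key=lambda obs: obs[0], default=None)


# The server may return observations in any order.
RESPONSE = [
    (datetime(2008, 4, 25, 9, 0), 14.2),
    (datetime(2008, 4, 25, 11, 0), 14.8),
    (datetime(2008, 4, 25, 10, 0), 14.5),
]

newest = latest_from_response(RESPONSE)
fresh_only = latest_from_response(RESPONSE, oldest_acceptable=datetime(2008, 4, 25, 12, 0))
```

The cost noted for option 1 is visible here: the client must request and parse the whole window before it can discard all but one observation.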

I don't know what's best. If I had a bit more time right now, I'd pick #6 and take it on myself. But I can't work on it heavily at this moment, and it would take weeks, I imagine.

Knowing that anything other than #1 is a hack, my best suggestion is #2 -- when the capability is available in SWE, we can change the XML and still get back the same result. But let's vote on this via email.

Additional Recommendations to OGC

These recommendations reflect our observations about needed but unavailable constructs. If there are constructs to meet these needs explicitly, we would be happy to use them.

The first two recommendations may be provided by a single mechanism. Note that the first recommendation would not apply to geospatial relations, whereas the second recommendation could be made generic to any single axis (but not horizontal distance, which includes 2 axes).

Recommendation: The capability to select LatestTime, EarliestTime, and NearestTime should be provided, either by explicitly documenting existing terms that provide those functions, or by adding these functions in some way.

Recommendation: The ability to specify a mechanism to select which data is returned by sorting on the time axis should be added to getObservation. Sorts should be either unidirectional (could support LatestTime, EarliestTime) or in absolute terms (could support NearestTime).

Recommendation: The ability to specify number of observations to return should be added to getObservation.

Recommendation: The functionality of the different OGC temporal operators, the GML time object specifications, and their interactions, should be defined explicitly within the standard (e.g., where those terms are specified), or a reference provided to an explanatory specification.

Use Case 2: Getting the Nearest Time

To be discussed. This is done in WMS with 'current' + 'latest', per email discussion. Recommendation above suggests a NearestTime capability, and the ability to sort in absolute terms.

Use Case 3: Getting All Data Since Last Call

To be discussed.

Additional Considerations

Event Time vs Message Time

Best Practice: Use the best known time of the (measured or calculated) observation when comparing data timestamps to the query. Do not use the time the message arrived.
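
The difference matters whenever data arrive out of order. In the Python sketch below, the Observation fields (measured_at, arrived_at) and the function name are hypothetical, chosen only to illustrate the practice: "latest" is decided by the measurement time, never by the arrival time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Observation:
    measured_at: datetime  # best known time of the observation itself
    arrived_at: datetime   # time the message reached the server
    value: float


def latest_by_event_time(observations):
    """Return the observation with the most recent measurement time, or None."""
    return max(observations, key=lambda obs: obs.measured_at, default=None)


# A delayed message: measured at 10:00 but delivered after the 11:00 sample.
SAMPLES = [
    Observation(datetime(2008, 4, 26, 11, 0), datetime(2008, 4, 26, 11, 1), 7.3),
    Observation(datetime(2008, 4, 26, 10, 0), datetime(2008, 4, 26, 11, 5), 7.1),
]

latest = latest_by_event_time(SAMPLES)
last_to_arrive = max(SAMPLES, key=lambda obs: obs.arrived_at)
```

Here the latest observation is the 11:00 sample even though the 10:00 sample was the last to arrive.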

Use of indeterminatePosition="now"

We agree that 'now' is a variable for time that is replaced at the server with the current time known to the server. The effect should be (depending on the other parameters) to get data that has arrived before now. Actual implementations may produce different effects, depending on lag times and so on.

Best Practice [proposed]: If 'now' is specified, the server should be capable of returning the appropriate very recent data, even if some timestamps are slightly advanced. (But see Future Times below.)

Future Times

Best Practice [proposed]: When dealing with observational data, if a datum has errantly been timestamped with a future date/time, do not provide said datum when asked for the latest data, unless something else in the query makes clear the query is requesting future data.
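
The two proposed practices on 'now' and future times can be combined in a small filter. The following Python sketch is an assumption about one way to do so: the function name and the tolerance parameter are hypothetical, and how large a tolerance counts as "slightly advanced" is left to the provider.

```python
from datetime import datetime, timedelta


def latest_excluding_future(observations, now, tolerance=timedelta(0)):
    """Return the newest (event_time, value) pair not later than now + tolerance.

    Observations stamped further in the future are treated as errant and skipped.
    Returns None when nothing qualifies.
    """
    cutoff = now + tolerance
    eligible = [obs for obs in observations if obs[0] <= cutoff]
    return max(eligible, key=lambda obs: obs[0], default=None)


NOW = datetime(2008, 4, 26, 12, 0, 0)
READINGS = [
    (datetime(2008, 4, 26, 11, 50, 0), 3.1),   # ordinary recent reading
    (datetime(2008, 4, 26, 12, 0, 30), 3.2),   # clock slightly ahead of the server
    (datetime(2009, 4, 26, 12, 0, 0), 99.9),   # errant future timestamp
]

strict = latest_excluding_future(READINGS, NOW)
lenient = latest_excluding_future(READINGS, NOW, tolerance=timedelta(minutes=1))
```

With no tolerance the 11:50 reading is returned; with a one-minute tolerance the slightly advanced reading is accepted, while the reading stamped a year ahead is excluded in both cases.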

Appendix 1: Correspondence

Correspondence re use of time with SOS.

Date:	Mon, 10 Mar 2008 15:59:26 -0400
From: Luis Bermudez <bermudez@SURA.ORG>

Hi Tony,

On the SOS / WFS matrix we are saying that SOS supports to query for getLatest. How is this supported ? I know paging and get number of records is not part of SOS 1.0.

Thanks,

- Luis

Date:	Mon, 10 Mar 2008 16:54:52 -0400
From: Eric Bridger <eric@GOMOOS.ORG>

I can't speak to the SOS spec on this, but from the start we've been using GetObservation without any time parameter input. In the GetCapabilities templates the time parameter is listed as optional. In the description of the AVAILABLE_OFFERING_TIME we have an indeterminatePosition attribute with a value of "now".

Seems to me like that would cover it, but perhaps there is a more explicit way to express it.

Eric

Date:	Mon, 10 Mar 2008 15:05:04 -0700
From: Bill Howe <howeb@STCCMOP.ORG>

Note that eventTime=now is a different query than getLatest

Bill

Date:	Tue, 11 Mar 2008 12:41:33 -0500
From: Tony Cook <tcook@NSSTC.UAH.EDU>

SOS does not define the time schema/syntax, but instead references time schema from GML. The "now" value comes from the gml/3.1.1/base/temporal.xsd schema. This schema is available at http://schemas.opengis.net/

We have been using 'now' to query for the latest measurement available. So for an XML Post request, this will look something like this:

<sos:eventTime>
  <ogc:TM_During>
    <gml:TimeInstant>
      <gml:timePosition indeterminatePosition="now"/>
    </gml:TimeInstant>
  </ogc:TM_During>
</sos:eventTime>

For KVP/Get request, we have been using '&time=now' as part of the URL query string. That will change to '&eventTime=now' for 1.0. Keep in mind, though, that SOS 1.0 does not define KVP requests, so this is somewhat of a kludge.

I do not think that there is an explicit distinction between 'now' and 'latest' in the GML time schemas.

Tony

Bill Howe wrote: >Note that eventTime=now is a different query than getLatest

>Bill

Date:	Tue, 11 Mar 2008 14:53:34 -0400
From: Eric Bridger <eric@GOMOOS.ORG>

That's too bad. I don't think that it is too much of a stretch to assume now is equivalent to latest. 'now' is fleeting, even at the speed of light and IMHO must always approximate to 'latest'. At the same time on the SOS map at openioos.org (all requests are w/o a time parameter) I'm getting some results from OceanWatch from 1998, not quite 'now' but probably the 'latest'.

Eric

On Mar 11, 2008, at 1:41 PM, Tony Cook wrote: >I do not think that there is an explicit distinction between 'now' and 'latest' in the GML time schemas.

Date:	Tue, 11 Mar 2008 12:14:23 -0700
From: Bill Howe <howeb@STCCMOP.ORG>

>I do not think that there is an explicit distinction between 'now' and 'latest' in the GML time schemas.

From the ISO 19136 standard (GML 2007): ""now" indicates that the specified value shall be replaced with the current temporal position whenever the value is accessed."

GML doesn't understand "latest." GML just provides a syntax for addressing positions in time. The term "latest" only has meaning relative to some sequence of events. SOS models such sequences, but GML doesn't.

Also, appealing to intuition: If one asks for "now", one presumably doesn't want to get a sensor measurement from 6 months ago just because that's the latest value.

Put another way, if "now" means "latest", how do I ask for "now"?

I suggest allowing the time portion of the query to be omitted. When omitted, the semantics is to call getLatest.

Bill

Date: Tue, 11 Mar 2008 14:05:28 -0700
From: John Graybeal <graybeal@mbari.org>
Subject: Re: getLatest SOS

I was going to agree that ISO 19136 defines 'now' to mean latest, per your descriptions. (Some aspects of standards do not deserve respect.)

But then I took a closer look. The definition does not say what is done with the term 'now', all it says is 'replace with the current time'. So having done that, how is that time value used in the interface? (This is repeating my earlier question, but putting it in this context.)

Responding to your totally reasonable intuition with my best effort design skill (;->), I am confident that your version of 'now' has a different range of acceptable age than my version of 'now'. So the only way to ask for a 'now' that isn't a latest is in fact to ask for all the data in the range from 'oldest acceptable age' to 'now', then take the most recent one.

It would be nice if the default ordering of data was sequential, and reducing the number of values requested always eliminated older values. That is the way it *should* have been specified. Whatever the 'it' is that I'm talking about....

John

Date:	Tue, 11 Mar 2008 16:32:12 -0400
From: Luis Bermudez <bermudez@SURA.ORG>

I think my concern was answered. It is not currently possible.

Within OOSTethys what is the amount of data we want to make available ? Maybe it is the minimum amount of data required by a numerical modeler to compare the output? Not sure. But I think we are talking about days. So when we say latest we are asking for the latest within x days, which is very different from the latest available which could be years ago.

So just asking for a time interval of "X days before now" and "now", will do the trick to get the latest and near now observations.

But, I think, if the time is not given, then ALL of the available time should be retrieved.

Maybe the way to get latest, is to limit the number of records being requested to 1 and ask them to be served in descendent time order. Not sure when this is going to be possible.

So.. should we unchecked the getLatest from SOS ?

-luis

Date:	Tue, 11 Mar 2008 13:46:05 -0700
From: Bill Howe <howeb@STCCMOP.ORG>

Luis Bermudez wrote: >I think my concern was answered. It is not currently possible.

>Within OOSTethys what is the amount of data we want to make available ? Maybe it is the minimum amount of data required by a numerical modeler to compare the output? Not sure. But I think we are talking about days. So when we say latest we are asking for the latest within x days, which is very different from the latest available which could be years ago.

>So just asking for a time interval of "X days before now" and "now", will do the trick to get the latest and near now observations.

>But, I think, if the time is not given, then ALL of the available time should be retrieved.

I'm -1 on the "get all" semantics -- that is 10+ years of data on our servers.

>Maybe the way to get latest, is to limit the number of records being requested to 1 and ask them to be served in descendent time order. Not sure when this is going to be possible.

That works! Not difficult at all to implement on the PySOS server.

>So.. should we unchecked the getLatest from SOS ?

Don't all the implementations support it currently?

Date: Tue, 11 Mar 2008 13:56:13 -0700
From: John Graybeal <graybeal@mbari.org>
Subject: Re: getLatest SOS

So I ran into this when writing descriptions for each of the rows in the SOS/WFS list of features. When I got to getting data at time(T), I had no idea what that meant. Does it mean: a) If there is a point exactly at this particular T, return that point, OR b) Return the point that is closest to T, OR c) Return the latest point before T (same as (b), but only past points are OK), OR d) Return the latest point before T but within X time? The only one that makes reasonable sense is (c), but I have a feeling that the correct answer is (e) undefined.

Does the feature exist to request the data at time(T), and if so, how is it defined? (And if it doesn't exist, yes it shouldn't be in the matrix either.)

John

Date:	Tue, 11 Mar 2008 14:01:50 -0700
From: Matthew Arrott <marrott@UCSD.EDU>

Good thread – a subtle and important illustration of semantics. I do not believe you have it resolved.

From my perspective the definition of “now” provide by Bill should be the baseline. I do not see that “now” have and temporal extent so the provider should determine the “Time To Live” quality of their sample and determine if “now” is within TTL window of the last sample.

“Latest” is an ambiguous term for any other return than the last sample. If this was a more explicit subscription interaction model the server and/or client would know the “Extent” of the last request upon which to determine the “Extent” of latest. By “extent” this could mean time frame or sample range.

I do not agree with Luis’s assertion that “latest” should return some server only determined number of samples - unless that number is 1.

Only an observer's opinion,

Matthew

Correction - I meant to say I agree with Luis.

getLatest without any specification of extent should return only 1 sample, which is the most recent.

Date:	Tue, 11 Mar 2008 17:15:57 -0500
From: Gerry Creager <gerry.creager@TAMU.EDU>
Subject: Re: getLatest SOS

Matthew Arrott wrote: >Good thread – a subtle and important illustration of semantics. I do not believe you have it resolved.

Concur.

>From my perspective the definition of “now” provide by Bill should be the baseline. I do not see that “now” have and temporal extent so the provider should determine the “Time To Live” quality of their sample and determine if “now” is within TTL window of the last sample.

This point is well-made. However, I think the requestor should make the decision on the window, as they're the ones responsible for use of the data in their application. IF they want "The last measurement between 'right now time' and the last 10 years" then that should be in the request. Conversely, if we allow the data provider to make this determination, a requester with a different set of criteria than something imagined by the provider could find themselves not getting data perfectly valid for their purposes.

>“Latest” is an ambiguous term for any other return than the last sample. If this was a more explicit subscription interaction model the server and/or client would know the “Extent” of the last request upon which to determine the “Extent” of latest. By “extent” this could mean time frame or sample range.

A community definition of "latest" would have to be agreed upon, else we have to make that definition explicit in some form of temporal window in the request.

>I do not agree with Luis’s assertion that “latest” should return some server only determined number of samples - unless that number is 1.

Noting your later statement of agreement... "getLatest" for some value of "latest" should return one value assuming a point source, or a set of latest values from observations described by a bounded area, but not "everything" Bill's right: "everything" is too inclusive.

gerry

>On 3/11/08 1:32 PM, "Luis Bermudez" <bermudez@SURA.ORG> wrote:

>I think my concern was answered. It is not currently possible.

Date:	Tue, 11 Mar 2008 15:43:00 -0700
From: John Graybeal <graybeal@mbari.org>

My own take:

getLatest() implies to me the most recent observation that fits all the criteria. (I agree with Matt and Luis.) A specific number of samples other than 1 is unanticipatable and therefore not computationally useful.

The implication of adding a single 'time' value is that you want the latest *before* that time. Otherwise it would be 'getNearest()'.

getLatest() && nSamples(10) does not seem ambiguous to me -- it is the last 10 observations that fit all the other criteria. It has nothing to do with any previous query. (Said another way, 'getLatest()' has no associated session history.)

'now' is a substitution command that is replaced by the current date and time when the expression is evaluated. It is therefore redundant with getLatest() without a time, because that will also define the relevant time as the current date and time when the expression is evaluated.

The instance of adding a time range is that you want the latest observation *before* the end time, that is still later than the start time. If there is none in that range, you get an empty set back.

If a time range AND an nSamples() are specified, all criteria are used as limits, so a data item must be within both the time range and within the last nSamples observations. (You'll never get more than nSamples and you'll never get outside of the time range.)

An observation could be a point, vector, array, or other grouping of data. Unfortunately I do not know the implicit semantics of the request -- e.g., whether it will be obvious what type of observation is being requested, and what type of observation is being returned -- so I can't comment usefully on the appropriate defaults.

An important subtlety, so far not addressed, is whether by 'latest' we mean 'the last to arrive', or 'the last to be measured'. Since things can and will arrive out of order, this can be an important distinction. Based on the likelihood of greatest value (a metric I have just made up, and whose evaluation I just performed entirely without concrete data) I conclude 'latest' should mean the latest to be measured but not later than now (since that is a temporal impossibility, so far as we know). This definition is likely to be most useful, as it will not return older data than the most recent sample available, and will also not return obviously faulty data over and over again. Boy would *that* be irritating.

Is this sufficiently precise, consistent, comprehensive, and reasonable to consider the default set of assumptions? What is missing?

More to the point, how much of this is explicitly defined already in SOS, and how much if any is contradicted?

Whatever we end up with should be documented as Best Practices.

John

Date:	Tue, 11 Mar 2008 15:56:43 -0700
From: Matthew Arrott <marrott@UCSD.EDU>

>I thought the correct question was "what is up with Matt's email program?"

Plus 1 ... Ya what is up with Entourage sending through this list server ...

>My own take:

Plus 1 ... For all below ...

>getLatest() implies to me the most recent observation that fits all the criteria. (I agree with Matt and Luis.) A specific number of samples other than 1 is unanticipatable and therefore not computationally useful.

>The implication of adding a single 'time' value is that you want the latest *before* that time. Otherwise it would be 'getNearest()'.

>getLatest() && nSamples(10) does not seem ambiguous to me -- it is the last 10 observations that fit all the other criteria. It has nothing to do with any previous query. (Said another way, 'getLatest()' has no associated session history.)

>'now' is a substitution command that is replaced by the current date and time when the expression is evaluated. It is therefore redundant with getLatest() without a time, because that will also define the relevant time as the current date and time when the expression is evaluated.

>The instance of adding a time range is that you want the latest observation *before* the end time, that is still later than the start time. If there is none in that range, you get an empty set back.

>If a time range AND an nSamples() are specified, all criteria are used as limits, so a data item must be within both the time range and within the last nSamples observations. (You'll never get more than nSamples and you'll never get outside of the time range.)

>An observation could be a point, vector, array, or other grouping of data. Unfortunately I do not know the implicit semantics of the request -- e.g., whether it will be obvious what type of observation is being requested, and what type of observation is being returned -- so I can't comment usefully on the appropriate defaults.

>An important subtlety, so far not addressed, is whether by 'latest' we mean 'the last to arrive', or 'the last to be measured'. Since things can and

Possible parsing

Latest to mean most recent in time

Last to mean most recent to arrive

>will arrive out of order, this can be an important distinction. Based on the likelihood of greatest value (a metric I have just made up, and whose evaluation I just performed entirely without concrete data) I conclude 'latest' should mean the latest to be measured but not later than now (since that is a temporal impossibility, so far as we know). This definition is likely to be most useful, as it will not return older data than the most recent sample available, and will also not return obviously faulty data over and over again. Boy would *that* be irritating.

Date: Tue, 11 Mar 2008 16:25:59 -0700
From: John Graybeal <marrott@ucsd.edu>

From: Matt Arrott

[snip...]

>Possible parsing

>Latest to mean most recent in time

>Last to mean most recent to arrive

Neat. I'll buy it, but the semantically challenged probably can't .... maybe getLastArriving(), though.

Also tricky that 'last' means either 'last in the list' or 'previous one before this one'.

john

Date:	Tue, 11 Mar 2008 16:35:46 -0700
From: Bill Howe <howeb@STCCMOP.ORG>

John Graybeal wrote: >If a time range AND an nSamples() are specified, all criteria are used as limits, so a data item must be within both the time range and within the last nSamples observations. (You'll never get more than nSamples and you'll never get outside of the time range.)

Consider a series of samples t0, t1, t2, t3, t4, t5, t6, t7, t8, t9.

Consider a query "time < t5 && nSamples(2)"

Does this query return {t3, t4}, or {}?

The semantics of the LIMIT / TOP K clauses in SQL from all DB vendors would dictate {t3, t4}. There is some literature available as to why they chose this meaning.

>An important subtlety, so far not addressed, is whether by 'latest' we mean 'the last to arrive', or 'the last to be measured'. Since things can and will arrive out of order, this can be an important distinction. Based on the likelihood of greatest value (a metric I have just made up, and whose evaluation I just performed entirely without concrete data) I conclude 'latest' should mean the latest to be measured but not later than now (since that is a temporal impossibility, so far as we know). This definition is likely to be most useful, as it will not return older data than the most recent sample available, and will also not return obviously faulty data over and over again. Boy would *that* be irritating.

The timestamp used to determine "latest" ought to be the same timestamp used in other time criteria. I agree that timestamp ought to be "measured" not "arrived".

>Is this sufficiently precise, consistent, comprehensive, and reasonable to consider the default set of assumptions? What is missing?

>More to the point, how much of this is explicitly defined already in SOS, and how much if any is contradicted?

>Whatever we end up with should be documented as Best Practices.

>John

Date: Tue, 11 Mar 2008 17:08:41 -0700
From: John Graybeal <graybeal@mbari.org> 

At 4:35 PM -0700 3/11/08, Bill Howe wrote:

Consider a series of samples t0, t1, t2, t3, t4, t5, t6, t7, t8, t9.

Consider a query "time < t5 && nSamples(2)"

Does this query return {t3, t4}, or {}?

The former, I would say. Making a number of obvious assumptions....

Time<=t5 && nSamples(2) would return {t4, t5}.

Right?

No idea, but I assume getLatest(Time) to be a <= relation.

Date:	Tue, 11 Mar 2008 18:39:49 -0500
From: Gerry Creager <gerry.creager@TAMU.EDU>
Subject: Re: getLatest SOS

Aw, geez, and I thought I'd go home before it got dark!

John Graybeal wrote: >I thought the correct question was "what is up with Matt's email program?"

>My own take:

>getLatest() implies to me the most recent observation that fits all the criteria. (I agree with Matt and Luis.) A specific number of samples other than 1 is unanticipatable and therefore not computationally useful.

I think, with this explanation, that I do, too.

>The implication of adding a single 'time' value is that you want the latest *before* that time. Otherwise it would be 'getNearest()'.

An ambiguity was raised, possibly in my interpretation of what was written. getLatest() implies, to me, that it returns the *last* value before, or cotemporally with, the time of the request.

>getLatest() && nSamples(10) does not seem ambiguous to me -- it is the last 10 observations that fit all the other criteria. It has nothing to do with any previous query. (Said another way, 'getLatest()' has no associated session history.)

But unless there was a context switch I missed, Luis asked that nSamples() was unlimited. Your example is consistent and reasonable.

>'now' is a substitution command that is replaced by the current date and time when the expression is evaluated. It is therefore redundant with getLatest() without a time, because that will also define the relevant time as the current date and time when the expression is evaluated.

Is 'now' substituted at evaluation on the server or surrogate? Even if it is, 'now' isn't 'latest regardless of how long ago that was'. Or is it? I do think that some definition of temporal window should be explicit in here.

>The instance of adding a time range is that you want the latest observation *before* the end time, that is still later than the start time. If there is none in that range, you get an empty set back.

The instance of adding a time range allows one to determine the useful life of an observation. And that helps distinguish between getLatest() and getNow(). In a way, getNow() almost makes an implicit call to the sensor to snag an immediate observation: Does this also imply a call to SPS to enable an additional, unplanned observation, one out of sync with an existing schedule?

>If a time range AND an nSamples() are specified, all criteria are used as limits, so a data item must be within both the time range and within the last nSamples observations. (You'll never get more than nSamples and you'll never get outside of the time range.)

That wasn't ever disputed, but I also don't think it was raised as a question.

>An observation could be a point, vector, array, or other grouping of data. Unfortunately I do not know the implicit semantics of the request -- e.g., whether it will be obvious what type of observation is being requested, and what type of observation is being returned -- so I can't comment usefully on the appropriate defaults.

For the sake of argument (in a Monty Pythonesque sense, of course), let's assume it's a point observation, but it really doesn't matter. Vector, array or swath, etc., would have an attached observation time and therefore could be so queried. So you do have the tools to work with.

>An important subtlety, so far not addressed, is whether by 'latest' we mean 'the last to arrive', or 'the last to be measured'. Since things can and will arrive out of order, this can be an important distinction. Based on the likelihood of greatest value (a metric I have just made up, and whose evaluation I just performed entirely without concrete data) I conclude 'latest' should mean the latest to be measured but not later than now (since that is a temporal impossibility, so far as we know). This definition is likely to be most useful, as it will not return older data than the most recent sample available, and will also not return obviously faulty data over and over again. Boy would *that* be irritating.

I'll step out on a limb here and define 'latest' as last to be measured. Out-of-sequence measurements shouldn't be an issue because I'm assuming that there's capability to tell by timestamps when they were taken.

This could be dangerous. I can give examples...

>Is this sufficiently precise, consistent, comprehensive, and reasonable to consider the default set of assumptions? What is missing?

>More to the point, how much of this is explicitly defined already in SOS, and how much if any is contradicted?

At one point, I thought I knew... right now I'm no longer sure.

>Whatever we end up with should be documented as Best Practices.

In violent agreement with this statement...

gerry

Date:	Tue, 11 Mar 2008 17:29:06 -0700
From: Bill Howe <howeb@STCCMOP.ORG>

John Graybeal wrote:

>At 4:35 PM -0700 3/11/08, Bill Howe wrote:

>>Consider a series of samples t0, t1, t2, t3, t4, t5, t6, t7, t8, t9.

>>Consider a query "time < t5 && nSamples(2)"

>>Does this query return {t3, t4}, or {}?

>The former, I would say. Making a number of obvious assumptions....

>Time<=t5 && nSamples(2) would return {t4, t5}.

>Right?

Looks good to me! So the nSamples() clause is applied last, after all the other criteria.

Bill
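The evaluation order agreed in the exchange above (apply the time criterion first, then keep the last nSamples matches) can be sketched in Python. This is only an illustration: the function name `latest_samples`, its parameters, and the use of integers as stand-ins for t0..t9 are assumptions, not part of SOS.

```python
def latest_samples(times, n_samples, before=None, inclusive=False):
    """Return the last n_samples times that satisfy the time criterion.

    The time criterion is applied first; the nSamples limit is applied last.
    """
    ordered = sorted(times)
    if before is not None:
        if inclusive:
            ordered = [t for t in ordered if t <= before]
        else:
            ordered = [t for t in ordered if t < before]
    if n_samples <= 0:
        return []
    return ordered[-n_samples:]


samples = list(range(10))  # hypothetical stand-ins for t0 .. t9

strict = latest_samples(samples, 2, before=5)
inclusive = latest_samples(samples, 2, before=5, inclusive=True)
print(strict)     # [3, 4]
print(inclusive)  # [4, 5]
```

If the time criterion were applied after the nSamples limit instead, "time < t5 && nSamples(2)" would start from {t8, t9} and return the empty set, which is the other outcome raised in the question.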

Date: Tue, 11 Mar 2008 17:46:49 -0700
From: John Graybeal <graybeal@mbari.org>

At 6:39 PM -0500 3/11/08, Gerry Creager wrote:

>Aw, geez, and I thought I'd go home before it got dark!

You're going home?

>>'now' is a substitution command that is replaced by the current date and time when the expression is evaluated. It is therefore redundant with getLatest() without a time, because that will also define the relevant time as the current date and time when the expression is evaluated.

>is 'now' substituted at evaluation on the server or surrogate? Even if it is, 'now' isn't 'latest regardless of how long ago that was'. Or is it? I do think that some definition of temporal window should be explicit in here.

1) Now is substituted at evaluation on the server that executes the query. (He says definitively, without worrying about any possibly applicable facts....)

2) How the resulting value is *used* depends on the context of the particular method to which it is parameter. When 'now' is the single time parameter to a getLatest() call, I interpret it to mean return the latest observation taken before 'now' (i.e., before the current time), without constraint as to how far back that is. Measurements from a year ago may be every bit as useful as those that are more recent, depending on the client's context.

If the client is unhappy with how far back the result may be, it has two choices: look at the time(s) on the returned observation(s), or supply a date range, as Gerry says. (And it would be nice if getLatest ranges could include offsets from now, not just start times, so calls could be getLatest(now-3d,now), but that's for another thread.)

I'm going to ignore the getNow topic, which risks starting a whole new set of interesting things for me to do instead of my real job...

I think we're getting consensus that getLatest is by measurement time, not by arrival time.

Would these questions be appropriate to post back to SOS-related lists, or do we just declare our answer and go home?

John
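The behavior converged on in this message (substitute 'now' when the query is evaluated, rank by measurement time rather than arrival time, and return nothing when a supplied range excludes every observation) might look like the following Python sketch. The function `get_latest`, the tuple layout, and the inclusive range bounds are assumptions made for illustration; the thread does not settle whether the bounds are inclusive.

```python
from datetime import datetime, timedelta, timezone


def get_latest(observations, start=None, end=None, now=None):
    """Return the observation with the latest measurement time, or None.

    observations: iterable of (measured_at, arrived_at, value) tuples.
    end defaults to 'now', which is substituted when the query is evaluated.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if end is None:
        end = now
    candidates = [
        obs for obs in observations
        if obs[0] <= end and (start is None or obs[0] >= start)
    ]
    if not candidates:
        return None  # nothing falls inside the requested window
    # Rank by measurement time, not by arrival time.
    return max(candidates, key=lambda obs: obs[0])


now = datetime(2008, 3, 11, 18, 0, tzinfo=timezone.utc)
observations = [
    # (measured_at, arrived_at, value), all values hypothetical
    (now - timedelta(days=400), now - timedelta(days=400), 11.2),
    (now - timedelta(hours=2), now - timedelta(minutes=5), 12.9),
    (now - timedelta(hours=3), now - timedelta(minutes=1), 12.4),  # arrived last
]

latest = get_latest(observations, now=now)
windowed = get_latest(observations, start=now - timedelta(days=3), now=now)
recent = get_latest(observations, start=now - timedelta(minutes=30), now=now)
print(latest[2])    # 12.9
print(windowed[2])  # 12.9
print(recent)       # None
```

The third observation was the last to arrive but is not the latest to be measured, so it is not returned. The `windowed` call corresponds to the getLatest(now-3d,now) form suggested above.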

Date:	Wed, 12 Mar 2008 10:52:21 -0400
From: Eric Bridger <eric@GOMOOS.ORG>

I agree that we are reaching consensus. GetObservation with no time parameter should return one observation where t <= now.

I don't mean to muddy the waters but the SOS spec seems to address this issue with optional Enhanced Operations elements. GetObservation can request a responseMode = responseTemplate (vs. inline) then using that template id to call the optional GetResult method. The example responseTemplate in the spec has time parameters showing the TTL of the responseTemplate id, during which the client may call GetResult with the time of the last result received and receives everything after that time stamp. The first GetResult call uses no time and gets all the results. Subsequent calls pass a time so the Client gets no duplicates.

-Eric
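The polling pattern described in this message (a first call with no time, then later calls that pass the time of the last result received) can be illustrated with a small in-memory sketch. Nothing here talks to a real SOS server: `FakeResultSource`, `get_result`, and `poll` are hypothetical names, and integer timestamps stand in for real times.

```python
class FakeResultSource:
    """In-memory stand-in for a server-side result template (hypothetical)."""

    def __init__(self):
        self._results = []  # list of (timestamp, value), kept in time order

    def add(self, timestamp, value):
        self._results.append((timestamp, value))
        self._results.sort()

    def get_result(self, since=None):
        """Return all results, or only those strictly after 'since'."""
        if since is None:
            return list(self._results)
        return [r for r in self._results if r[0] > since]


def poll(source, last_seen):
    """Fetch new results and return them with the updated last-seen time."""
    new_results = source.get_result(since=last_seen)
    if new_results:
        last_seen = new_results[-1][0]
    return new_results, last_seen


source = FakeResultSource()
source.add(1, 10.0)
source.add(2, 10.5)

first, last_seen = poll(source, None)        # first call: no time, all results
source.add(3, 11.0)
second, last_seen = poll(source, last_seen)  # later call: only newer results
third, last_seen = poll(source, last_seen)   # nothing new yet

print(first)   # [(1, 10.0), (2, 10.5)]
print(second)  # [(3, 11.0)]
print(third)   # []
```

Because each call filters on the last timestamp received, a result that arrives late with an older measurement time would be missed by this client.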

Date:	Wed, 12 Mar 2008 08:22:07 -0700
From: Matthew Arrott <marrott@UCSD.EDU>

On 3/12/08 7:52 AM, "Eric Bridger" <eric@GOMOOS.ORG> wrote:

>I agree that we are reaching consensus. GetObservation with no time parameter should return one observation where t <= now.

+1

>I don't mean to muddy the waters but the SOS spec seems to address this issue with optional Enhanced Operations elements. GetObservation can request a responseMode = responseTemplate (vs. inline) then using that template id to call the optional GetResult method.

Sounds like a form of a Queue management mechanism (only based on what is stated here)

Date: Wed, 12 Mar 2008 08:29:32 -0700
From: John Graybeal <graybeal@mbari.org>
Subject: Re: getLatest SOS

Ahh, that may be very useful. (Though it seems very complex, is it simpler than it sounds?) I think it is a different use case than the primary one for which we were discussing getLatest(). Or maybe I was just ignoring that other use case!

Some discussion on the thread hinted at using getLatest for this same purpose, and one could certainly do so using time ranges (maybe not with 'now', though) and an nSamples that was large. (Hmm, an nSamples of -1 should get all data that meets the criteria, perhaps?)

Just so everyone knows, out-of-order data will be a problem with many data streams. *That* will muddy the water, and I don't think we should dig into it now.

John

Date:	Wed, 12 Mar 2008 11:46:16 -0400
From: Luis Bermudez <bermudez@SURA.ORG>

So, as Eric said "GetObservation with no time parameter should return one observation where t <= now." Could we add that the t should be the max known t? And is it still OK that my max t was 10 years ago? I could envision how a client will make the decision, based on its view of "latest" or "now", about accepting or discarding the observation result.

-luis

On Mar 12, 2008, at 11:29 AM, John Graybeal wrote:

>Ahh, that may be very useful. (Though it seems very complex, is it simpler than it sounds?) I think it is a different use case than the primary one for which we were discussing getLatest(). Or maybe I was just ignoring that other use case!

Date:	Wed, 12 Mar 2008 10:57:06 -0500
From: Gerry Creager <gerry.creager@TAMU.EDU>
Subject: Re: getLatest SOS

I can live with this.

gerry

Date:	Wed, 12 Mar 2008 10:46:18 -0600
From: Carl Reed <creed@OPENGEOSPATIAL.ORG>
Organization: OGC

Not sure if this email will get through.

Has anyone contacted Simon Cox, Mike Botts, or Ron Lake for their input? They might have some insight that could be useful.

Regards

Carl

Date:	Wed, 12 Mar 2008 15:12:49 -0400
From: "David R. Forrest" <drf5n@MAPLEPARK.COM>
Subject: Re: getLatest SOS

On Wed, 12 Mar 2008, Luis Bermudez wrote:

>So, as Eric said "GetObservation with no time parameter should return one observation where t <= now." Could we add that the t should be the max known t?

>And is it still OK that my max t was 10 years ago? I could envision how a client will make the decision, based on its view of "latest" or "now", about accepting or discarding the observation result.

Looking at WMS's solution to this problem (see section C.1 of the document at http://portal.opengeospatial.org/modules/admin/license_agreement.php?suppressHeaders=0&access_license_id=3&target=http://portal.opengeospatial.org/files/index.php?artifact_id=4756), it uses slightly different terminology: 'current' for 'now' and 'current'+'nearestValue' for 'getLatest'. As for what you get when you give no TIME value, if it accepts a default like 'current', that default should be defined in the GetCapabilities response. If the server does not have a default for TIME, it returns an error.

Some of my data gets timestamped at a second or two past the every-15-minute sample, and a time<=12:00 might pick up the 11:45:01 sample.

If I were trying to distribute estimates of observations from forecast model data through SOS, the 'max known t' might be 3-5 days in the future, and now+15 minutes would be more 'now'-like than now-2:45.

Dave

Date: Wed, 12 Mar 2008 13:03:13 -0700
From: John Graybeal <graybeal@mbari.org> 

Definitely there are two different use cases and I only targeted the one. (I think that's because the documentation I was looking at only seemed to care about the one.)

It seems important to define the two different use cases in our document.

But WMS' current+nearestValue is not doing what I think we agree getLatest should do. The WMS current+nearestValue is much more of a getNearest. Which is important for 'use case 2'.

And what does WMS do when you only enter 'current' -- does it give you a value only if its timestamp matches perfectly the timestamp associated with 'current', or does it give you whatever value is the latest as of the 'current' time?

Since WMS has this capability, have we filled out our matrix wrong?

So yes, I agree there should be a getNearest function, which I think I'm hearing WMS can do but SOS can not?

John
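The difference between getLatest and getNearest discussed in the last two messages can be made concrete with a short Python sketch. The function names and the sample timestamps (every 15 minutes, stamped one second late, as in Dave's example) are assumptions for illustration only.

```python
from datetime import datetime


def get_latest(times, t):
    """Latest time at or before t, or None if there is none."""
    earlier = [x for x in times if x <= t]
    return max(earlier) if earlier else None


def get_nearest(times, t):
    """Time closest to t in either direction, or None for an empty list."""
    if not times:
        return None
    return min(times, key=lambda x: abs(x - t))


noon = datetime(2008, 3, 12, 12, 0, 0)
# Hypothetical samples every 15 minutes, each stamped one second late.
times = [
    datetime(2008, 3, 12, 11, 30, 1),
    datetime(2008, 3, 12, 11, 45, 1),
    datetime(2008, 3, 12, 12, 0, 1),
]

print(get_latest(times, noon))   # 2008-03-12 11:45:01
print(get_nearest(times, noon))  # 2008-03-12 12:00:01
```

With a time of 12:00, getLatest picks up the 11:45:01 sample while getNearest returns the 12:00:01 sample. The same split applies to forecast data, where the nearest time may lie in the future and would never be returned by getLatest.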

Date:	Wed, 12 Mar 2008 17:12:09 -0400
From: Eric Bridger <eric@GOMOOS.ORG>

Well I'm not sure what to say about forecasts, but I would argue that the concept of 'now' is completely different for a machine than a human.

We use Postgres. select now from .... yields => 2008-03-12 17:04:29.090697-04 which is a timestamp with seconds to 6 decimal places. THUS any request for an observation 'now' (The default for an SOS GetObservation request with no time parameter), at least in OOSTethys, must be approximated to the most recent observation. If not, then any request for 'now' data would almost always fail since our observations are stored only to the nearest half-hour.

Eric

Date:	Thu, 13 Mar 2008 10:58:37 -0400
From: Luis Bermudez <bermudez@SURA.ORG>

Hi Carl, good idea. I think we need to reach an interim conclusion first, and then we will contact them. Maybe after this week's telecon.

-luis

Appendix 2: Previous Proposals

These proposals have been archived here for reference, but are no longer actively discussed.

Although our discussion reached the conclusion that a getObservation query without a time should return just one observation, I do not believe this is consistent with expected behavior of getObservation. Given that it works like a query, the typical query behavior is to return all the replies matching the query. I therefore substitute a different proposal, as follows.

Best Practice [proposed]: A getObservation with a relativePosition of "TM_Begins" or "TM_Ends" should return the earliest or latest observation in the specified time object, respectively. See Example 1 for XML.

Example 1:

 <sos:eventTime>
    <ogc:TM_Ends>
      <gml:TimePeriod>
        <gml:beginPosition>2007-02-04T12:24:00</gml:beginPosition>
        <gml:endPosition>2007-02-04T15:24:00</gml:endPosition>
      </gml:TimePeriod>
    </ogc:TM_Ends>
  </sos:eventTime>

Best Practice [proposed]: A getObservation with a relativePosition of "TM_Ends" and a TimePosition with indeterminatePosition="now" should return the latest observation. (Note that there is no other deterministic meaning for this construct that is likely to yield data, since 'now' will be interpreted with arbitrary and unknowable precision at the server side, so having a data sample with timestamp exactly matching the moment of execution is unlikely, unpredictable, and of no apparent value.) See Example 2 for XML.

Example 2:

 <sos:eventTime>
    <ogc:TM_Ends>
      <gml:TimeInstant>
        <gml:timePosition indeterminatePosition="now"/>
      </gml:TimeInstant>
    </ogc:TM_Ends>
  </sos:eventTime>
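A client could generate the Example 2 fragment programmatically rather than by hand. The following Python sketch uses the standard library's ElementTree; the helper names `q` and `latest_event_time` are hypothetical, and the namespace URIs are assumed to be those of SOS 1.0.0, OGC Filter, and GML 3.1.1, so they should be checked against the schemas in use.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the schemas actually deployed.
NS = {
    "sos": "http://www.opengis.net/sos/1.0",
    "ogc": "http://www.opengis.net/ogc",
    "gml": "http://www.opengis.net/gml",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def latest_event_time():
    """Build the eventTime element of Example 2 (TM_Ends with 'now')."""
    event_time = ET.Element(q("sos", "eventTime"))
    ends = ET.SubElement(event_time, q("ogc", "TM_Ends"))
    instant = ET.SubElement(ends, q("gml", "TimeInstant"))
    ET.SubElement(
        instant, q("gml", "timePosition"), {"indeterminatePosition": "now"}
    )
    return event_time


xml_text = ET.tostring(latest_event_time(), encoding="unicode")
print(xml_text)
```

The serialized element carries its namespace declarations, so it can be embedded in a larger GetObservation request built the same way.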

Best Practice [proposed]: Users who wish to avoid getting an excessively old observation when asking for the latest observation should use a getObservation with a relativePosition of "TM_Ends", and a TimePeriod time object with a beginPosition that is recent enough to exclude unwanted data, and no endPosition specified. See Example 3 for XML.

Example 3:

 <sos:eventTime>
    <ogc:TM_Ends>
      <gml:TimePeriod>
        <gml:beginPosition>2007-02-04T12:24:00</gml:beginPosition>
      </gml:TimePeriod>
    </ogc:TM_Ends>
  </sos:eventTime>

References

  1. OGC Schemas on the web (to find a particular schema cited above, append the path to this base URL)
  2. Tony Cook
  3. Panagiotis (Peter) A. Vretanos (PPT), slide 14
  4. 05-093r2: Filter Encoding Implementation Standard (referred to in the SWE WG email thread as 'FES'), a change request to the original document OGC 04-095, v1.1.0 (URL not found for either reference).
    The fact that the temporal operator relative position and the time object can each apply to either an instant or a time range, and the temporal operator can even be the opposite concept of the time object (e.g., TM_Before used with an indeterminatePosition of 'after'), makes things pretty, umm, interesting. We would have to work through the combinations to see what each implies.
    The request-for-change cited here defines the temporal operators like TM_Ends, but only in this direction: If I give you my time (instant or period) and some other time (instant or period), which temporal operator describes the relationship of myTime to otherTime? Not quite our problem, but I think what it describes can be inverted into the framing we are interested in, as follows: Give me all your points that meet the test:
        my_time(instant or period) hasMyTemporalRelationshipTo time(your point)

