feat: support time travel#201
Conversation
hamersaw
left a comment
There was a problem hiding this comment.
Does this mean that the reading with options example from the docs currently does not work?
I have PR out to add tag support to VERSION AS OF. If this gets merged first I can rebase and build on top, seems like a pretty eash update to the correct loadTable function.
| public static long findVersion(List<Version> versions, long timestamp) { | ||
| long versionID = -1; | ||
| Instant instant = instantFromTimestamp(timestamp); | ||
| for (Version version : versions) { |
There was a problem hiding this comment.
Are versions guaranteed to be ordered here? If not, I think we could return an incorrect version.
There was a problem hiding this comment.
| public static LanceSparkReadOptions createReadOptions( | ||
| String location, | ||
| Long versionId, | ||
| LanceSparkCatalogConfig catalogConfig, | ||
| LanceNamespace namespace, | ||
| List<String> tableId) { | ||
| LanceSparkReadOptions.Builder builder = | ||
| LanceSparkReadOptions.builder().datasetUri(location).withCatalogDefaults(catalogConfig); | ||
|
|
||
| if (tableId != null) { | ||
| builder.tableId(tableId); | ||
| } | ||
|
|
||
| if (namespace != null) { | ||
| builder.namespace(namespace); | ||
| } | ||
|
|
||
| if (versionId != null) { | ||
| builder.version(versionId.intValue()); | ||
| } | ||
| return builder.build(); | ||
| } |
There was a problem hiding this comment.
If we're just checking for null on all of the inputs here do we need to have the other two overloaded instances?
| return builder.build(); | ||
| } | ||
|
|
||
| // Determine if the timestamp is in microseconds or nanoseconds and convert to Instant |
There was a problem hiding this comment.
In which cases is the timestamp given in either nanos or micros?
There was a problem hiding this comment.
Changed. only need to take care of microseconds is good enough based on spark
/**
* Load table metadata at a specific time by {@link Identifier identifier} from the catalog.
* <p>
* If the catalog supports views and contains a view for the identifier and not a table, this
* must throw {@link NoSuchTableException}.
*
* @param ident a table identifier
* @param timestamp timestamp of the table, which is microseconds since 1970-01-01 00:00:00 UTC
* @return the table's metadata
* @throws NoSuchTableException If the table doesn't exist or is a view
*/
default Table loadTable(Identifier ident, long timestamp) throws NoSuchTableException {
throw QueryCompilationErrors.noSuchTableError(ident);
}
| String location = describeResponse.getLocation(); | ||
| Map<String, String> initialStorageOptions = describeResponse.getStorageOptions(); | ||
| @Override | ||
| public Table loadTable(Identifier ident, String version) throws NoSuchTableException { |
There was a problem hiding this comment.
It looks like all of the loadTable instances are basically duplicated. Would it make sense to consolidate some of the code? maybe provide a "version" either from string parsing or timestamp lookup as an argument and if it doesn't exist than use latest?
Hi @hamersaw Thanks a lot for your review. As I known current lance-spark does not support |
hamersaw
left a comment
There was a problem hiding this comment.
Looks good to me. I'm excited to build on this for TAG support!
This PR fixes load table when using a Lance namespace implementation, by setting the correct read options. This issue was introduced in #201. This fixes #223 To reproduce the problem, attempt to query a table when using a namespace catalog that vends credentials. The credentials will not be used and the query will fail. Note that throughout the code there are a different of ways that datasets are opened but I left those as-is for now as I don't have the full context on if that is intentional or not. Co-authored-by: Bryan Keller <bkeller@netflix.com>
Support
TIMESTAMP AS OFandVERSION AS OF