codehaus


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Rust] move parquet into a separate sub-crate


Thanks Paddy. Similarly, I can't see a reason for arrow to reference
parquet since it is all about the in-memory data representation and tools
to build and process the data. In case it does, we probably should move the
logic to the parquet side.

IMO the arrow-parquet integration (e.g., reading from parquet to arrow,
writing arrow into parquet) should happen in parquet side as it involves
encoding/decoding mechanisms which are specific to the latter. Therefore,
parquet needs to depend on arrow. The cargo workspace is pretty flexible
about this so a sub-crate is allowed to depend on the main crate.

On Fri, Dec 28, 2018 at 6:55 PM paddy horan <paddyhoran@xxxxxxxxxxx> wrote:

> This seems reasonable.  The flexibility with CI is a positive for sure.
>
> > 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> > depends on arrow, we can't reference parquet in arrow.
>
> This is my only concern, the Rust implementation is evolving rapidly and
> adopting workspaces may reduce our flexibility, I can’t think of a specific
> situation right now but might this be a problem in the future?
>
> If we have this restriction we have to agree on which way dependencies
> flow, what is your preference?
>
> I have not used workspaces in anger but it seems it is designed for crates
> to flow up?  i.e. that arrow would be allowed to reference the parquet
> sub-crate.
>
> P
>
> From: Renjie Liu<mailto:liurenjie2008@xxxxxxxxx>
> Sent: Thursday, December 27, 2018 10:09 PM
> To: dev@xxxxxxxxxxxxxxxx<mailto:dev@xxxxxxxxxxxxxxxx>
> Subject: Re: [Rust] move parquet into a separate sub-crate
>
> Cool. It may also be worthy to put adapters into a separate crate.
>
> On Fri, Dec 28, 2018 at 4:10 AM Chao Sun <sunchao@xxxxxxxxxx> wrote:
>
> > Hi,
> >
> > It just occurs to me that it may be a better idea to move the parquet
> > module into a separate sub-crate by using cargo workspaces
> > <https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html>. The
> > advantage is that we can make the project more modular (in future, we may
> > want to add more sub-crates such as arrow/parquet_derive, orc, gandiva,
> > etc), and allow us to run CI jobs separately on each crate.
> >
> > Some small caveats:
> > 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> > depends on arrow, we can't reference parquet in arrow. This doesn't seem
> > like an issue though since arrow itself should be physical on-disk format
> > independent. I also didn't see any reference on parquet in cpp/src/arrow.
> > 2. The path dependency used in workspace has to be changed to a version
> > number when we do "cargo publish". This should be added to the release
> > instructions and committer who performs the job should do the extra step.
> >
> > Thoughts?
> >
> > Chao
> >
>
>
> --
> Renjie Liu
> Software Engineer, MVAD
>
>