The way that dictionary encoding is implemented in C++ (with DictionaryType, DictionaryArray) is a construct particular to the library. At the protocol level, dictionary encoding is a property of a field at some level of a schema tree [1]. The dictionary itself is a record batch with a single field/column [2].

So based on the protocol there is no requirement for uniqueness in the dictionary. I would say it would be preferable for implementations to avoid constructing dictionaries with duplicates, though.

- Wes

[1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L226
[2]: https://github.com/apache/arrow/blob/master/format/Message.fbs#L71

On Wed, Dec 19, 2018 at 5:51 PM Ben Kietzman <ben.kietzman@xxxxxxxxxxx> wrote:
>
> Is it legal to create a DictionaryType whose dictionary has repeated
> values?
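
To illustrate the point about duplicates, here is a minimal sketch in plain Python. It does not use the Arrow library: `decode` and `dedupe` are hypothetical helpers written only for this example, and the lists stand in for an index array and a single-column dictionary. It shows that decoding stays well-defined when the dictionary repeats a value, and how an implementation might merge the duplicates if it prefers a unique dictionary.

```python
def decode(indices, dictionary):
    """Map each index back to its dictionary value (None marks a null slot)."""
    return [None if i is None else dictionary[i] for i in indices]


def dedupe(indices, dictionary):
    """Return (new_indices, new_dictionary) with duplicate values merged."""
    position = {}        # value -> index in the deduplicated dictionary
    new_dictionary = []
    remap = []           # old index -> new index
    for value in dictionary:
        if value not in position:
            position[value] = len(new_dictionary)
            new_dictionary.append(value)
        remap.append(position[value])
    new_indices = [None if i is None else remap[i] for i in indices]
    return new_indices, new_dictionary


# Illustrative data (an assumption for this sketch): "a" appears twice.
dictionary = ["a", "b", "a"]
indices = [0, 2, 1, None, 2]

decoded = decode(indices, dictionary)
new_indices, new_dictionary = dedupe(indices, dictionary)
```

Both encodings decode to the same values. One practical reason to prefer the deduplicated form is that code comparing indices directly (0 versus 2 here) would otherwise treat equal values as different.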