Hi @RyanMoravick,
Duplicate records in the dataset count toward the Datalake size, and therefore toward the storage limit of your license. That said, data in the Datalake is compressed and stored in Parquet file format, so the impact should be minimal unless you ingest millions of duplicate rows every day.
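If it helps to see why compressed duplicates add so little, here is a rough sketch. It does not use Parquet or the Datalake itself: it uses Python's standard `lzma` module as a stand-in compressor, and the row layout and the `make_rows` and `compressed_size` helpers are made up for the illustration. Actual Parquet encoding works differently, so treat the numbers as indicative only.

```python
import lzma
import random


def make_rows(n, seed=0):
    """Build n hypothetical CSV-like rows (id, integer, float) as strings."""
    rng = random.Random(seed)
    return [f"{i},{rng.randint(0, 10**9)},{rng.random():.6f}\n" for i in range(n)]


def compressed_size(rows):
    """Return the size in bytes of the rows after lzma compression."""
    return len(lzma.compress("".join(rows).encode("utf-8")))


rows = make_rows(5000)
with_duplicates = rows + rows  # every row ingested a second time

raw_unique = sum(len(r) for r in rows)
raw_dup = sum(len(r) for r in with_duplicates)
packed_unique = compressed_size(rows)
packed_dup = compressed_size(with_duplicates)

print(f"raw bytes:        {raw_unique} -> {raw_dup}")
print(f"compressed bytes: {packed_unique} -> {packed_dup}")
```

The raw size doubles, while the compressed size should grow by only a small fraction, because the compressor can refer back to rows it has already seen.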
Regards,
Kumar Saurabh