SOLVED

Difference between data lake and Profile store size

Level 5

In most cases I have seen, the data lake is smaller than the Profile store; in a few cases the data lake is larger.

Based on my understanding, one reason the data lake is comparatively smaller is that it uses compression to store data, so its size is smaller than that of the Profile store.

Also, if a dataset is enabled for Profile long after its data has already been ingested, very few profiles get ingested into the Profile store, which makes the data lake larger than the Profile store.

Apart from these two reasons, what other explanations are there for the size variance between the data lake and the Profile store?

I have attached screenshot for reference.

1 Accepted Solution

Correct answer by
Level 5

I got this response from the Adobe support team:

- Data Lake: Stores data in compressed, columnar formats. It is highly optimized for analytics, resulting in a smaller on-disk size.
- Profile Storage: Data is indexed for fast lookups, less compressed, richer in metadata, and ready for real-time activation, which means a larger footprint.
- Profile Storage: Merges identities and keeps one record per stitched profile, sometimes ingesting events or attributes from various sources. This can inflate size if the identity graph is large or complex.
- Profile Storage: The union schema retains all fields ever enabled for Profile, even if they are deprecated later.
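To make the first two points concrete, here is a rough Python sketch of why the same records can take very different amounts of space depending on layout. It is only a toy illustration: the sample records, the field names, and both helper functions are hypothetical, and it says nothing about how AEP actually stores data internally. It compares a compressed column-per-field layout against uncompressed rows plus a lookup index.

```python
import json
import zlib

# Hypothetical sample data: 1,000 profile-like records with repetitive values.
records = [
    {
        "id": f"user-{i:05d}",
        "country": "US" if i % 2 else "IN",
        "tier": "gold" if i % 10 == 0 else "basic",
    }
    for i in range(1000)
]


def columnar_compressed_size(rows):
    """Bytes used when each field is stored as one zlib-compressed column."""
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return sum(
        len(zlib.compress(json.dumps(values).encode("utf-8")))
        for values in columns.values()
    )


def indexed_row_size(rows):
    """Bytes used when rows are stored uncompressed, plus an id -> offset index."""
    index = {}
    offset = 0
    for row in rows:
        blob = json.dumps(row).encode("utf-8")
        index[row["id"]] = offset  # toy lookup index for fast access by id
        offset += len(blob)
    index_bytes = len(json.dumps(index).encode("utf-8"))
    return offset + index_bytes


lake_size = columnar_compressed_size(records)
profile_size = indexed_row_size(records)
print(f"columnar + compressed: {lake_size} bytes")
print(f"row-oriented + index:  {profile_size} bytes")
```

Repetitive columns such as `country` and `tier` compress very well when stored together, while the row layout repeats every field name per record and carries the extra index, so the second number comes out noticeably larger for the same data.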

5 Replies

Level 4

Hi @Pradeep-Jaiswal ,

One reason I can think of: in the data lake, the data is stored in its raw format, with no structural (schema) information, no metadata, and possibly fewer indexes. In the Profile store, the data is accompanied by schema details, metadata, and efficient indexing (for faster access), which makes it heavier.

 

Thanks!

Level 3

The data lake stores data predominantly in Parquet format, whereas the Profile store keeps data in a columnar database (possibly Cosmos DB). The identity graph is also stored in a graph database. This could be the reason for the larger storage size in the Profile store compared with the data lake.

Administrator

Hi @Pradeep-Jaiswal,

Were you able to resolve this query with the help of the provided solutions, or do you still need further assistance? Please let us know. If any of the answers were helpful in moving you closer to a resolution, even partially, we encourage you to mark the one that helped the most as the 'Correct Reply.'

Thank you!



Sukrity Wadhwa

Administrator

Thanks @Pradeep-Jaiswal for sharing the update!



Sukrity Wadhwa