In most cases I have seen, the data lake size is smaller than the Profile store; only in a few cases is the data lake larger than the Profile store.
Based on my understanding, one reason the data lake is comparatively smaller is that it uses compression to store the data, so its size is smaller than the Profile store.
Also, if a dataset is enabled for Profile long after its data has already been ingested, far fewer profiles get ingested into the Profile store, which causes the data lake size to be larger than the Profile store.
Apart from the above two reasons, what other explanations are there for this size variance between the data lake and the Profile store?
I have attached a screenshot for reference.
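To make the late-enablement scenario concrete, here is a minimal Python sketch. It assumes, as the question does, that batches ingested before a dataset was enabled for Profile stay only in the data lake and are not backfilled into the Profile store. The batch dates, record counts, and the `split_records` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical batches: (ingestion time, number of records). Illustrative numbers only.
batches = [
    (datetime(2024, 1, 1), 1000),
    (datetime(2024, 2, 1), 1000),
    (datetime(2024, 3, 1), 1000),
]

# Hypothetical moment the dataset was enabled for Profile.
profile_enabled_at = datetime(2024, 2, 15)

def split_records(batches, enabled_at):
    """Return (records in the data lake, records sent to Profile).

    Assumption: every batch lands in the data lake, but only batches
    ingested at or after Profile enablement are also written to the
    Profile store (earlier batches are not backfilled).
    """
    lake = sum(count for _, count in batches)
    profile = sum(count for ts, count in batches if ts >= enabled_at)
    return lake, profile

lake_records, profile_records = split_records(batches, profile_enabled_at)
print(lake_records, profile_records)  # 3000 1000
```

Under that assumption, two thirds of the records exist only in the data lake, which is one way the data lake can end up larger than the Profile store.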
Hi @Pradeep-Jaiswal,
One reason I can think of: in the data lake, the data is stored in its raw format, with no structural (schema) information, no metadata, and possibly fewer indexes. In the Profile Store, by contrast, the data is accompanied by schema details, metadata, and efficient indexing (for faster access), which makes it heavier.
Thanks!
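A rough way to see why indexes and metadata add weight: the Python sketch below serializes the same rows twice, once compactly and once with per-record metadata plus a lookup index. The `_meta` fields and the index are hypothetical stand-ins, not the real Profile Store layout, and JSON byte counts are only a proxy for on-disk size.

```python
import json

# Made-up rows, similar in spirit to records in a data lake file.
rows = [{"id": f"user{i}", "country": "US", "score": i % 10} for i in range(1000)]

def compact_size(rows):
    """Bytes for the rows alone, with no metadata or index."""
    return len(json.dumps(rows, separators=(",", ":")).encode("utf-8"))

def indexed_size(rows):
    """Bytes when each row carries metadata and an id lookup index is kept.

    The `_meta` fields and the index are hypothetical stand-ins for the
    extra structures a lookup-oriented store maintains.
    """
    enriched = [
        {**row, "_meta": {"schema": "exampleProfileSchema", "updatedAt": "2024-01-01T00:00:00Z"}}
        for row in rows
    ]
    # Primary index: id -> position, so one profile can be fetched without a full scan.
    index = {row["id"]: position for position, row in enumerate(rows)}
    payload = {"records": enriched, "index": index}
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))

print(compact_size(rows), indexed_size(rows))
```

The indexed payload contains everything in the compact one plus the extra structures, so it is always larger; the trade-off is faster point lookups.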
The data lake stores data predominantly in Parquet format, whereas the Profile Store keeps data in a NoSQL document database (possibly Azure Cosmos DB). The identity graph is also stored in a graph database. This could be the reason for the higher storage size in the Profile Store compared with the data lake.
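The Parquet point can be illustrated with a small Python sketch that compares row-oriented JSON against a column-oriented layout with dictionary encoding plus zlib compression. This only imitates the general idea behind columnar formats such as Parquet; it is not the Parquet format, and the sample rows are made up.

```python
import json
import zlib

# Made-up event rows with repetitive values, as is common in event data.
rows = [
    {"country": "US" if i % 3 else "IN", "channel": "web", "amount": i % 5}
    for i in range(10_000)
]

def row_oriented_bytes(rows):
    """Row-oriented JSON: every record repeats every field name and value."""
    return json.dumps(rows).encode("utf-8")

def columnar_bytes(rows):
    """Column-oriented layout with dictionary encoding, then zlib compression.

    Each column's distinct values are stored once (the dictionary), and the
    column itself becomes one one-byte code per row.
    """
    dictionaries = {}
    codes = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        dictionary = sorted(set(values))
        assert len(dictionary) < 256, "one-byte codes need fewer than 256 distinct values"
        lookup = {value: code for code, value in enumerate(dictionary)}
        dictionaries[name] = dictionary
        codes.append(bytes(lookup[value] for value in values))
    header = json.dumps(dictionaries).encode("utf-8")
    # Length prefix lets a reader split the header from the fixed-size code columns.
    return zlib.compress(len(header).to_bytes(4, "big") + header + b"".join(codes))

row_size = len(row_oriented_bytes(rows))
columnar_size = len(columnar_bytes(rows))
print(f"row-oriented JSON: {row_size} bytes, columnar + compressed: {columnar_size} bytes")
```

Storing each column together puts similar values next to each other, which is what makes dictionary encoding and general-purpose compression so effective on analytical data.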
Hi @Pradeep-Jaiswal,
Were you able to resolve this query with the help of the provided solutions, or do you still need further assistance? Please let us know. If any of the answers were helpful in moving you closer to a resolution, even partially, we encourage you to mark the one that helped the most as the 'Correct Reply.'
Thank you!
Got this response from the Adobe support team:
- Data Lake: Stores data in compressed, columnar formats. It is highly optimized for analytics, resulting in a smaller on-disk size.
- Profile Storage: Data is indexed for fast lookups, less compressed, richer in metadata, and ready for real-time activation, which means a larger footprint.
- Profile Storage: Merges identities and keeps one record per stitched profile, sometimes ingesting events or attributes from various sources. This can inflate size if the identity graph is large or complex.
- Profile Storage: The union schema retains all fields ever enabled for Profile, even if they are deprecated later.
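The identity-stitching point in that response can be sketched with a union-find structure: linked identities collapse into one profile, but the graph of links has to be stored as well. This is a hypothetical illustration (the namespaces, `IdentityGraph`, and `stitch` are invented for the example), not how Adobe's identity service is actually implemented.

```python
class IdentityGraph:
    """Minimal union-find that groups linked identities (hypothetical illustration)."""

    def __init__(self):
        self.parent = {}  # every identity ever seen is stored here

    def find(self, identity):
        """Return the root identity of the group, registering unseen identities."""
        self.parent.setdefault(identity, identity)
        while self.parent[identity] != identity:
            # Path halving keeps later lookups short.
            self.parent[identity] = self.parent[self.parent[identity]]
            identity = self.parent[identity]
        return identity

    def link(self, a, b):
        """Record that identities a and b belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def stitch(records, graph):
    """Merge records from several sources into one record per stitched profile."""
    profiles = {}
    for record in records:
        root = graph.find(record["identity"])
        profiles.setdefault(root, {}).update(record["attributes"])
    return profiles

graph = IdentityGraph()
graph.link("email:a@example.com", "crm:123")
graph.link("crm:123", "ecid:xyz")

records = [
    {"identity": "email:a@example.com", "attributes": {"firstName": "Ana"}},
    {"identity": "crm:123", "attributes": {"loyaltyTier": "gold"}},
    {"identity": "ecid:xyz", "attributes": {"lastPageViewed": "/pricing"}},
    {"identity": "email:b@example.com", "attributes": {"firstName": "Ben"}},
]
profiles = stitch(records, graph)
print(len(profiles), len(graph.parent))  # 2 4
```

Four source records collapse into two profiles, yet all four identities remain in the graph, so the graph keeps growing with every linked identity even when the profile count does not.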
Thanks @Pradeep-Jaiswal for sharing the update!