SOLVED

Build Data pipeline with Adobe data feeds

Level 1

Hi Everyone,

 

I would like to check if someone has come across this type of request for data analysis with Adobe data feeds. To achieve this, I am trying to build a data pipeline, and I have a medallion architecture in mind. The goal is to have one table at the visit level plus aggregated tables as per business requirements. Please share your thoughts, challenges, or experience if you have any regarding this.

 

Thanks

1 Accepted Solution

Correct answer by
Community Advisor

Hi @SasikalaEs ,

 

Key Design Considerations:

  1. Data Ingestion
  • Format: Adobe feeds are TSV with thousands of columns.
  • Delivery: Often daily, partitioned by date/hour.
  • Tools: Use Spark, Databricks, or Snowflake for scalable parsing and ingestion (see the PySpark sketch after this list).
  2. Visit-Level Aggregation
  • Adobe doesn’t explicitly give visits in data feeds; you must:
    • Use visit_num, visit_start_time_gmt, and post_visid_high/low to group hits (also shown in the sketch below).
    • Ensure sessionization logic (handling visit timeouts, cross-day visits).
  3. Identity Resolution
  • post_visid_high/low or mcvisid/mid fields are used for the visitor ID.
  • Cross-device stitching is not out-of-the-box; consider integrating with ECID/CRM IDs if available.
  4. Medallion Architecture
  • Bronze: Raw ingestion + minimal parsing (e.g., data types, partitioning).
  • Silver: Normalize fields, resolve sessions, de-duplicate hits.
  • Gold: Create dimension tables, aggregate for metrics like conversion rate, funnel analysis.
  5. Aggregation Examples
  • Sessions by traffic source.
  • Page views by product category.
  • Time spent on site by user cohort.
  • Custom attribution models for conversions.
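
Here is a minimal PySpark sketch of that flow (Bronze ingest, a visit-level Silver table, and one Gold aggregate). The storage paths and the Delta/S3 choices are assumptions to adapt to your environment; the column names (post_visid_high/low, visit_num, visit_start_time_gmt, exclude_hit, date_time, post_pagename) are standard data feed fields, but verify them against the column_headers file delivered with your feed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adobe-data-feed-pipeline").getOrCreate()

# --- Bronze: raw ingestion with minimal parsing ----------------------------
# Hit data is tab-separated with no header row; the column order comes from
# the column_headers.tsv file delivered with the feed's lookup archive.
header_path = "s3://my-bucket/adobe/lookup/column_headers.tsv"  # hypothetical path
hits_path = "s3://my-bucket/adobe/raw/hit_data/*.tsv.gz"        # hypothetical path

columns = list(spark.read.option("sep", "\t").csv(header_path).first())
bronze = spark.read.option("sep", "\t").csv(hits_path).toDF(*columns)
bronze.write.format("delta").mode("append").save("s3://my-bucket/adobe/bronze/hits")

# --- Silver: drop excluded hits and sessionize -----------------------------
silver = (
    bronze
    .filter(F.col("exclude_hit") == "0")  # keep only hits that count in reporting
    .withColumn(
        "visit_id",
        F.concat_ws("_", "post_visid_high", "post_visid_low",
                    "visit_num", "visit_start_time_gmt"),
    )
)

# One row per visit -- the visit-level table described in the question.
visits = silver.groupBy("visit_id").agg(
    F.min("date_time").alias("visit_start"),
    F.max("date_time").alias("visit_end"),
    F.count("*").alias("hit_count"),
    F.countDistinct("post_pagename").alias("unique_pages"),
)
visits.write.format("delta").mode("overwrite").save("s3://my-bucket/adobe/silver/visits")

# --- Gold: an example aggregate (visits per day) ----------------------------
# Extend with joins to the feed's lookup files (referrer_type, etc.) for
# aggregates like sessions by traffic source.
daily_visits = visits.groupBy(F.to_date("visit_start").alias("visit_date")) \
    .agg(F.count("*").alias("visits"))
daily_visits.write.format("delta").mode("overwrite").save("s3://my-bucket/adobe/gold/daily_visits")

Further Gold tables for specific business requirements can be built on top of the visits table in the same way.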

Suggested Tools:

 

  • Data Lakehouse: Databricks (Delta Lake), Snowflake, BigQuery

  • Orchestration: Airflow, Azure Data Factory, dbt

  • Storage: S3 / ADLS Gen2 (Bronze/Silver/Gold folders)

  • Analytics: Power BI, Tableau, Looker

  • Schema Evolution: Apache Iceberg or Delta for handling schema changes (see the sketch below)
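
On the schema-evolution point, a short sketch, assuming the Bronze layer is stored as a Delta table as in the ingestion example above. When Adobe adds columns to the feed, mergeSchema lets the new columns flow into the existing table instead of failing the write.

# Continuing from the Bronze DataFrame in the ingestion sketch above.
(
    bronze.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")  # evolve the table schema when the feed adds columns
    .save("s3://my-bucket/adobe/bronze/hits")  # hypothetical layered path
)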

 

2 Replies

Community Advisor and Adobe Champion

One more thing to keep in mind.

 

Raw data feeds have every row of data collected... including rows that have been excluded from reporting (bots, internal traffic, malformed data, etc.).

 

When processing your raw data, don't forget to check the exclude_hit column and make sure that you don't include these rows, or your data will be inflated (a minimal filter is sketched below).
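
A minimal PySpark sketch of that filter, assuming hits is a DataFrame parsed from the raw feed with the standard column names:

from pyspark.sql import functions as F

# exclude_hit is "0" for hits that count in reporting; any other value means the
# hit was excluded (bots, internal traffic, malformed hits, etc.).
valid_hits = hits.filter(F.col("exclude_hit") == "0")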

 

 

Also, make sure you are using the "post" version of the data wherever possible; this is the post-processed version of the data (after your processing rules, VISTA rules, etc.).
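
For example (a sketch continuing from the filter above; the column names are standard data feed fields, but check them against your feed's column_headers file):

# Prefer the post-processed columns over their raw counterparts.
pages = valid_hits.select(
    F.col("post_pagename").alias("pagename"),  # instead of pagename
    F.col("post_page_url").alias("page_url"),  # instead of page_url
    F.col("post_evar1").alias("evar1"),        # instead of evar1
)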