SOLVED

Build Data pipeline with Adobe data feeds

Level 1

Hi Everyone,

 

I would like to check if someone has come across this type of request for data analysis with Adobe data feeds. To achieve this, I am trying to build a data pipeline, and I have a medallion architecture in mind. The goal is to have one table at the visit level plus aggregated tables as per business requirements. Please share your thoughts, challenges, or experience if you have any regarding this.

 

Thanks

1 Accepted Solution

Correct answer by
Community Advisor

Hi @SasikalaEs ,

 

Key Design Considerations:

  1. Data Ingestion
  • Format: Adobe feeds are TSV with thousands of columns.
  • Delivery: Often daily, partitioned by date/hour.
  • Tools: Use Spark, Databricks, or Snowflake for scalable parsing and ingestion (see the PySpark sketch after this list).
  2. Visit-Level Aggregation
  • Adobe doesn’t explicitly give visits in data feeds; you must:
    • Use visit_num, visit_start_time_gmt, and post_visid_high/low to group hits (also shown in the sketch below).
    • Ensure sessionization logic (handling visit timeouts, cross-day visits).
  3. Identity Resolution
  • post_visid_high/low or mcvisid/mid fields are used for the visitor ID.
  • Cross-device stitching is not out-of-the-box; consider integrating with ECID/CRM IDs if available.
  4. Medallion Architecture
  • Bronze: Raw ingestion + minimal parsing (e.g., data types, partitioning).
  • Silver: Normalize fields, resolve sessions, de-duplicate hits.
  • Gold: Create dimension tables, aggregate for metrics like conversion rate, funnel analysis.
  5. Aggregation Examples
  • Sessions by traffic source.
  • Page views by product category.
  • Time spent on site by user cohort.
  • Custom attribution models for conversions.
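
Here is a minimal PySpark sketch of that flow (Bronze ingest, a visit-level Silver table, and one Gold aggregate). The storage paths and the Delta/S3 choices are assumptions to adapt to your environment; the column names (post_visid_high/low, visit_num, visit_start_time_gmt, exclude_hit, date_time, post_pagename) are standard data feed fields, but verify them against the column_headers file delivered with your feed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adobe-data-feed-pipeline").getOrCreate()

# --- Bronze: raw ingestion with minimal parsing ----------------------------
# Hit data is tab-separated with no header row; the column order comes from
# the column_headers.tsv file delivered with the feed's lookup archive.
header_path = "s3://my-bucket/adobe/lookup/column_headers.tsv"  # hypothetical path
hits_path = "s3://my-bucket/adobe/raw/hit_data/*.tsv.gz"        # hypothetical path

columns = list(spark.read.option("sep", "\t").csv(header_path).first())
bronze = spark.read.option("sep", "\t").csv(hits_path).toDF(*columns)
bronze.write.format("delta").mode("append").save("s3://my-bucket/adobe/bronze/hits")

# --- Silver: drop excluded hits and sessionize -----------------------------
silver = (
    bronze
    .filter(F.col("exclude_hit") == "0")  # keep only hits that count in reporting
    .withColumn(
        "visit_id",
        F.concat_ws("_", "post_visid_high", "post_visid_low",
                    "visit_num", "visit_start_time_gmt"),
    )
)

# One row per visit -- the visit-level table described in the question.
visits = silver.groupBy("visit_id").agg(
    F.min("date_time").alias("visit_start"),
    F.max("date_time").alias("visit_end"),
    F.count("*").alias("hit_count"),
    F.countDistinct("post_pagename").alias("unique_pages"),
)
visits.write.format("delta").mode("overwrite").save("s3://my-bucket/adobe/silver/visits")

# --- Gold: an example aggregate (visits per day) ----------------------------
# Extend with joins to the feed's lookup files (referrer_type, etc.) for
# aggregates like sessions by traffic source.
daily_visits = visits.groupBy(F.to_date("visit_start").alias("visit_date")) \
    .agg(F.count("*").alias("visits"))
daily_visits.write.format("delta").mode("overwrite").save("s3://my-bucket/adobe/gold/daily_visits")

Further Gold tables for specific business requirements can be built on top of the visits table in the same way.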

Suggested Tools:

 

  • Data Lakehouse: Databricks (Delta Lake), Snowflake, BigQuery

  • Orchestration: Airflow, Azure Data Factory, dbt

  • Storage: S3 / ADLS Gen2 (Bronze/Silver/Gold folders)

  • Analytics: Power BI, Tableau, Looker

  • Schema Evolution: Apache Iceberg or Delta for handling schema changes (see the sketch below)
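
On the schema-evolution point, a short sketch, assuming the Bronze layer is stored as a Delta table as in the ingestion example above. When Adobe adds columns to the feed, mergeSchema lets the new columns flow into the existing table instead of failing the write.

# Continuing from the Bronze DataFrame in the ingestion sketch above.
(
    bronze.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")  # evolve the table schema when the feed adds columns
    .save("s3://my-bucket/adobe/bronze/hits")  # hypothetical layered path
)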

 

2 Replies

Community Advisor and Adobe Champion

One more thing to keep in mind.

 

Raw data feeds have every row of data collected... including rows that have been excluded from reporting (bots, internal traffic, malformed data, etc.).

 

When processing your raw data, don't forget to check the exclude_hit column and make sure that you don't include these rows, or your data will be inflated (a minimal filter is sketched below).
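
A minimal PySpark sketch of that filter, assuming hits is a DataFrame parsed from the raw feed with the standard column names:

from pyspark.sql import functions as F

# exclude_hit is "0" for hits that count in reporting; any other value means the
# hit was excluded (bots, internal traffic, malformed hits, etc.).
valid_hits = hits.filter(F.col("exclude_hit") == "0")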

 

 

Also, make sure you are using the "post" version of the data wherever possible; this is the post-processed version of the data (after your processing rules, VISTA rules, etc.).
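
For example (a sketch continuing from the filter above; the column names are standard data feed fields, but check them against your feed's column_headers file):

# Prefer the post-processed columns over their raw counterparts.
pages = valid_hits.select(
    F.col("post_pagename").alias("pagename"),  # instead of pagename
    F.col("post_page_url").alias("page_url"),  # instead of page_url
    F.col("post_evar1").alias("evar1"),        # instead of evar1
)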