Build Data pipeline with Adobe data feeds

Solved

  • May 22, 2025
  • 2 replies
  • 666 views

Hi everyone,

I would like to check if someone has come across this type of request for data analysis with Adobe data feeds. To achieve this, I am trying to build a data pipeline and have a medallion architecture in mind. The goal is to have one table at the visit level plus aggregated tables as per business requirements. Please share your thoughts, challenges, or experience if you have any regarding this.

Thanks

Best answer by pradnya_balvir (accepted solution shown in the replies below)

2 replies

pradnya_balvir | Community Advisor | Accepted solution
May 22, 2025

Hi @sasikalaes,

 

Key Design Considerations:

  1. Data Ingestion
    • Format: Adobe data feeds are TSV files with thousands of columns.
    • Delivery: Often daily, partitioned by date/hour.
    • Tools: Use Spark, Databricks, or Snowflake for scalable parsing and ingestion (see the Bronze-layer ingestion sketch after the tools list below).
  2. Visit-Level Aggregation
    • Adobe doesn't explicitly give you visits in the data feed, so you must:
      • Use visit_num, visit_start_time_gmt, and post_visid_high/low to group hits.
      • Apply sessionization logic (handling visit timeouts, cross-day visits); a PySpark sketch follows this list.
  3. Identity Resolution
    • post_visid_high/low or mcvisid/mid fields are used for the visitor ID.
    • Cross-device stitching is not out-of-the-box; consider integrating with ECID/CRM IDs if available.
  4. Medallion Architecture
    • Bronze: Raw ingestion + minimal parsing (e.g., data types, partitioning).
    • Silver: Normalize fields, resolve sessions, de-duplicate hits.
    • Gold: Create dimension tables and aggregates for metrics like conversion rate and funnel analysis.
  5. Aggregation Examples (a sample aggregate follows the sessionization sketch below)
    • Sessions by traffic source.
    • Page views by product category.
    • Time spent on site by user cohort.
    • Custom attribution models for conversions.
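To make the visit-level grouping concrete, here is a minimal PySpark sketch of the Silver-to-Gold step. The table and column names (silver.adobe_hits, gold.visits, the selected metrics) are placeholders, not anything Adobe ships; only the data feed field names (post_visid_high/low, visit_num, visit_start_time_gmt, date_time, post_pagename, exclude_hit) come from the feed itself.

```python
# Minimal sketch of the Silver-to-Gold step: group Adobe data feed hits into
# visits. Table and column names are illustrative; adjust to your own schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Assumes exclude_hit was already cast to an integer in the Silver layer.
hits = spark.table("silver.adobe_hits").filter(F.col("exclude_hit") == 0)

# A visit is identified by the visitor ID pair (post_visid_high, post_visid_low)
# plus visit_num; visit_start_time_gmt guards against visit_num reuse.
visit_key = ["post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt"]

visits = (
    hits.groupBy(*visit_key)
        .agg(
            F.min("date_time").alias("visit_start"),
            F.max("date_time").alias("visit_end"),
            F.count("*").alias("hit_count"),
            F.countDistinct("post_pagename").alias("unique_pages"),
        )
)

# Write the Gold visit-level table (Delta assumed here).
visits.write.format("delta").mode("overwrite").saveAsTable("gold.visits")
```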

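Once that visit-level table exists, the Gold aggregates listed above are mostly simple group-bys. A hedged example for "sessions by traffic source", where traffic_source is a column you would have derived upstream (for example from post_referrer or your marketing channel logic); it is not a raw feed field.

```python
# Illustrative Gold aggregate: sessions by traffic source per day.
# "gold.visits" and "traffic_source" are placeholders from the sketch above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

visits = spark.table("gold.visits")

sessions_by_source = (
    visits.groupBy(F.to_date("visit_start").alias("visit_date"), "traffic_source")
          .agg(F.count("*").alias("sessions"))
)

sessions_by_source.write.format("delta").mode("overwrite") \
    .saveAsTable("gold.sessions_by_traffic_source")
```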
Suggested Tools:

 

  • Data Lakehouse: Databricks (Delta Lake), Snowflake, BigQuery

  • Orchestration: Airflow, Azure Data Factory, dbt

  • Storage: S3 / ADLS Gen2 (Bronze/Silver/Gold folders)

  • Analytics: Power BI, Tableau, Looker

  • Schema Evolution: Apache Iceberg or Delta for handling schema changes
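To tie the ingestion and tooling suggestions together, here is a minimal Bronze-layer sketch. It assumes the usual data feed delivery where the hit files have no header row and the column names arrive in a separate column_headers lookup file; all paths, bucket names, and table names below are placeholders.

```python
# Minimal Bronze-layer sketch: land one day of raw Adobe data feed TSVs in Delta.
# Paths, bucket names, and table names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Data feed hit files carry no header row; names come from the delivered
# column_headers lookup file (a single tab-separated line).
with open("/dbfs/feeds/column_headers.tsv") as f:
    columns = f.read().strip().split("\t")

raw = (
    spark.read
         .option("sep", "\t")
         .csv("s3://my-bucket/adobe_feed/2025-05-22/*.tsv.gz")
         .toDF(*columns)
)

# Keep the data as-is in Bronze; just stamp the feed date for partitioning.
bronze = raw.withColumn("feed_date", F.lit("2025-05-22"))

(bronze.write
       .format("delta")
       .mode("append")
       .partitionBy("feed_date")
       .saveAsTable("bronze.adobe_hits"))
```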

 

Jennifer_Dungan | Community Advisor and Adobe Champion
May 24, 2025

One more thing to keep in mind.

 

Raw data feeds have every row of data collected... including rows that have been excluded (bots, internal traffic, malformed data, etc.).

 

When processing your raw data, don't forget to check the exclude_hit column and make sure that you don't include these rows, or your data will be inflated.

 

 

Also, make sure you are using the "post" version of the data wherever possible... this is the post-processed version of the data (after your processing rules, VISTA rules, etc. have been applied).
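A small sketch combining both of these points, dropping excluded rows and keeping the post_ columns; the table names and the selected column list are illustrative, not a fixed recipe.

```python
# Illustrative PySpark cleanup reflecting the two points above: drop excluded
# hits and keep the post_ (post-processed) versions of the columns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

hits = spark.table("bronze.adobe_hits")

clean_hits = (
    hits
    # exclude_hit = 0 means the hit counts in reporting; non-zero values mark
    # rows Adobe excluded (bots, internal traffic, malformed data, etc.).
    .filter(F.col("exclude_hit") == "0")
    # Prefer post_ columns: they reflect processing rules, VISTA rules, and
    # other server-side processing applied after collection.
    .select(
        "post_visid_high",
        "post_visid_low",
        "visit_num",
        "visit_start_time_gmt",
        "date_time",
        "post_pagename",
        "post_page_url",
    )
)

clean_hits.write.format("delta").mode("overwrite").saveAsTable("silver.adobe_hits")
```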