
SOLVED

What's the best solution to update a dataflow


Level 3

Hi there, 

 

I'm wondering what the best way to update a dataflow is. Here is the background: we have an existing dataflow that ingests data every day, and there are a mapping, a dataset, and a schema associated with it.

 

If we need to add a new field to the schema and keep using this dataflow, what changes do we need to make? Please correct me if I'm wrong:

1. Update the schema with the new field

2. Update the mapping with the new field

 

After I made the changes above, the dataflow still didn't ingest the new field, and the field doesn't appear when I query the dataset either. Did I miss something? Or in this case, can I only set up a new dataflow, mapping, dataset, etc.?

 

Thanks

1 Accepted Solution


Correct answer by
Level 5

@xliu Updating the schema and updating the dataflow (only the mapping needs to change, through the API or the UI) will suffice in terms of configuration. Make sure the upstream data records contain a value for the new attribute, and check that the mapping picks it up correctly so it flows into the platform.

 

Depending on whether the ingestion is streaming or batch, wait the relevant amount of time for the data to reach the data lake before verifying. Let me know if you need further assistance on this.


4 Replies


Level 3

Hi @nnakirikanti 

 

Thank you very much for your reply.

 

I confirmed that the schema and the mapping were both changed, and the upstream data records have the new value as well. The dataflow also shows that it processed the data file successfully.

 

However, the new field is not in the dataset when I preview it, nor when I query the data.

 

Did I miss any steps? Or do I need to change the version number of the schema or the mapping?

 

Thank you


Level 5

@xliu I suspect the issue is with the mapping. Please provide more details so we can look into this further: the data format with sample data, a screenshot of the dataflow mapping, a detailed screenshot of the dataflow execution, and a screenshot of the schema showing the field details.


Level 3

Thank you so much @nnakirikanti 

 

I double-checked the details of the dataflow and the mapping, and found that after I updated the mapping, its version number changed from 0 to 1, while the dataflow was still associated with mapping version 0. I then updated the mapping version number in the dataflow, and all is good now.

 

So the steps are:

1. Update the schema

2. Update the mapping

3. Check the new mapping version, and update the dataflow to reference it
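
For anyone doing step 3 through the API rather than the UI, here is a minimal sketch of the version check. It only compares the mapping version a dataflow references with the latest one and builds a JSON Patch style body for the update; it does not call the platform. The payload shape (a `transformations` list containing a `Mapping` entry with `mappingId` and `mappingVersion`) is an assumption based on how I understand Flow Service flow records to look, and `example_flow` with its IDs is purely hypothetical, so please verify against your own flow definition before using it.

```python
import json


def find_mapping_transformation(flow):
    """Return (index, transformation) for the flow's Mapping step, or (None, None)."""
    for index, step in enumerate(flow.get("transformations", [])):
        if step.get("name") == "Mapping":
            return index, step
    return None, None


def build_mapping_version_patch(flow, latest_version):
    """Build a JSON Patch body that points the flow at latest_version.

    Returns an empty list when the flow already references that version.
    """
    index, step = find_mapping_transformation(flow)
    if step is None:
        raise ValueError("flow has no Mapping transformation")
    params = step["params"]
    if params["mappingVersion"] == latest_version:
        return []
    return [{
        "op": "replace",
        "path": f"/transformations/{index}",
        "value": {
            "name": "Mapping",
            "params": {
                "mappingId": params["mappingId"],
                "mappingVersion": latest_version,
            },
        },
    }]


# Hypothetical flow record, shaped like the situation in this thread:
# the mapping is now at version 1, but the flow still references version 0.
example_flow = {
    "id": "example-flow-id",
    "transformations": [
        {"name": "Mapping",
         "params": {"mappingId": "example-mapping-id", "mappingVersion": 0}},
    ],
}

patch = build_mapping_version_patch(example_flow, latest_version=1)
print(json.dumps(patch, indent=2))
```

An empty result means the dataflow already points at the latest mapping version, so the missing field would have to be caused by something else. Sending the patch to the platform is deliberately left out here, since the exact request depends on your environment.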