Expand my Community achievements bar.

SOLVED

Is there a technical solution to import an Excel file with 60k+ rows

Avatar

Level 3

I need a technical solution to import an Excel file with 60k+ rows, multiple columns with different prices, into AEM (Adobe Experience Manager). These records should be filterable, searchable, and paginated. The results should later be exportable as CSV. Content Fragments are not a viable solution, as it would require parsing all the records just to populate the available filters (some filters depend on others).

The solution should be a migration of this functionality: https://billing.christianacare.org/transparency/chargemaster?combine=knee&ipid=1&ipid2=2&hsp=Wilming...

 

Reading the file with each request/search is not a viable solution. The file we received is over 7MB in size, and it would take forever to read, format the data into something understandable by Java, filter it, and then return it.

Displaying the data is not the issue, not with the frontend being the bottleneck. We need a data structure that can be filtered, sorted, and paginated IN AEM, using Java.

These files are updated approximately every 6 months. The data from that file will be accessed by multiple instances of the same component or different components, and we do not want to update dozens of components every time the file changes.

 

Also, regardless of how the data is stored, publishing should be taken into account because the file is uploaded to Author, and the data will need to be available on Publish as well.

1 Accepted Solution

Avatar

Correct answer by
Community Advisor

@Mohammed-Skouti 

After you create/update the nodes with all the required properties, you can collect those node full paths into an ArrayList and replicate the paths in this ArrayList from author instance. This makes sure you have most updated synched content on all the publishers.

View solution in original post

5 Replies

Avatar

Community Advisor

@Mohammed-Skouti ,

You may write a Servlet/Service to read your excel file and store the pricing data in the /var/commer/yourproject/products/organized-structure/node1 and store the data as the nodes.

 

You can set the "Charge Type" "HCPC" "Price" "Minimum" "Maximum" "Aetna" as properties on these data-nodes. If you have a better hierarchy to create these data-nodes, then you will have better time organizing and maintaining.

 

We are doing this approach in my project, We have multiple input options to get the data but the data is stored in /var/commerce/products/<company>/product-family/category/product way.

 

Once you have data-stored in the nodes, all it takes is to write required models to read the nodes and export the data in the required format like JSON/Java objects.

 

Hope this idea helps.

 

Thanks,

Sudheer.

 

 

@Sudheer_Sundalam  @aanchal-sikka 

How does the nodes from author get to publish instance, we can't publish them from the CRX

Avatar

Correct answer by
Community Advisor

@Mohammed-Skouti 

After you create/update the nodes with all the required properties, you can collect those node full paths into an ArrayList and replicate the paths in this ArrayList from author instance. This makes sure you have most updated synched content on all the publishers.

Avatar

Community Advisor

Adding to @Sudheer_Sundalam 's suggestions.

 

Avoid reprocessing of the excel on publish instance. 

- Process it on Author

- Replicate the updated structure from author to publish.

 

That way you can also be sure that what you see in author is also on publish. This will specially be helpful in case of AEMaaCS.

 

You might also want to consider clearing up the var nodes, once they become redundant


Aanchal Sikka

Avatar

Employee

Assuming that this is the only Excel file which needs this kind of handling, and that there are no other (access) restrictions on it:

  • Read that file on startup in an optimized in-memory structure, which allows you to perform these required search operations efficiently. This should be done in a dedicated OSGI service, so it's loaded once at startup (you might to add functionality so it will be reloaded in the background when the underlying Excel file is updated).
  • Requests accessing these data just use the service interface, that means that no access triggers any kind of parsing etc, and therefor should be lightweight.
  • The Excel file is stored as-is as binary (not being transformed into a JCR node structure), as you don't want to access any element of it via JCR.

So in the end it's not really an AEM-specific question, much of the functionality (finding the right datastructure and implementing efficient access on it) is pure Java-coding. You just need to make it work in an OSGI world.