Exporting Content into 3rd party search

Question

Hey All,
 
I am trying to figure out the best way to export pages and their content to a 3rd party search provider. The data needs to look like this:
 
{
  "op": "add",
  "path": "/path/to/page",
  "value": {
    "attributes": {
      "title": "Page Title",
      "url": "https://www.website.com/path/to/page.html",
      "description": "This is basically all of the content on the page. So if there are 2 different text areas on the page, it should put that content inside this description."
    }
  }
}

I know I can ask the page for its properties, including a description, but that doesn't account for the components on the page. For example, let's say I put a new text component on the page and add text that I want to be searchable; the page properties wouldn't include that text. I started to look into jsoup (and httpclient, since the site is a SPA) to crawl the page, but is that the best option?
 
Thanks to anyone who has an opinion.

Sean-McK · Accepted Answer

Crawling the rendered pages is only a good fit if you are running outside of AEM. I looked at a couple of approaches; one of them was using HtmlUnit and jsoup to scrape and parse. In the end I decided to use Content Services and the JSON model, and to parse that instead.
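
For anyone landing here later, below is a rough sketch of what parsing the JSON model into the payload from the question could look like. It is not the exact code I ended up with. It assumes the page's JSON model has already been fetched and decoded into a dict, that child components are nested under ":items" (ordered by ":itemsOrder"), and that text components expose their content in a "text" property; check your own model output, since property names depend on the components you use. The function names (strip_tags, collect_text, build_add_op) and the base_url parameter are made up for illustration.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and drops the markup around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def collect_text(node):
    """Depth-first walk over ':items' (assumed layout of the JSON model).

    Honors ':itemsOrder' when present, otherwise uses the dict order.
    """
    found = []
    text = node.get("text")  # assumption: text components export "text"
    if isinstance(text, str) and text.strip():
        found.append(strip_tags(text))
    items = node.get(":items", {})
    order = node.get(":itemsOrder") or list(items)
    for key in order:
        child = items.get(key)
        if isinstance(child, dict):
            found.extend(collect_text(child))
    return found


def build_add_op(model, page_path, base_url):
    """Build the 'add' operation in the shape shown in the question."""
    return {
        "op": "add",
        "path": page_path,
        "value": {
            "attributes": {
                "title": model.get("title", ""),
                "url": f"{base_url}{page_path}.html",
                "description": " ".join(collect_text(model)),
            }
        },
    }
```

You would call it as build_add_op(model, "/path/to/page", "https://www.website.com") and send the result to the search provider. Walking the model instead of the rendered HTML means any new text component an author drops on the page is picked up without touching the exporter, as long as it exposes its content in the property you are collecting.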