We're in the design phase of a large project to replace our existing classic static technical documents with DITA-based content. This will allow us to customise the content on the fly to match a user's environment, as well as bring advantages to content creation via managed reuse of topics.
One of those advantages is being able to assemble one-off documents that contain topics (chunks) from different stored objects. The problem is what to tag in a case like this and how to report on it.
I'm trying to find examples of successful implementations of SiteCatalyst (or even Google Analytics) in such an environment. What level of granularity have they tagged? How have they managed the resolution of multiple assembled topics back to the parent objects in the CCM, etc.? So far I've found nothing.
To give an idea of scale, a guesstimate based on the size of the Google sitemaps suggests around 5 million indexable (and hence potentially reportable) objects.
Can anyone point me at any examples or white papers or implementations?
Even though we use DITA to create and assemble the site pages, we track everything at the web page level, so there isn't much that is DITA-specific about our Analytics implementation. There are many, many ways to use content chunks to assemble web pages (databases, Markdown, DITA, AEM/JCR, and so on), but in the end most organizations are concerned only with the performance of the assembled HTML page toward their site goals.
Another thing to keep in mind is that 5 million unique items is well beyond what any web analytics tool I'm aware of will track (remember, these tools track a lot of data about each item). Adobe Analytics can report 500k - 1 million unique values in any given month, and I think that GA is 50k free / 75k premium for a reporting period (definitely not an expert here, so you'll need to verify). The point is that these tools are not intended to track millions of items; they are best used to track hundreds of data points for tens of thousands of items.
I might suggest thinking about this less as a DITA site and more as a content site. What metrics will be actionable for your content team, and how much complexity is needed to enable action? For example, if you identify pages with poor customer feedback or pages with a lot of traffic, can the authors generate a report from the CCM to find out what chunks are on the page, rather than going to your Analytics implementation?
If you approach this as a content site, here are some ideas for things you could track:
Track internal searches and search terms, and also track search loops (searching for the same term multiple times might indicate that good answers are not being found).
Track the topic type (concept, task, reference) so you can see what types of content your customers view.
Track the breadcrumb / navigation hierarchy so you can view aggregate metrics for each section.
Use Activity Map to see where customers click, and use flow reports to see how they navigate.
Track language and geo data to help determine what translation is needed.
Many of the built-in metrics will also be very useful, such as time spent, pages viewed per visit, acquisition sources, bounces, and so on.
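To make the breadcrumb/section idea concrete, here is a minimal sketch of rolling up page-level metrics by navigation hierarchy. It assumes you have exported a simple list of (page path, views) pairs from your analytics tool; the path structure shown is hypothetical, not any particular export format.

```python
# Sketch: aggregate page views by the top-level navigation section,
# assuming page paths mirror the breadcrumb hierarchy.
from collections import defaultdict

def rollup_by_section(pages, level=1):
    """Aggregate (path, views) pairs by the first `level` path segments."""
    totals = defaultdict(int)
    for path, views in pages:
        segments = [s for s in path.split("/") if s]
        section = "/" + "/".join(segments[:level])
        totals[section] += views
    return dict(totals)

pages = [
    ("/guides/install/step-1", 120),
    ("/guides/install/step-2", 80),
    ("/reference/api/auth", 300),
]
print(rollup_by_section(pages))  # {'/guides': 200, '/reference': 300}
```

The same function with `level=2` would break the numbers out one level deeper (e.g. `/guides/install`), so you can drill down section by section.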
All that said, I'd love to hear any use cases you have in mind for chunk-level tracking, and I might have some ideas if you end up going this route. For example, if you want to see metrics for content chunks, you could export a report from Analytics with page metrics, then export content-chunk-by-page data from your CCM and combine the data.
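As a rough sketch of that combine step: join the page-level metrics export against a CCM chunk-to-page mapping to estimate per-chunk exposure. The field names (`page_metrics`, `chunk_map`, topic IDs) are illustrative assumptions, not a real export format from either system.

```python
# Sketch: attribute page-level views to the chunks that appear on each page.
# A reused chunk accumulates views from every page it is assembled into.
from collections import defaultdict

def chunk_metrics(page_metrics, chunk_map):
    """page_metrics: {page_path: views}; chunk_map: [(chunk_id, page_path)]."""
    totals = defaultdict(int)
    for chunk_id, page in chunk_map:
        totals[chunk_id] += page_metrics.get(page, 0)
    return dict(totals)

page_metrics = {"/guides/install": 200, "/reference/api": 300}
chunk_map = [
    ("topic-123", "/guides/install"),
    ("topic-123", "/reference/api"),   # reused chunk on two pages
    ("topic-456", "/guides/install"),
]
print(chunk_metrics(page_metrics, chunk_map))
# {'topic-123': 500, 'topic-456': 200}
```

Note that this double-counts visits for reused chunks, which may be exactly what you want ("how often was this chunk seen?") or not ("how many unique visitors saw it?"); the CCM-side export determines which questions you can answer.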