Need Assistance with Extracting Component & Template Structure in JSON Format from AEM 6.5

Question

We are trying to extract component and template structure in JSON format from AEM 6.5. For templates, we need structure details such as field names, field types, and any datasource URL configured on a field. For components, we need the template name along with the content used to build the respective component. Currently we use the JSON exporter module (.model.json) for page content extraction, and for template extraction we tried QueryBuilder: /bin/querybuilder.json?path=/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root&type=nt:unstructured&p.hits=selective&p.properties=jcr:path&p.limit=-1. Is there a direct way to fetch a response that covers the above requirements?

giuseppebaglio · Accepted Answer

If you don't need to get the result from AEM directly, consider writing an external script that reads the project sources. I previously wrote a Python script that retrieves a list of components and templates from a project, with the output formatted as CSV:

Title,Resource Type
Text (v2),core/wcm/components/text/v2/text
Breadcrumb (v1),core/wcm/components/breadcrumb/v1/breadcrumb

The code below can easily be adapted to generate JSON and to parse dialog files to fetch all field properties.

import os
import xml.etree.ElementTree as ET

def find_xml_components(search_path, searchComponents=True):
    """
    Recursively searches within a specified path for XML files named ".content.xml" that meet certain criteria.
    If searchComponents is True, it looks for components (jcr:primaryType="cq:Component") with a componentGroup other than ".hidden".
    If searchComponents is False, it looks for templates (jcr:primaryType="cq:Template").
    Prints a CSV row for each file that meets the criteria, with the value of the jcr:title property and the resource type derived from the file path.

    Args:
        search_path (str): The path in which to search for XML files.
        searchComponents (bool): If True, searches for components; if False, searches for templates.
    """

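    # Expand "~" (os.walk and os.path.isdir do not do this), then check that the path is a valid directory
    search_path = os.path.expanduser(search_path)
    if not os.path.isdir(search_path):
        print(f"Error: The path '{search_path}' is not a valid directory.")
        return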

    csv_data = []
    for root, _, files in os.walk(search_path):
        for file in files:
            if file == ".content.xml":
                file_path = os.path.join(root, file)
                # Skip files in header or footer folders
                if 'header' in file_path or 'footer' in file_path:
                    continue
                try:
                    # Extract the resource type from the file path (normalise separators so this also works on Windows)
                    resource_type = file_path.replace(os.sep, '/').split('/apps/')[1].replace('/.content.xml', '')
                    tree = ET.parse(file_path)
                    root_element = tree.getroot()

                    # Discover namespaces
                    ns = {}
                    for event, elem in ET.iterparse(file_path, events=("start-ns",)):
                        if event == "start-ns":
                            ns[elem[0]] = elem[1]

                    # Check if jcr and cq namespaces are present, or set default namespaces to avoid errors
                    jcr_ns = ns.get("jcr", "http://www.jcp.org/jcr/1.0")
                    cq_ns = ns.get("cq", "http://www.day.com/jcr/cq/1.0")

                    # Get the values of the attributes
                    primary_type_element = root_element.attrib.get(f"{{{jcr_ns}}}primaryType")
                    component_group_element = root_element.attrib.get(f"{{{cq_ns}}}componentGroup")
                    jcr_title_element = root_element.attrib.get(f"{{{jcr_ns}}}title")
                    if jcr_title_element is None:
                        continue

                    if searchComponents and primary_type_element == "cq:Component":
                        # Check if the componentGroup is not ".hidden" or if it is not present
                        if component_group_element is None or (component_group_element != ".hidden"):
                            title = jcr_title_element if jcr_title_element is not None else ""
                            csv_data.append((resource_type, title))
                        else:
                            continue
                    elif primary_type_element == "cq:Template":
                        csv_data.append((resource_type, jcr_title_element))

                except ET.ParseError as e:
                    print(f"Error parsing XML file '{file_path}': {e}")
                except Exception as e:
                    print(f"Error handling file '{file_path}': {e}")

    # Print the results in CSV format
    if csv_data:
        print("Title,Resource Type")
        for row in csv_data:
            print(f"{row[1]},{row[0]}")
    else:
        print("No files matching the criteria found")


if __name__ == "__main__":
    print("Components CSV:")
    for search_directory in [
        '~/aem-core-cif-components/ui.apps/src/main/content/jcr_root/apps/core/cif'
        ]:
        find_xml_components(search_directory, True)
    print("Templates CSV:")
    for search_directory in [
        '~/aem-core-wcm-components/content/src/content/jcr_root/apps/core'
        ]:
        find_xml_components(search_directory, False)
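
As a rough illustration of the "parse dialog files and generate JSON" idea mentioned above, here is a minimal sketch. It assumes a component dialog is stored as `_cq_dialog/.content.xml`, that each field node carries `name`, `fieldLabel` and `sling:resourceType` attributes, and that a datasource (if any) is a child node called `datasource`. The sample dialog, the `my-project/...` resource types and the function name `extract_dialog_fields` are hypothetical and not part of the original script.

```python
import json
import xml.etree.ElementTree as ET

SLING_NS = "http://sling.apache.org/jcr/sling/1.0"

# Hypothetical dialog content, shaped like a typical _cq_dialog/.content.xml
SAMPLE_DIALOG = """
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
          xmlns:jcr="http://www.jcp.org/jcr/1.0"
          jcr:primaryType="nt:unstructured"
          jcr:title="Hero"
          sling:resourceType="cq/gui/components/authoring/dialog">
  <content jcr:primaryType="nt:unstructured">
    <items jcr:primaryType="nt:unstructured">
      <title jcr:primaryType="nt:unstructured"
             sling:resourceType="granite/ui/components/coral/foundation/form/textfield"
             fieldLabel="Hero Title"
             name="./title"/>
      <style jcr:primaryType="nt:unstructured"
             sling:resourceType="granite/ui/components/coral/foundation/form/select"
             fieldLabel="Style"
             name="./style">
        <datasource jcr:primaryType="nt:unstructured"
                    sling:resourceType="my-project/datasources/styles"/>
      </style>
    </items>
  </content>
</jcr:root>
"""


def extract_dialog_fields(dialog_xml):
    """Return one dict per node that has a 'name' attribute (assumed to be a field)."""
    root = ET.fromstring(dialog_xml)
    fields = []
    for node in root.iter():
        name = node.attrib.get("name")
        if name is None:
            continue  # containers, tabs, datasource nodes, etc.
        field = {
            "name": name,
            "type": node.attrib.get(f"{{{SLING_NS}}}resourceType"),
            "label": node.attrib.get("fieldLabel"),
        }
        # Assumption: a datasource is a direct child node named "datasource"
        datasource = node.find("datasource")
        if datasource is not None:
            field["datasource"] = datasource.attrib.get(f"{{{SLING_NS}}}resourceType")
        fields.append(field)
    return fields


if __name__ == "__main__":
    result = {
        "component": "my-project/components/hero",  # hypothetical resource type
        "fields": extract_dialog_fields(SAMPLE_DIALOG),
    }
    print(json.dumps(result, indent=2))
```

To run this against a real project, you could read each `_cq_dialog/.content.xml` found by `os.walk` and pass its text to `extract_dialog_fields`; dialogs that include or overlay other dialogs would need extra handling that this sketch does not attempt.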

AmitVishwakarma · Answer

Hi @bhuvaneshwarig,

Unfortunately, AEM 6.5 does not provide a direct OOTB API to extract component/template structure as JSON including field types and datasources.

Recommended Approach:

1. Use QueryBuilder with enriched properties for partial info.
2. Write a Groovy script or Sling Servlet for detailed, structured extraction.
3. If frequently needed, consider creating a custom tool/endpoint that recursively parses component/template folders and outputs structured JSON.

1. JSON Exporter (Sling Model Exporter)

 - Use Case: Extracting page content as JSON.
 - Limitation: Primarily for content rendering, not for template or component metadata.
 - Your Current Use: Correct for page content but not suitable for extracting field configurations or datasources from templates/components.

2. QueryBuilder Approach for Template Fields

You’re using something like:

/bin/querybuilder.json?path=/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root&type=nt:unstructured&p.hits=selective&p.properties=jcr:path&p.limit=-1

This helps extract node structure, but you might miss granular field-level details, like:

 - Field Names
 - Field Types (e.g., text, pathbrowser, multifield)
 - Datasource paths (if any)

Suggestion:

 - Enhance the query with more p.properties like sling:resourceType, name, fieldLabel, fieldDescription, datasource, options, etc.

Example (with p.hits=selective, the p.properties list is space-separated, URL-encoded as %20 or +):

p.properties=jcr:path name sling:resourceType fieldLabel fieldDescription datasource

3. JCR Node Traversal (Custom Servlet or Script)

For detailed extraction, you may write a custom servlet or use Groovy scripts via ACS AEM Tools to recursively traverse:

 - /conf//settings/wcm/templates → for templates
 - /apps//components → for component dialogs

You can access:

 - cq:dialog or _cq_dialog nodes for components
 - Field names, field types, and datasource values from granite:Field types

Example JSON structure output (custom):

{
  "component": "my-project/components/content/hero",
  "template": "my-project/templates/hero-template",
  "fields": [
    { "name": "title", "type": "text", "label": "Hero Title" },
    { "name": "image", "type": "pathbrowser", "datasource": "/mnt/overlay/dam/gui/content/assets.html" }
  ]
}

4. ACS AEM Tools – Groovy Console (Quick Extraction)

 - URL: /etc/groovyconsole.html
 - Write a script to traverse templates/components, read dialogs, and output JSON.
 - Ideal for one-time extraction or prototyping.

Regards,
Amit
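
For the QueryBuilder suggestion, the enriched URL can be assembled programmatically so the property list is encoded correctly. The following is only a sketch: the host `http://localhost:4502` and the helper name `build_querybuilder_url` are assumptions for illustration, and the property names are simply the ones listed in the suggestion. It joins the properties with spaces (the form QueryBuilder documents for `p.hits=selective`) and percent-encodes them.

```python
from urllib.parse import urlencode, quote


def build_querybuilder_url(host, path, properties, node_type="nt:unstructured"):
    """Build a QueryBuilder JSON URL that returns only the given properties."""
    params = {
        "path": path,
        "type": node_type,
        "p.hits": "selective",
        # Space-separated list; quote() turns the spaces into %20
        "p.properties": " ".join(properties),
        "p.limit": "-1",
    }
    return f"{host}/bin/querybuilder.json?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    url = build_querybuilder_url(
        "http://localhost:4502",  # assumed local author instance
        "/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root",
        ["jcr:path", "name", "sling:resourceType", "fieldLabel", "fieldDescription"],
    )
    print(url)
```

The resulting URL can be opened in a browser while logged in, or fetched with any HTTP client using credentials that have read access to the queried path.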
