Need Assistance with Extracting Component & Template Structure in JSON Format from AEM 6.5 | Community
March 18, 2025
Solved



We are trying to extract the component and template structure in JSON format from AEM 6.5. For templates, we require structure details such as field names, field types, and any datasource URL added to a field. For components, we require the template name along with the content used to build the respective component. Currently, for page content extraction we use the JSON exporter (.model.json), and for template extraction we tried QueryBuilder: /bin/querybuilder.json?path=/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root&type=nt:unstructured&p.hits=selective&p.properties=jcr:path&p.limit=-1. We would like to understand whether there is a direct way to fetch a response that covers the above requirements.

Best answer by giuseppebaglio


2 replies

AmitVishwakarma
Community Advisor
March 18, 2025

Hi @bhuvaneshwarig,

Unfortunately, AEM 6.5 does not provide a direct OOTB API to extract component/template structure as JSON including field types and datasources.

Recommended Approach:

1. Use QueryBuilder with enriched properties for partial info.
2. Write a Groovy script or Sling Servlet for detailed, structured extraction.
3. If frequently needed, consider creating a custom tool/endpoint that recursively parses component/template folders and outputs structured JSON.

1. JSON Exporter (Sling Model Exporter)

Use Case: Extracting page content as JSON.
Limitation: Primarily for content rendering, not for template or component metadata.
Your Current Use: Correct for page content but not suitable for extracting field configurations or datasources from templates/components.
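As a rough illustration of what the exporter does give you, the sketch below walks an already-fetched .model.json body and lists the resource type of every exported node. It assumes the Core Components export conventions (a ":type" key on each component and nested ":items" maps on containers) and a templateName property on the page model; the sample data and the component_types helper are hypothetical, so check them against your own output.

```python
import json

# Hypothetical, trimmed page.model.json body. Real responses carry many more
# properties; only ":type" and ":items" matter for this walk.
SAMPLE_MODEL_JSON = """
{
  "templateName": "hero-page",
  ":type": "my-project/components/page",
  ":items": {
    "root": {
      ":type": "my-project/components/container",
      ":items": {
        "title": {":type": "my-project/components/title", "text": "Welcome"},
        "teaser": {":type": "my-project/components/teaser", "title": "Spring sale"}
      }
    }
  }
}
"""


def component_types(model, path=""):
    """Return (item path, :type) pairs for every exported node in a .model.json tree."""
    found = []
    if ":type" in model:
        found.append((path or "/", model[":type"]))
    # Containers nest their children under ":items", keyed by node name
    for name, child in model.get(":items", {}).items():
        found.extend(component_types(child, f"{path}/{name}"))
    return found


model = json.loads(SAMPLE_MODEL_JSON)
print("template:", model.get("templateName"))
for item_path, resource_type in component_types(model):
    print(item_path, resource_type)
```

This tells you which components a page uses and with what content, but nothing about their dialog fields, which is the limitation described above.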

2. QueryBuilder Approach for Template Fields

You’re using something like:

/bin/querybuilder.json?path=/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root&type=nt:unstructured&p.hits=selective&p.properties=jcr:path&p.limit=-1

With p.properties=jcr:path, each hit carries only the node path, so you miss granular field-level details such as:

     - Field Names
     - Field Types (e.g., text, pathbrowser, multifield)
     - Datasource paths (if any)
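To see what such a query actually returns, here is a sketch that reads the hits out of a response. The body is a hypothetical sample shaped like QueryBuilder's JSON output (success, total and hits keys); with p.hits=selective and p.properties=jcr:path, each hit holds nothing but its path. The hit_paths helper and the child paths are illustrative only.

```python
import json

# Hypothetical response body shaped like QueryBuilder JSON output for
# p.hits=selective&p.properties=jcr:path (paths are illustrative only).
SAMPLE_RESPONSE = """
{
  "success": true,
  "results": 2,
  "total": 2,
  "more": false,
  "offset": 0,
  "hits": [
    {"jcr:path": "/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root/container"},
    {"jcr:path": "/conf/we-retail/settings/wcm/templates/hero-page/initial/jcr:content/root/container/title"}
  ]
}
"""


def hit_paths(response_text):
    """Return the jcr:path of every hit in a QueryBuilder JSON response."""
    data = json.loads(response_text)
    if not data.get("success", False):
        raise ValueError("QueryBuilder reported an unsuccessful query")
    return [hit["jcr:path"] for hit in data.get("hits", []) if "jcr:path" in hit]


for path in hit_paths(SAMPLE_RESPONSE):
    # Only paths come back: no field names, types or datasources
    print(path)
```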

Suggestion:

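     - Enhance the query with more p.properties, such as sling:resourceType, name, fieldLabel, fieldDescription, datasource, options, etc. Keep in mind that name and fieldLabel are set on the Granite UI field nodes of a component's dialog (under /apps), not on the template's initial content, and that a datasource is usually a child node named datasource rather than a property, so point the path predicate at the dialog when you need these details.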
Example:

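p.properties=jcr:path name sling:resourceType fieldLabel fieldDescription datasource
(QueryBuilder expects the property names separated by spaces, encoded as %20 or + in the URL, rather than by commas.)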
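A small helper like the one below can assemble such a query without hand-encoding it. It is a sketch only: the host assumes a local author instance, the dialog path is hypothetical, and querybuilder_url is an illustrative name; it only builds the URL and leaves fetching and authentication to you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def querybuilder_url(host, path, properties, node_type="nt:unstructured"):
    """Build a /bin/querybuilder.json URL that returns only the listed properties per hit."""
    params = {
        "path": path,
        "type": node_type,
        "p.hits": "selective",
        # QueryBuilder expects the selected property names separated by spaces
        "p.properties": " ".join(properties),
        "p.limit": "-1",
    }
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'
    return f"{host}/bin/querybuilder.json?{urlencode(params)}"


url = querybuilder_url(
    "http://localhost:4502",                       # assumed local author instance
    "/apps/my-project/components/hero/cq:dialog",  # hypothetical dialog path
    ["jcr:path", "name", "sling:resourceType", "fieldLabel", "fieldDescription"],
)
print(url)
# Decoding the query string gives back the space-separated property list
print(parse_qs(urlsplit(url).query)["p.properties"][0])
```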

 

3. JCR Node Traversal (Custom Servlet or Script)

For detailed extraction, you can write a custom servlet or run Groovy scripts in the AEM Groovy Console to recursively traverse:

     - /conf/<your-site>/settings/wcm/templates → for templates
     - /apps/<your-project>/components → for component dialogs
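If you work from the project's FileVault source tree instead of a running instance (the approach the accepted answer takes), the components side of this traversal can start by locating dialog files. This sketch assumes each dialog is serialized as a _cq_dialog/.content.xml file; find_dialog_files is a hypothetical helper, and the demo builds a throwaway tree rather than reading a real project.

```python
import os
import tempfile


def find_dialog_files(components_root):
    """Map each component folder (relative to components_root) to its _cq_dialog/.content.xml."""
    components_root = os.path.expanduser(components_root)
    found = {}
    for dirpath, _, filenames in os.walk(components_root):
        # FileVault stores the cq:dialog node as a folder named _cq_dialog
        if os.path.basename(dirpath) == "_cq_dialog" and ".content.xml" in filenames:
            component_dir = os.path.relpath(os.path.dirname(dirpath), components_root)
            found[component_dir] = os.path.join(dirpath, ".content.xml")
    return found


if __name__ == "__main__":
    # Demo on a throwaway tree: "hero" has a dialog, "text" does not
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "hero", "_cq_dialog"))
        open(os.path.join(tmp, "hero", "_cq_dialog", ".content.xml"), "w").close()
        os.makedirs(os.path.join(tmp, "text"))
        open(os.path.join(tmp, "text", ".content.xml"), "w").close()
        for component, dialog in find_dialog_files(tmp).items():
            print(component, "->", dialog)
```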

You can access:

     - cq:dialog nodes for components (stored as _cq_dialog folders in a FileVault source checkout)
     - field names, field types (sling:resourceType), and datasource values from the Granite UI field nodes inside those dialogs
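Assuming you read a dialog from a FileVault checkout rather than the live JCR, its field nodes can be parsed with Python's standard xml.etree.ElementTree. The sketch treats every node that has a name property as a field, uses its sling:resourceType as the field type, and takes the sling:resourceType of a child node named datasource as the datasource. That heuristic, the dialog_fields helper, and the sample dialog are all hypothetical.

```python
import xml.etree.ElementTree as ET

SLING_NS = "http://sling.apache.org/jcr/sling/1.0"
RESOURCE_TYPE = f"{{{SLING_NS}}}resourceType"

# Hypothetical _cq_dialog/.content.xml, trimmed to two fields
SAMPLE_DIALOG = """
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="nt:unstructured"
    sling:resourceType="cq/gui/components/authoring/dialog">
    <content jcr:primaryType="nt:unstructured">
        <items jcr:primaryType="nt:unstructured">
            <title jcr:primaryType="nt:unstructured"
                sling:resourceType="granite/ui/components/coral/foundation/form/textfield"
                fieldLabel="Hero Title"
                name="./title"/>
            <style jcr:primaryType="nt:unstructured"
                sling:resourceType="granite/ui/components/coral/foundation/form/select"
                fieldLabel="Style"
                name="./style">
                <datasource jcr:primaryType="nt:unstructured"
                    sling:resourceType="my-project/datasources/styles"/>
            </style>
        </items>
    </content>
</jcr:root>
"""


def dialog_fields(xml_text):
    """Extract name, type, label and datasource for each field node in a dialog."""
    root = ET.fromstring(xml_text)
    fields = []
    for node in root.iter():  # document order, depth first
        name = node.attrib.get("name")
        if name is None:
            continue  # heuristic: only value-storing fields carry a "name"
        field = {
            "name": name,
            "type": node.attrib.get(RESOURCE_TYPE, ""),
            "label": node.attrib.get("fieldLabel", ""),
        }
        datasource = node.find("datasource")
        if datasource is not None:
            field["datasource"] = datasource.attrib.get(RESOURCE_TYPE, "")
        fields.append(field)
    return fields


for field in dialog_fields(SAMPLE_DIALOG):
    print(field)
```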

Example JSON structure output (custom):

{
  "component": "my-project/components/content/hero",
  "template": "my-project/templates/hero-template",
  "fields": [
    { "name": "title", "type": "text", "label": "Hero Title" },
    { "name": "image", "type": "pathbrowser", "datasource": "/mnt/overlay/dam/gui/content/assets.html" }
  ]
}
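One way to emit a document in this shape from extracted field data is sketched below. The input format (dicts with name, resourceType and optional label/datasource keys), the component_record helper, and the choice to use the last segment of sling:resourceType as the short type (giving "textfield" rather than "text") are all assumptions for illustration.

```python
import json


def component_record(component, template, fields):
    """Build a JSON document with component, template and a list of fields.

    Each input field is a dict with "name", "resourceType" and optional
    "label" and "datasource" keys (a hypothetical intermediate format).
    """
    out_fields = []
    for field in fields:
        entry = {
            # "./title" -> "title"
            "name": field["name"].removeprefix("./"),
            # Assumption: the short type is the last segment of sling:resourceType,
            # e.g. ".../form/textfield" -> "textfield"
            "type": field["resourceType"].rsplit("/", 1)[-1],
        }
        if field.get("label"):
            entry["label"] = field["label"]
        if field.get("datasource"):
            entry["datasource"] = field["datasource"]
        out_fields.append(entry)
    record = {"component": component, "template": template, "fields": out_fields}
    return json.dumps(record, indent=2)


print(component_record(
    "my-project/components/content/hero",
    "my-project/templates/hero-template",
    [
        {"name": "./title", "resourceType": "granite/ui/components/coral/foundation/form/textfield", "label": "Hero Title"},
        {"name": "./image", "resourceType": "granite/ui/components/coral/foundation/form/pathfield", "label": "Image"},
    ],
))
```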

4. AEM Groovy Console (Quick Extraction)

     - URL: /etc/groovyconsole.html
     - Write a script to traverse templates/components, read dialogs, and output JSON.
     - Ideal for one-time extraction or prototyping.


Regards,
Amit

 

giuseppebaglio
Accepted solution
Level 10
March 19, 2025

You could write an external script if you don't need to get the result from AEM directly. I previously wrote a Python script that retrieves the list of components and templates from a project and formats the output as CSV:

Title,Resource Type
Text (v2),core/wcm/components/text/v2/text
Breadcrumb (v1),core/wcm/components/breadcrumb/v1/breadcrumb

 

The code can easily be updated to generate JSON and to parse dialog files to fetch all of their properties.

import os
import xml.etree.ElementTree as ET


def find_xml_components(search_path, searchComponents=True):
    """
    Recursively searches within a specified path for XML files named ".content.xml"
    that meet certain criteria.

    If searchComponents is True, it looks for components (jcr:primaryType="cq:Component")
    with a componentGroup other than ".hidden".
    If searchComponents is False, it looks for templates (jcr:primaryType="cq:Template").

    Prints a CSV row for each file that meets the criteria, with the value of the
    jcr:title property and the resource type (the path below /apps/).

    Args:
        search_path (str): The path in which to search for XML files ("~" is expanded).
        searchComponents (bool): If True, searches for components; if False, searches for templates.
    """
    # Expand "~" so paths like '~/aem-core-wcm-components/...' resolve to the home directory
    search_path = os.path.expanduser(search_path)

    # Check if the passed path is a valid directory
    if not os.path.isdir(search_path):
        print(f"Error: The path '{search_path}' is not a valid directory.")
        return

    csv_data = []
    for root, _, files in os.walk(search_path):
        for file in files:
            if file != ".content.xml":
                continue
            file_path = os.path.join(root, file)

            # Skip files in header or footer folders
            if 'header' in file_path or 'footer' in file_path:
                continue

            try:
                # Extract the resource type from the file path (the part after /apps/)
                normalized = file_path.replace(os.sep, '/')
                if '/apps/' not in normalized:
                    continue
                resource_type = normalized.split('/apps/', 1)[1].replace('/.content.xml', '')

                tree = ET.parse(file_path)
                root_element = tree.getroot()

                # Discover the namespaces declared in the file
                ns = {}
                for event, elem in ET.iterparse(file_path, events=("start-ns",)):
                    if event == "start-ns":
                        ns[elem[0]] = elem[1]

                # Fall back to the standard jcr and cq namespace URIs if they are not declared
                jcr_ns = ns.get("jcr", "http://www.jcp.org/jcr/1.0")
                cq_ns = ns.get("cq", "http://www.day.com/jcr/cq/1.0")

                # Get the values of the attributes
                primary_type = root_element.attrib.get(f"{{{jcr_ns}}}primaryType")
                component_group = root_element.attrib.get(f"{{{cq_ns}}}componentGroup")
                jcr_title = root_element.attrib.get(f"{{{jcr_ns}}}title")
                if jcr_title is None:
                    continue

                if searchComponents and primary_type == "cq:Component":
                    # Keep the component if componentGroup is absent or not ".hidden"
                    if component_group != ".hidden":
                        csv_data.append((resource_type, jcr_title))
                elif not searchComponents and primary_type == "cq:Template":
                    csv_data.append((resource_type, jcr_title))
            except ET.ParseError as e:
                print(f"Error parsing XML file '{file_path}': {e}")
            except Exception as e:
                print(f"Error handling file '{file_path}': {e}")

    # Print the results in CSV format
    if csv_data:
        print("Title,Resource Type")
        for resource_type, title in csv_data:
            print(f"{title},{resource_type}")
    else:
        print("No files matching the criteria found")


if __name__ == "__main__":
    print("Components CSV:")
    for search_directory in [
        '~/aem-core-cif-components/ui.apps/src/main/content/jcr_root/apps/core/cif'
    ]:
        find_xml_components(search_directory, True)

    print("Templates CSV:")
    for search_directory in [
        '~/aem-core-wcm-components/content/src/content/jcr_root/apps/core'
    ]:
        find_xml_components(search_directory, False)
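For the JSON variant mentioned above, one minimal option, assuming you keep the (resource_type, title) tuples that find_xml_components collects in csv_data, is to serialize them with the standard json module instead of printing CSV lines. The rows_to_json helper and its key names (title, resourceType) are hypothetical choices.

```python
import json


def rows_to_json(rows, kind):
    """Convert (resource_type, title) tuples, as collected in csv_data, into a JSON string."""
    items = [{"title": title, "resourceType": resource_type} for resource_type, title in rows]
    return json.dumps({kind: items}, indent=2)


# Rows matching the sample CSV output above
rows = [
    ("core/wcm/components/text/v2/text", "Text (v2)"),
    ("core/wcm/components/breadcrumb/v1/breadcrumb", "Breadcrumb (v1)"),
]
print(rows_to_json(rows, "components"))
```

Unlike the f-string CSV output, json.dumps also escapes titles that contain commas or quotes.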