Unable to Prevent LinkChecking on Servlet Request

luke_grover

14-05-2018

I have a simple servlet that should return some commerce data. I have simplified the example here, but the issue is that if any of the properties contains a link, I get invalid JSON: the link checker filter appears to be altering the response even though I have disabled it via configuration.

package com.example.issue;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ValueMap;
import org.apache.sling.api.servlets.ServletResolverConstants;
import org.apache.sling.api.servlets.SlingAllMethodsServlet;
import org.json.JSONException;
import org.json.JSONObject;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.servlet.Servlet;
import javax.servlet.ServletException;
import java.io.IOException;

@Component(
    service = Servlet.class,
    property = {
            SimpleCommerceServlet.RESOURCE_TYPE_DEFAULT,
            SimpleCommerceServlet.SELECTOR,
            SimpleCommerceServlet.METHOD_GET,
            SimpleCommerceServlet.EXTENSION_JSON
    }
)

public class SimpleCommerceServlet extends SlingAllMethodsServlet {

    private static final long serialVersionUID = 1647028361800528653L;

    private static final Logger LOGGER = LoggerFactory.getLogger(SimpleCommerceServlet.class);

    public static final String RESOURCE_TYPE_DEFAULT = ServletResolverConstants.SLING_SERVLET_RESOURCE_TYPES + "=" + ServletResolverConstants.DEFAULT_RESOURCE_TYPE;
    public static final String EXTENSION_JSON = ServletResolverConstants.SLING_SERVLET_EXTENSIONS + "=json";
    public static final String METHOD_GET = ServletResolverConstants.SLING_SERVLET_METHODS + "=GET";
    public static final String SELECTOR = ServletResolverConstants.SLING_SERVLET_SELECTORS + "=getCommerceDetails";

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) throws ServletException, IOException {
        // The request's resolver is managed by Sling: keep it in a local variable
        // (servlet instances are shared across threads) and do not close it here.
        ResourceResolver resourceResolver = request.getResourceResolver();
        Resource resource = resourceResolver.getResource("/etc/commerce/products/we-retail/custom/sample_product");
        if (resource == null) {
            // Avoid a NullPointerException when the sample product does not exist.
            response.sendError(SlingHttpServletResponse.SC_NOT_FOUND);
            return;
        }
        try {
            JSONObject jsonObject = new JSONObject();
            ValueMap vm = resource.getValueMap();
            jsonObject.put("Title", vm.get("jcr:title"));
            jsonObject.put("Summary", vm.get("summary"));
            response.setHeader("Content-Type", "application/json");
            response.setCharacterEncoding("UTF-8");
            response.getWriter().write(jsonObject.toString());
        } catch (JSONException e) {
            // Log the exception itself so the cause is not lost.
            LOGGER.error("Failed to get and process JSON", e);
        }
    }
}

I have created a new product at /etc/commerce/products/we-retail/custom/sample_product with the following properties:

jcr:title = Sample Product

summary = <p>This is a summary but it also has a <a title="Google" href="https://www" target="_blank">link</a>.</p>

Calling the servlet like this:

http://localhost:8080/bin/commerce.getCommerceDetails.details.json

results in invalid JSON:

{"Title":"Sample Product","Summary":"<p>This is a summary but it also has a <img src="/libs/cq/linkchecker/resources/linkcheck_o.gif" alt="invalid link: Google\\" title="invalid link: Google\\" border="0">link<\/a>.<\/p>\r\n"}

Note the insertion of the linkcheck gif!
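For comparison, this is roughly what the Summary value should look like when it leaves the servlet, before anything rewrites it. The sketch below is plain Java with a hypothetical `JsonEscapeSketch.quote` helper. It is not the `org.json` implementation, only an approximation of the same escaping rules (quotes, backslashes, control characters, and `</` written as `<\/`).

```java
/** Hypothetical helper, not part of org.json or AEM: approximates JSON string escaping. */
final class JsonEscapeSketch {

    /** Returns the input wrapped in quotes as a JSON string literal. */
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        char previous = 0;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                case '/':
                    // Write "</" as "<\/", matching the "<\/a>" seen in the responses above.
                    if (previous == '<') {
                        out.append('\\');
                    }
                    out.append('/');
                    break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \ uXXXX escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
            previous = c;
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // Property value taken from the sample product in this post.
        String summary = "<p>This is a summary but it also has a "
                + "<a title=\"Google\" href=\"https://www\" target=\"_blank\">link</a>.</p>\r\n";
        System.out.println("{\"Title\":" + quote("Sample Product") + ",\"Summary\":" + quote(summary) + "}");
    }
}
```

In the broken response the quotes inside the anchor tag are no longer escaped, which suggests the body was re-serialized as HTML after the servlet wrote it.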

I have tried using LinkCheckerSetting to ignore Internal and External.

I have disabled link checking and rewriting via com.day.cq.rewriter.linkchecker.impl.LinkCheckerTransformerFactory

I have changed the Link Check Override to ^. in com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl

Any other ideas on how to prevent the Link Checker from changing the response of this Servlet?

I'm running AEM 6.3 SP1

Accepted Solutions (1)

luke_grover

16-05-2018

Thanks to Jörg Hoh's comment, I added a new Sling rewriter configuration for this case. It seems the default rewriter configuration was being applied, which tries to rewrite the HTML tags in the JSON response.

So I ended up with this:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
          jcr:primaryType="nt:unstructured"
          contentTypes="[application/json,text/html]"
          enabled="{Boolean}true"
          generatorType="htmlparser"
          order="{Long}1"
          selectors="[getCommerceDetails]"
          serializerType="htmlwriter"
          paths="[/bin/commerce]"
          transformerTypes="[]">
    <generator-htmlparser
      jcr:primaryType="nt:unstructured"
      includeTags="[NOT_A_REAL_TAG]"/>
</jcr:root>

So, a few takeaways from this...

  1. If you use the htmlparser generator like this, you must specify a value for includeTags; otherwise the configuration does nothing.
  2. I specified application/json in contentTypes, but the content type always seems to be text/html.
  3. I didn't need to use both selectors and paths, but I wanted to be really specific here so that I can see it working vs. failing when I change the path slightly. The servlet is registered by selector, so matching on the selector alone would probably be the best fit for the servlet.
  4. Not having transformers breaks the UI of /system/console/status-slingrewriter; it doesn't seem to know how to handle an empty list. So the question is: where else does this cause problems that I haven't seen yet? I've thought about adding a non-transforming transformer just to satisfy the need for a transformer, so I can assure myself and others that I'm not breaking anything outside of my path/selector combination.
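Takeaways 2 and 3 can be pictured with a small model of how a configuration might be chosen for a request. This is a simplified sketch in plain Java, not the Sling rewriter code. The class `RewriterMatchSketch`, the configuration name `commerce-json`, and the matching rules (content type must be listed, path must start with a listed prefix, at least one listed selector must be present, highest order is checked first) are assumptions based on the configuration above and the console output in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Simplified, hypothetical model of rewriter configuration selection; not the Sling implementation. */
final class RewriterMatchSketch {

    /** One rewriter configuration; an empty list means "no restriction" for that property. */
    static final class Config {
        final String name;
        final int order;
        final List<String> contentTypes;
        final List<String> paths;
        final List<String> selectors;

        Config(String name, int order, List<String> contentTypes, List<String> paths, List<String> selectors) {
            this.name = name;
            this.order = order;
            this.contentTypes = contentTypes;
            this.paths = paths;
            this.selectors = selectors;
        }

        /** Assumed rule: every restricted property must match the request. */
        boolean matches(String contentType, String path, List<String> requestSelectors) {
            if (!contentTypes.isEmpty() && !contentTypes.contains(contentType)) {
                return false;
            }
            if (!paths.isEmpty() && paths.stream().noneMatch(path::startsWith)) {
                return false;
            }
            return selectors.isEmpty() || selectors.stream().anyMatch(requestSelectors::contains);
        }
    }

    /** Returns the name of the first matching configuration, highest order first, or null if none match. */
    static String select(List<Config> configs, String contentType, String path, List<String> requestSelectors) {
        List<Config> sorted = new ArrayList<>(configs);
        sorted.sort(Comparator.comparingInt((Config c) -> c.order).reversed());
        for (Config config : sorted) {
            if (config.matches(contentType, path, requestSelectors)) {
                return config.name;
            }
        }
        return null;
    }

    /** The OOTB default (order -1) plus the custom configuration from this post (order 1). */
    static List<Config> exampleConfigs() {
        return Arrays.asList(
                new Config("default", -1, Arrays.asList("text/html"),
                        Collections.<String>emptyList(), Collections.<String>emptyList()),
                new Config("commerce-json", 1, Arrays.asList("application/json", "text/html"),
                        Arrays.asList("/bin/commerce"), Arrays.asList("getCommerceDetails")));
    }

    public static void main(String[] args) {
        List<String> selectors = Arrays.asList("getCommerceDetails", "details");
        // Content type is passed as text/html because that is what was observed (takeaway 2).
        System.out.println(select(exampleConfigs(), "text/html", "/bin/commerce", selectors));
        System.out.println(select(exampleConfigs(), "text/html", "/bin/other", selectors));
    }
}
```

Under these assumptions, changing the path slightly makes the request fall through to the default configuration again, which is the working vs. failing difference described in takeaway 3.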

Answers (8)

Jörg_Hoh

Employee

14-05-2018

JSON requests are not rewritten by default. How is the rewriter configured on your system? Have you enabled it for JSON? In the web console you can check all rewriter configurations at localhost:4502/system/console/status-slingrewriter

Jörg

cqsapientu69896

29-01-2019

Thanks Luke, this helped. Noting it here in case anyone else tries this: the rewriter configuration sometimes works only after restarting the AEM instance.

Jörg_Hoh

Employee

16-05-2018

Hi Luke,

Thanks for letting us know how you resolved your issue. Regarding finding 4 (not having transformers breaks the console), I would ask you to raise a ticket in the Sling Jira. That should not happen 🙂

Regards,

Jörg

luke_grover

15-05-2018

So are you suggesting that we need a custom Sling rewriter to resolve this? There isn't a rewriter for JSON, but the request does appear to get picked up by the default one at the end of the list.

Here is the output of /system/console/status-slingrewriter. These are just the OOTB rewriters, as the instance doesn't have custom ones.

Current Apache Sling Rewriter Configuration
=================================================================
Active Configurations
-----------------------------------------------------------------

Configuration hybrid-app
Name : hybrid-app
Content Types : [text/html]
Paths : [/content/phonegap, /content/mobileapps, /content/campaigns]
Order : 1001
Active : true
Valid : true
Process Error Response : true
Pipeline :
    Generator :
        htmlparser : {includeTags=[Ljava.lang.String;@1f9494e7}
    Transformers :
        linkchecker
        contentsync : {component-optional=true}
        hybridapp : {component-optional=true}
        mobileappscampaign : {component-optional=true}
    Serializer :
        htmlwriter
Resource path: /libs/mobileapps/config/rewriter/hybrid-app

Configuration campaign-link-rewrite
Name : campaign-link-rewrite
Content Types : [text/html]
Resource Types : [mcm/campaign/components/newsletter, mcm/neolane/components/newsletter, mcm/campaign/components/campaign_newsletterpage]
Order : 1000
Active : true
Valid : true
Process Error Response : true
Pipeline :
    Generator :
        htmlparser : {includeTags=[Ljava.lang.String;@3479b3d5}
    Transformers :
        campaign-link-rewrite : {component-optional=true}
    Serializer :
        htmlwriter
Resource path: /libs/mcm/config/rewriter/campaign-link-rewrite

Configuration cfm
Name : cfm
Content Types : [text/html]
Resource Types : [dam/cfm/components/contentfragment]
Selectors : [rawcontent]
Order : 1000
Active : true
Valid : true
Process Error Response : true
Pipeline :
    Generator :
        html-generator
    Transformers :
        cfm-payload
        cfm-parfilter : {component-optional=false}
        cfm-assetprocessor : {component-optional=false}
    Serializer :
        htmlwriter
Resource path: /libs/dam/config/rewriter/cfm

Configuration pdf
Name : pdf
Extensions : [pdf]
Order : 0
Active : true
Valid : true
Process Error Response : false
Pipeline :
    Generator :
        empty-generator
    Transformers :
        htmlparser
        xslt : {source=sling://libs/wcm/core/content/pdf/page2fo.xsl}
    Serializer :
        fop : {mime-type=application/pdf}
Resource path: /libs/cq/config/rewriter/pdf

Configuration default
Name : default
Content Types : [text/html]
Order : -1
Active : true
Valid : true
Process Error Response : true
Pipeline :
    Generator :
        htmlparser
    Transformers :
        linkchecker
        mobile : {component-optional=true}
        mobiledebug : {component-optional=true}
        contentsync : {component-optional=true}
    Serializer :
        htmlwriter
Resource path: /libs/cq/config/rewriter/default

So, if I remove the transformers from the default configuration, I still get an invalid response:

{"Title":"Sample Product","Summary":"<p>This is a summary but it also has a <a title="\" href="Google\\" target="https://www\\">link<\/a>.<\/p>\r\n"}
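When comparing responses while changing rewriter settings, it can help to check them mechanically rather than by eye. The sketch below is a hypothetical, stdlib-only checker named `FlatJsonCheckSketch`. It only understands a flat object whose keys and values are all strings, with no whitespace between tokens, and it does not verify that each escape sequence is legal, so it is not a general JSON parser. It is enough to tell the responses in this thread apart.

```java
/** Hypothetical checker for a flat JSON object whose keys and values are all strings. */
final class FlatJsonCheckSketch {

    /** Returns the index just past the string literal starting at pos, or -1 if it is malformed. */
    private static int skipString(String s, int pos) {
        if (pos >= s.length() || s.charAt(pos) != '"') {
            return -1;
        }
        for (int i = pos + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. the quote in \"
            } else if (c == '"') {
                return i + 1; // unescaped quote ends the literal
            } else if (c < 0x20) {
                return -1; // raw control characters are not allowed inside JSON strings
            }
        }
        return -1; // ran out of input before the closing quote
    }

    /** Returns true if s has the shape {"key":"value","key":"value",...} with nothing after the closing brace. */
    static boolean isValidFlatObject(String s) {
        if (s.isEmpty() || s.charAt(0) != '{') {
            return false;
        }
        int pos = 1;
        if (pos < s.length() && s.charAt(pos) == '}') {
            return pos + 1 == s.length(); // empty object
        }
        while (true) {
            pos = skipString(s, pos); // key
            if (pos < 0 || pos >= s.length() || s.charAt(pos++) != ':') {
                return false;
            }
            pos = skipString(s, pos); // value
            if (pos < 0 || pos >= s.length()) {
                return false;
            }
            char next = s.charAt(pos++);
            if (next == '}') {
                return pos == s.length();
            }
            if (next != ',') {
                return false; // e.g. text following a quote that should have been escaped
            }
        }
    }

    public static void main(String[] args) {
        // The response quoted above, written as a Java string literal.
        String broken = "{\"Title\":\"Sample Product\",\"Summary\":\"<p>This is a summary but it also has a "
                + "<a title=\"\\\" href=\"Google\\\\\" target=\"https://www\\\\\">link<\\/a>.<\\/p>\\r\\n\"}";
        System.out.println(isValidFlatObject(broken));
    }
}
```

The checker rejects the response above at the first unescaped quote after `title=`, which is where the string value ends prematurely.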