SOLVED

AEM- Extract Text from page for translation and Workflow

Level 10

Hello,

Is there an out-of-the-box (OOTB) way to extract the text from a page, for example by filtering on sling:resourceType = foundation/components/text? If not, any pointers on how I could achieve this? Could I have a workflow triggered that extracts the text from all the text components on the page?

Once that is done, I want to generate an XLIFF file for translation. I have managed to write the code that creates the XLIFF, but how do I trigger it on the desired page of the site via a workflow?

Please help,

Regards,

Nicole

10 Replies

Level 7

Hi,
I'm not really sure what you want to do here. Could you explain the purpose a bit more?

Basic text extraction from text components on a page
There are quite a few ways to extract all the text components from a page. It can easily be done via, for example, the Query API. You set the page as the search root and query for all the foundation/components/text components. Then you simply iterate over the result, extract the text content of each hit, and apply the XLIFF logic to it.

This extraction can be triggered in a number of ways. One option is a recurring task (like a service implementing the Runnable interface) that runs at set times and takes care of the process. Or you could use it in a workflow if you prefer that (http://dev.day.com/docs/en/cq/5-4/developing/developing_workflows.html). As the payload for that workflow you can then choose whichever page you like.


Hope that helped you in some way.

Regards
Johan
 

Level 10

Thanks Johan,

This is what I am trying to do, and I would be glad if you could guide me through it, as it seems complex to me.

I want to translate the text on a page or across the whole site. Here is the plan:

1. Identify the text, at either page level or site level.

2. Once it is identified, generate an XLIFF to be sent to a vendor for translation.

3. Parse the translated text and put it back into the respective places on the page.

I guess the QueryBuilder API you mentioned will do for identifying text at page level. Since I am new to it, could you guide me with a query that extracts the text from the text and text image components on a page, by resource type? I am hoping to omit any formatting tags when I retrieve it.

I am planning to tackle [3] once I have achieved [2]. Please let me know if this makes sense to you.

Regards,

Nicole

Correct answer by
Level 7

No worries. I've written a few examples of how to use the QueryBuilder API before, but I can surely write another small one here.

First off, we state which page we want to search, like this:

String searchPath = "/content/mysite/en/category/mypage";

This could also be passed to the code some other way, but let's just say that we have somehow defined the path of the page we are investigating.

The next step is to set up the query like this:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.jcr.Session;
import com.day.cq.search.PredicateGroup;
import com.day.cq.search.Query;
import com.day.cq.search.QueryBuilder;
import com.day.cq.search.result.Hit;
import com.day.cq.search.result.SearchResult;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ValueMap;

// some code ...
Map<String, String> map = new HashMap<String, String>();
map.put("path", searchPath);
map.put("type", "nt:unstructured");
// The property we are looking for on the node: "sling:resourceType"
map.put("property", "sling:resourceType");
// The value that property must have: "foundation/components/text"
map.put("property.value", "foundation/components/text");
// Limit on the number of text components returned, -1 = unlimited.
// paginateAfter is a field defined elsewhere in the class; p.limit has the
// same effect as calling query.setHitsPerPage(...)
map.put("p.limit", Integer.toString(this.paginateAfter));

// Initiate the QueryBuilder
QueryBuilder builder = request.getResourceResolver().adaptTo(QueryBuilder.class);
Session session = request.getResourceResolver().adaptTo(Session.class);

// Create the query from the predicate group that defines our search criteria
Query query = builder.createQuery(PredicateGroup.create(map), session);

// Get the query result and the hits
SearchResult result = query.getResult();
List<Hit> hits = result.getHits();

// Iterate over all the hits (each hit represents one text component)
for (Hit hit : hits) {
    // Extract the text component resource from the hit
    // (getResource() can throw a RepositoryException)
    Resource textResource = hit.getResource();
    // Fetch all the properties of that resource
    ValueMap textProperties = textResource.adaptTo(ValueMap.class);
    // Fetch the text that has been entered into the "text" property
    String textText = textProperties.get("text", String.class);
    // Do something with that text, in this case the transformation
    // that you have already implemented
}
// More code...
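If the text image components should be covered by the same query, one option is to give the property predicate several values, which QueryBuilder combines with OR. This is only a sketch: it assumes the site uses the stock foundation/components/textimage resource type (which also keeps its text in the "text" property), and the class and method names are made up for illustration. The resulting map would be handed to PredicateGroup.create(map) in the same way as above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds only the predicate map, so it runs without AEM on the classpath.
class TextComponentPredicates {

    static Map<String, String> forPage(String searchPath) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        map.put("path", searchPath);
        map.put("type", "nt:unstructured");
        map.put("property", "sling:resourceType");
        // Several N_value entries on one property predicate are OR-ed together
        map.put("property.1_value", "foundation/components/text");
        // Assumption: the stock text image component is the one used on the site
        map.put("property.2_value", "foundation/components/textimage");
        map.put("p.limit", "-1"); // -1 = unlimited
        return map;
    }

    public static void main(String[] args) {
        // Print the predicates in the key=value form used by the QueryBuilder debugger
        for (Map.Entry<String, String> e : forPage("/content/mysite/en/category/mypage").entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Keeping the map construction in its own method also makes it easy to try the same predicates in the QueryBuilder debugger before wiring them into a workflow.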

Hopefully that will get you started, good luck :)


/Johan
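On the wish to leave out formatting tags: the "text" property of a rich text component usually holds HTML, so the value needs some clean-up before it goes into a translation unit. Below is a rough sketch using plain regular expressions. The class and method names are hypothetical, the entity list is deliberately short, and a real HTML parser would be more robust for anything beyond simple markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper for reducing rich text to plain text.
class PlainText {

    // Tags that end a line or block become a space so words do not run together
    private static final Pattern BLOCK_BREAK =
            Pattern.compile("(?i)<br\\s*/?>|</(p|div|li|h[1-6])>");
    // Every remaining tag is dropped
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String richText) {
        if (richText == null) {
            return "";
        }
        String s = BLOCK_BREAK.matcher(richText).replaceAll(" ");
        s = ANY_TAG.matcher(s).replaceAll("");
        // Unescape a few common entities; &amp; goes last to avoid double unescaping
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by removed tags
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p><p>Fish &amp; chips</p>"));
    }
}
```

Whether stripping is the right choice depends on the vendor: if the translated text has to go back into a rich text field, the inline markup is lost this way and may need to be preserved instead.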

Level 10

Hello Johan,

I am running into issues invoking a method from within the execute method of a workflow. The method lives in a different class and uses the QueryBuilder API, with code like the above.

To do that I need to pass a session object. I already have a workflow session in place, so I tried the following two approaches.

Session sess = null;
Repository repo = JcrUtils.getRepository("http://localhost:4502/crx/server");
sess = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));

With this I get the following message, which is caught in the catch clause:
"In catch Unable to access a repository with the following settings:
org.apache.jackrabbit.repository.uri: http://localhost:4502/crx/server
The following RepositoryFactory classes were consulted:
Perhaps the repository you are trying to access is not available at the moment."

If I use the code below instead, I get a NullPointerException at the line indicated.
final Session session = wfSession.getSession();

<<NPE>>Query query = builder.createQuery(PredicateGroup.create(map),session);

Please let me know the best way to retrieve the session and pass it to the method.

public void execute(WorkItem item, WorkflowSession wfSession, MetaDataMap args)
        throws WorkflowException {

    SearchService sr = new QueryBuilderSearch();
    Session session = null;
    try {

        //Repository repo = JcrUtils.getRepository("http://localhost:4502/crx/server");
        //session = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));

        session = wfSession.getSession();

        sr.SearchCQForContent(session); // Method inside QueryBuilderSearch Impl class

        System.out.println("Success !!");
    } catch (RepositoryException e) {
        // TODO Auto-generated catch block
        System.out.println(e.getMessage());
        e.printStackTrace();
    }
}

Level 8

The real challenge in these scenarios isn't extracting the text. It is extracting it in a manner that allows you to reinsert the content into the page effectively.

The other challenge is that you have to develop business rules to determine which values are translatable and which aren't. For example, you can't just assume that all text within a component is subject to translation. Anything that is controlled by a selection widget of some kind (drop-down, checkbox, radio buttons, etc.) shouldn't be translated, and you probably can't just query for certain components.

Generally I would think that you do this during a workflow step for a particular page, since you are usually translating pages and not whole web sites. I think you'd be better off iterating over all the resources under a page, because you will generally want to translate things like the title in addition to the text displayed on the page, and things like the description, which show up in the head of the page. You then need a set of business rules that specify, per resource type, which properties are subject to translation and which aren't, and output those into some sort of XML. The XML then needs some attribute or structure that helps you make the linkage back to each property's original location in the repository. You'll probably need to store this in attributes that map to the XLIFF external-file href attribute and the trans-unit id attribute, so that when you get the XLIFF back you can map it to the right locations in the repository.
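To make the idea of per-resource-type rules and a linkage back to the repository a bit more concrete, here is a small sketch in plain Java. Everything specific in it is an assumption for illustration: the rule table, the foundation/components/page entry, the choice of putting the page path into the file element's original attribute, and the "node path@property name" convention for the trans-unit id. The output has not been validated against the XLIFF schema.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: business rules plus XLIFF 1.2 style output.
class XliffSketch {

    // Assumed business rules: resource type -> names of translatable properties
    static final Map<String, List<String>> RULES = new HashMap<String, List<String>>();
    static {
        RULES.put("foundation/components/text", Arrays.asList("text"));
        RULES.put("foundation/components/page", Arrays.asList("jcr:title", "jcr:description"));
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // nodes: repository path -> that node's properties (including sling:resourceType)
    static String toXliff(String pagePath, String sourceLang,
                          Map<String, Map<String, String>> nodes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\">\n");
        sb.append("  <file original=\"").append(escape(pagePath))
          .append("\" source-language=\"").append(escape(sourceLang))
          .append("\" datatype=\"html\">\n    <body>\n");
        for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
            Map<String, String> props = node.getValue();
            List<String> translatable = RULES.get(props.get("sling:resourceType"));
            if (translatable == null) {
                continue; // no rule for this resource type, so nothing is translated
            }
            for (String name : translatable) {
                String value = props.get(name);
                if (value == null) {
                    continue;
                }
                // The id records where the value came from: "<node path>@<property>"
                String id = node.getKey() + "@" + name;
                sb.append("      <trans-unit id=\"").append(escape(id)).append("\">")
                  .append("<source>").append(escape(value)).append("</source>")
                  .append("</trans-unit>\n");
            }
        }
        sb.append("    </body>\n  </file>\n</xliff>\n");
        return sb.toString();
    }
}
```

In a workflow step the nodes map would be filled by walking the resources under the payload page. Properties without a rule, such as values chosen in a drop-down, never reach the XLIFF.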

When you get the translated content back, you'll potentially need some workflow process to ingest it again.
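The ingest side could start from something like the sketch below, which reads the target texts out of a returned XLIFF with the JDK's DOM parser. It assumes, purely for illustration, that each trans-unit id was written as "node path@property name" when the file was generated. The class and method names are hypothetical, and writing the values back into the repository is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the first half of an ingest step.
class XliffIngestSketch {

    // Returns trans-unit id -> translated text; units without a <target> are skipped
    static Map<String, String> readTargets(String xliff) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xliff.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> targets = new LinkedHashMap<String, String>();
            NodeList units = doc.getElementsByTagName("trans-unit");
            for (int i = 0; i < units.getLength(); i++) {
                Element unit = (Element) units.item(i);
                NodeList target = unit.getElementsByTagName("target");
                if (target.getLength() > 0) {
                    targets.put(unit.getAttribute("id"), target.item(0).getTextContent());
                }
            }
            return targets;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XLIFF", e);
        }
    }

    // Splits an id of the assumed form "<node path>@<property name>"
    static String[] splitId(String id) {
        int at = id.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("Unexpected trans-unit id: " + id);
        }
        return new String[] { id.substring(0, at), id.substring(at + 1) };
    }
}
```

Each entry then gives a node path and a property name, which is what a workflow step would need in order to set the translated value on the right node of the target-language page.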

Level 10

Thank you Johan. I am working on it and will try to make it work. I may need your help to make it a success, and will repost if I have any questions.

Thanks,

Level 7

Hi, regarding the session.
In your class that implements WorkflowProcess, the correct way is to obtain the session like this (which you have already done):

public class MyProcess implements WorkflowProcess {
    //....code
    @Override
    public void execute(WorkItem workItem, WorkflowSession workflowSession, MetaDataMap args)
            throws WorkflowException {
        try {
            //..more code
            Session session = workflowSession.getSession(); // Get the session here
            //..
        } catch (RepositoryException e) {
            // handle errors..
        }
    }
    //..
}

There is also an alternative for classes that don't use the workflow. Consider the following component that implements EventListener:

@Component
public class MySpecialEventlistener implements EventListener {
    @Reference
    private SlingRepository repository;
    private ObservationManager om;
    private Session session;

    protected void activate(ComponentContext context) throws Exception {
        session = repository.loginAdministrative(null);
        om = session.getWorkspace().getObservationManager();
        om.addEventListener(this, Event.PROPERTY_CHANGED, "/", true, null, null, true);
        // LOGGER is a logger field declared elsewhere in the class
        LOGGER.info("Allright, we activated a property change listener!");
    }
    //...
}

Here the session is obtained in a completely different way. The two examples are just for reference.

In your case, can you put the method that performs the query in the same class that implements the workflow? It looks like you are spreading the work over a few different classes. If you simply use the workflow session and handle it in the same class, I'm quite sure there won't be any problem. If you do pass the session object to another class, just make sure that you don't start off by setting it to null in that class, as it looks like you do at the moment.

/Johan

Level 10

Thanks Oratas,

Will bear that in mind. It was quite informative, I appreciate it.

Level 10

Thanks Johan, that worked like a charm. I avoided passing the session to a method in a different class and it worked. Thanks again :)

I am now trying to trigger this workflow on rollout (MSM) and am wondering how I could achieve this. Any clue?

I was also looking for an approach that only picks up nodes modified in the past two days for translation. Is that possible using QueryBuilder, or would you suggest another approach?

I really appreciate all the help :) I will also have to work on the backward flow of putting the translated text back into AEM, so I will probably keep the thread open for your valuable inputs.

Regards,

Nicole
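For the "modified in the past two days" part, one possibility is the QueryBuilder daterange predicate, whose bounds can, as far as the QueryBuilder documentation describes, be given as milliseconds since 1970. The sketch below only builds the predicate map. It assumes the component nodes carry a jcr:lastModified date property, which should be checked on the actual content, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: predicates for text components modified within the last N days.
class RecentlyModifiedPredicates {

    static Map<String, String> modifiedWithinDays(String searchPath, long nowMillis, int days) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        map.put("path", searchPath);
        map.put("type", "nt:unstructured");
        map.put("property", "sling:resourceType");
        map.put("property.value", "foundation/components/text");
        // Assumption: the component nodes have a jcr:lastModified DATE property
        map.put("daterange.property", "jcr:lastModified");
        // Lower bound as epoch milliseconds: now minus the requested number of days
        long lowerBound = nowMillis - TimeUnit.DAYS.toMillis(days);
        map.put("daterange.lowerBound", String.valueOf(lowerBound));
        map.put("p.limit", "-1"); // -1 = unlimited
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map =
                modifiedWithinDays("/content/mysite/en", System.currentTimeMillis(), 2);
        System.out.println(map);
    }
}
```

Passing the current time in as a parameter keeps the method easy to test. In the workflow step it would simply be System.currentTimeMillis().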

Level 10

Managed to get the last-modified nodes and generate the XLIFF. Now wondering how I could trigger it on rollout.