Hi,
We have numerous pages under the directory "/content/products/zh-cn/construction-equipment/service".
Some examples include:
/content/products/zh-cn/construction-equipment/service/parts-and-service-for-cobra-petrol-breakers/rodsandbits
/content/products/zh-cn/construction-equipment/service/parts-and-service-for-pneumatic-rock-drills/rodsandbits
It has come to our attention that some of these pages contain English content, despite having their language setting configured as Chinese.
Can anyone help me get a list of the pages that contain English content?
There is nothing available OOTB to validate this. However, if your image paths are also localized, you could look for zh-cn pages that still reference EN images; those pages are likely candidates.
Sample code is available here: https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/how-to-find-the-list-of-al...
Or, if there is any metadata that could help (such as tags or language), it would be worth checking that as well.
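To make the image-reference idea concrete, here is a minimal sketch that only builds the Query Builder predicates and the matching /bin/querybuilder.json query string. It assumes the image components store their asset path in a fileReference property and that English assets sit under a DAM folder containing an /en/ segment; both are assumptions about your project, so adjust the property name and the pattern to your setup. The class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EnImageReferenceQuery {

    // Predicates matching any node under rootPath whose fileReference
    // property matches damPattern (JCR "like" syntax, % is the wildcard).
    public static Map<String, String> predicates(String rootPath, String damPattern) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("path", rootPath);
        p.put("property", "fileReference");   // assumption: image components use this property
        p.put("property.operation", "like");
        p.put("property.value", damPattern);
        p.put("p.limit", "-1");               // return all hits instead of the first page
        return p;
    }

    // Turns the predicates into a URL query string, encoding the values only.
    public static String toQueryString(Map<String, String> predicates) {
        return predicates.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> p = predicates(
                "/content/products/zh-cn/construction-equipment/service",
                "%/en/%"); // assumption: English assets live under an /en/ folder
        System.out.println("/bin/querybuilder.json?" + toQueryString(p));
    }
}
```

The hits are component nodes rather than pages, so the page is the part of each hit path before /jcr:content. The same predicates can also be pasted line by line into the Query Builder debugger.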
If she wants to get the list of English pages available under a specific path, can't she use Query Builder with fulltext search? I understand there are limitations for using fulltext. However, if there's a specific property that is set for Chinese only or English, maybe that could be helpful.
I may be wrong, but I'm open to suggestions.
Hi @Vani1012 ,
Sure, I can help you with a Java program to identify pages with English content. The idea is to fetch each page, extract its visible text, and run a language detector over that text.
You'll need two libraries: jsoup (to fetch and parse the HTML) and Apache Tika's language detection.
Add these dependencies to your pom.xml if you're using Maven:
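<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.14.3</version>
    </dependency>
    <!-- In Tika 2.x the detector implementations ship as separate artifacts;
         tika-langdetect-optimaize provides the default one and pulls in tika-core. -->
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-langdetect-optimaize</artifactId>
        <version>2.1.0</version>
    </dependency>
</dependencies>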
Here's the Java code to fetch pages, detect the language, and list pages with English content:
import org.apache.tika.language.detect.LanguageDetector;
import org.apache.tika.language.detect.LanguageResult;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EnglishContentDetector {
    public static void main(String[] args) throws IOException {
        // Placeholder host; replace it with your own instance.
        String baseUrl = "https://www.example.com";
        List<String> urls = List.of(
            "/content/products/zh-cn/construction-equipment/service/parts-and-service-for-cobra-petrol-breakers/rodsandbits",
            "/content/products/zh-cn/construction-equipment/service/parts-and-service-for-pneumatic-rock-drills/rodsandbits"
            // Add more URLs as needed
        );
        List<String> englishPages = new ArrayList<>();
        // Load the language models once and reuse the detector for every page.
        LanguageDetector detector = LanguageDetector.getDefaultLanguageDetector().loadModels();
        for (String url : urls) {
            String fullUrl = baseUrl + url;
            try {
                Document doc = Jsoup.connect(fullUrl).get();
                // Visible text of the page body, with the markup stripped.
                String text = doc.body().text();
                // detect() reports only the single most likely language for the text.
                LanguageResult result = detector.detect(text);
                if ("en".equals(result.getLanguage())) {
                    englishPages.add(fullUrl);
                }
            } catch (IOException e) {
                // Skip pages that fail to load instead of aborting the whole run.
                System.err.println("Could not fetch " + fullUrl + ": " + e.getMessage());
            }
        }
        System.out.println("Pages with English content:");
        for (String page : englishPages) {
            System.out.println(page);
        }
    }
}
This approach should help you identify pages with English content in the specified directory.
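One caveat: a detector that reports a single language per page can miss zh-cn pages where only part of the text is English. As a complement, and not something proposed elsewhere in this thread, here is a minimal sketch of a cruder heuristic that measures how much of a page's text is written in Latin script versus Han characters. The class name, the sample strings, and the 0.5 threshold are all hypothetical; the threshold in particular would need tuning, since product names such as "Cobra" legitimately appear in Latin script on Chinese pages.

```java
public class ScriptRatio {

    // Share of Latin letters among all Latin and Han letters in the text.
    // Digits, punctuation, whitespace and other scripts are ignored.
    // Returns 0.0 when the text contains neither Latin nor Han letters.
    public static double latinShare(String text) {
        long latin = 0;
        long han = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue;
            }
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.LATIN) {
                latin++;
            } else if (script == Character.UnicodeScript.HAN) {
                han++;
            }
        }
        long total = latin + han;
        return total == 0 ? 0.0 : (double) latin / total;
    }

    public static void main(String[] args) {
        double threshold = 0.5; // hypothetical cut-off, tune against known-good pages
        String[] samples = {
            "零件和服务",                       // Chinese only
            "Rods and bits for rock drills",   // English only
            "Cobra 零件和服务"                  // Chinese with a product name
        };
        for (String sample : samples) {
            double share = latinShare(sample);
            String verdict = share > threshold ? "review" : "ok";
            System.out.println(verdict + " (" + share + "): " + sample);
        }
    }
}
```

In the program above, latinShare(text) could be applied to the same text variable that is passed to the detector, and any page above the threshold added to a separate review list. Keep in mind that one Chinese character carries roughly as much meaning as a whole English word, so the ratio leans toward Latin for mixed pages.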
Thanks, ChatGPT. I've been following your posts, and this is spamming, so I am reporting you.
You should have a word with the administrators; this is getting out of hand. It's very disrespectful to the community, and you should really stop. Spamming is not good, and it also shows us that you are not a professional, because anyone can use ChatGPT to answer the questions posted here.