How to Get the innerHTML of a Whole Page in Selenium WebDriver with Java?

Last Updated : 30 Sep, 2024

When automating web applications with Selenium WebDriver in Java, it's often necessary to retrieve the entire HTML content of a webpage. This can be useful for testing purposes, data extraction, or validating the structure of the page.

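Selenium WebDriver offers two straightforward ways to do this from Java: the getPageSource() method, which returns the source of the current page, and JavascriptExecutor, which can run a script that returns the innerHTML of an element such as the body. Once the HTML is available as a string, you can analyze or manipulate it programmatically.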

Prerequisite

You need the following to run the examples:

Dependencies for Selenium

Add the Selenium Java dependency to the pom.xml file of your Maven project.

pom.xml

XML

<dependencies>
  <!-- Selenium Java Dependency -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.21.0</version>
  </dependency>
</dependencies>

Example

index.html

HTML

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Sample Page</title>
</head>
<body>
    <h1>Welcome to My Website</h1>
    <p>This is a simple paragraph on my web page.</p>
    <div>
        <h2>Section 1</h2>
        <p>This is some content in section 1.</p>
    </div>
    <div>
        <h2>Section 2</h2>
        <p>This is some content in section 2.</p>
    </div>
</body>
</html>

Application.java

Java

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class Application {
    public static void main(String[] args) {
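        // Set the path to your ChromeDriver executable.
        // Selenium 4.6 and later can usually locate the driver automatically
        // (Selenium Manager), in which case this line can be omitted.
        System.setProperty("webdriver.chrome.driver", "path_to_chromedriver");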

        // Initialize ChromeDriver
        WebDriver driver = new ChromeDriver();

        try {
            // Load the local HTML file
            driver.get("file:///path_to_your_html_file/index.html");

            // Option 1: Get the entire page source
            String pageSource = driver.getPageSource();
            System.out.println("Page Source:");
            System.out.println(pageSource);

            // Option 2: Get the innerHTML of the body using JavascriptExecutor
            JavascriptExecutor js = (JavascriptExecutor) driver;
            String bodyInnerHTML = (String) js.executeScript("return document.body.innerHTML;");
            System.out.println("\nBody InnerHTML:");
            System.out.println(bodyInnerHTML);

        } finally {
            // Close the browser
            driver.quit();
        }
    }
}

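Output

The program prints the page source of the sample page, followed by the innerHTML of its body element.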
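Option 2 in Application.java returns only the contents of the body element. To get the markup of the whole document through JavaScript, the standard DOM property document.documentElement can be used instead: its innerHTML is everything inside the html element, and its outerHTML also includes the html tag itself (but not the DOCTYPE line). The following is a minimal sketch, not a definitive implementation. The class WholePageHtml, the method fetch, and the scriptRunner parameter are hypothetical names introduced here; scriptRunner stands in for a call to JavascriptExecutor.executeScript so that the helper can be tried without a browser.

```java
import java.util.function.Function;

class WholePageHtml {
    // Standard DOM properties: innerHTML is everything inside <html>,
    // outerHTML also includes the <html> tag itself.
    static final String INNER_HTML_SCRIPT = "return document.documentElement.innerHTML;";
    static final String OUTER_HTML_SCRIPT = "return document.documentElement.outerHTML;";

    // scriptRunner is a hypothetical stand-in for a real call such as
    //   script -> ((JavascriptExecutor) driver).executeScript(script)
    static String fetch(Function<String, Object> scriptRunner, String script) {
        Object result = scriptRunner.apply(script);
        // executeScript returns Object, so guard against null before converting
        return result == null ? "" : result.toString();
    }

    public static void main(String[] args) {
        // Fake runner for demonstration only; a real one would ask the browser
        Function<String, Object> fakeRunner =
                script -> "<head><title>Sample Page</title></head><body></body>";
        System.out.println(fetch(fakeRunner, INNER_HTML_SCRIPT));
    }
}
```

In a real test, the fake runner would be replaced by a lambda that delegates to the driver, for example script -> ((JavascriptExecutor) driver).executeScript(script).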

GetInnerHTMLExample.java

Java

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetInnerHTMLExample {
    public static void main(String[] args) {
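        // Set the path to your ChromeDriver executable.
        // Selenium 4.6 and later can usually locate the driver automatically
        // (Selenium Manager), in which case this line can be omitted.
        System.setProperty("webdriver.chrome.driver", "path_to_chromedriver");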

        // Initialize ChromeDriver
        WebDriver driver = new ChromeDriver();

        try {
            // Load a live web page
            driver.get("https://www.geeksforgeeks.org");

            // Option 1: Get the entire page source
            String pageSource = driver.getPageSource();
            System.out.println("Page Source:");
            System.out.println(pageSource);

            // Option 2: Get the innerHTML of the body using JavascriptExecutor
            JavascriptExecutor js = (JavascriptExecutor) driver;
            String bodyInnerHTML = (String) js.executeScript("return document.body.innerHTML;");
            System.out.println("\nBody InnerHTML:");
            System.out.println(bodyInnerHTML);

        } finally {
            // Close the browser
            driver.quit();
        }
    }
}

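Output

The program prints the page source of the GeeksforGeeks home page, followed by the innerHTML of its body element.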
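For a live site the printed HTML can be very long, so it is often easier to write it to a file and inspect it there. The sketch below assumes Java 11 or later (for Files.writeString) and uses a hard-coded string in place of the value returned by driver.getPageSource(). The class PageSourceSaver, the method save, and the file name page-source.html are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PageSourceSaver {
    // Writes the HTML to the target file (created or overwritten) and returns the path
    static Path save(String html, Path target) {
        try {
            // Files.writeString (Java 11+) encodes as UTF-8 by default
            return Files.writeString(target, html);
        } catch (IOException e) {
            // Rethrow unchecked so callers inside a test need no throws clause
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for: String html = driver.getPageSource();
        String html = "<html><body><h1>Welcome to My Website</h1></body></html>";
        Path target = Path.of(System.getProperty("java.io.tmpdir"), "page-source.html");
        System.out.println("Saved to " + save(html, target));
    }
}
```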

Conclusion

Retrieving the HTML of a page in Selenium with Java is straightforward: getPageSource() returns the source of the whole page, and JavascriptExecutor lets you read the innerHTML of any element, such as the body. This is particularly useful for debugging or validating the HTML during test automation. Because both belong to the Selenium WebDriver API, the same code can be reused across the browsers that Selenium supports.
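As an illustration of the validation use case, the retrieved HTML can be checked with ordinary string tools. The sketch below counts opening tags and reads the page title using regular expressions. It assumes simple, well-formed markup like the sample page; regular expressions are only a rough check and not a substitute for a real HTML parser. The class HtmlChecks and its methods are hypothetical helpers, and the hard-coded string stands in for the value returned by Selenium.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlChecks {
    // Counts opening tags with the given name (e.g. "h2"), ignoring case
    static int countOpeningTags(String html, String tagName) {
        Pattern pattern = Pattern.compile(
                "<" + Pattern.quote(tagName) + "(?:\\s[^>]*)?>",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Returns the trimmed text inside <title>...</title>, or "" if there is none
    static String title(String html) {
        Matcher matcher = Pattern.compile(
                "<title(?:\\s[^>]*)?>(.*?)</title>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        return matcher.find() ? matcher.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by driver.getPageSource()
        String html = "<html><head><title>Sample Page</title></head><body>"
                + "<div><h2>Section 1</h2></div><div><h2>Section 2</h2></div>"
                + "</body></html>";
        System.out.println(countOpeningTags(html, "h2")); // prints 2
        System.out.println(title(html));                  // prints Sample Page
    }
}
```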