Extract all URLs using PHP

How to extract all URLs from a website in PHP

Welcome to another PHP tutorial post. In our previous post we learned how to get the domain name from a URL in PHP.
Today we will learn how to extract all the URLs from a website with the help of built-in PHP functions such as file_get_contents.

Before we start fetching all the URLs of a website or a web page, let's take a short look at DOMDocument.

What is DOMDocument in PHP

The PHP DOMDocument class lets us work with HTML and XML pages once their markup has been loaded into a new DOMDocument object.
So basically it is a good option for reading and changing HTML and XML documents with the help of the DOM extension.
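
To make the "reading and changing" part concrete, here is a minimal sketch. The markup string and variable names are made up for this illustration and are not part of the tutorial's script.

```php
<?php
// Hypothetical markup, used only for this illustration.
$markup = '<html><body><p id="intro">Hello</p></body></html>';

// Load the markup into a new DOMDocument object.
$dom = new DOMDocument();
$dom->loadHTML($markup);

// Read: grab the first <p> element and its text.
$paragraph = $dom->getElementsByTagName('p')->item(0);
$originalText = $paragraph->textContent;

// Change: replace the text and add an attribute.
$paragraph->nodeValue = 'Hello, DOM';
$paragraph->setAttribute('class', 'greeting');

// Serialise the modified document back to an HTML string.
$updatedHtml = $dom->saveHTML();
echo $updatedHtml;
```

The same object is used for both steps: we query it to read nodes, modify those nodes, and call saveHTML() when we want the result back as a string.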

file_get_contents for reading

As we all know, file_get_contents is a built-in PHP function that reads the contents of a file into a string.
So first of all we create a variable named $html, which will store the string returned by file_get_contents.
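
As a rough sketch of this step, the snippet below reads a small temporary file so that it runs without network access. The temporary file and the example.com address are assumptions for the demo. For a real page you would pass the URL itself, which requires allow_url_fopen to be enabled.

```php
<?php
// Create a small temporary HTML file so the example runs offline.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, '<a href="https://example.com/">Example</a>');

// file_get_contents returns the whole file as one string, or false on failure.
$html = file_get_contents($path);
if ($html === false) {
    exit("Could not read $path" . PHP_EOL);
}

// Clean up the temporary file.
unlink($path);

// For a remote page, a URL can be passed instead (needs allow_url_fopen):
// $html = file_get_contents('https://example.com/');
```

Checking for false before going further is worth the extra lines, because a failed read would otherwise hand an invalid value to the parser.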

After storing the page in the $html variable, we simply create a new DOMDocument object and let it parse the HTML. The @ operator is used to suppress the warnings that loadHTML() would otherwise emit if the $html string isn't valid HTML.
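
Here is a small sketch of the parsing step. The malformed markup (note the stray closing span tag) is invented for the demo and stands in for a fetched page.

```php
<?php
// Deliberately invalid HTML: the </span> has no matching opening tag.
$html = '<div><a href="/about">About</a></span></div>';

$dom = new DOMDocument();

// Without @, PHP would print a warning about the unexpected end tag.
// The parser still recovers and builds a usable document either way.
$loaded = @$dom->loadHTML($html);

$anchorCount = $dom->getElementsByTagName('a')->length;
```

If you would rather not use @, calling libxml_use_internal_errors(true) before loadHTML() is another way to keep the parser warnings out of the output.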

Using the DOMDocument method getElementsByTagName(), we get a new instance of the DOMNodeList class, which contains all the elements with the given local tag name.
So we pass the anchor tag name (a) to getElementsByTagName().
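
A short sketch of this call, again using a made-up HTML string in place of a fetched page:

```php
<?php
// Hypothetical markup with two links.
$html = '<p><a href="/home">Home</a> and <a href="/contact">Contact</a></p>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

// Returns a DOMNodeList holding every <a> element in the document.
$anchors = $dom->getElementsByTagName('a');

// A DOMNodeList can be looped over with foreach; each item is a DOMElement.
$hrefs = [];
foreach ($anchors as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}
```

Note that getAttribute() returns an empty string when an anchor has no href, so you may want to skip empty values.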

Finally, we pass our URL to file_get_contents and collect all the links found inside the anchor (a) tags.
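
Putting the steps together, one possible version of the script looks like this. The function name extract_urls and the sample markup are assumptions made for this sketch. An inline string is used so that it runs offline, and the commented line shows where a real URL would go.

```php
<?php
/**
 * Hypothetical helper: returns the unique href values of all <a> tags
 * found in an HTML string.
 */
function extract_urls(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings caused by invalid HTML

    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') { // skip anchors without an href
            $urls[] = $href;
        }
    }

    // Drop duplicates and reindex the array from 0.
    return array_values(array_unique($urls));
}

// For a live page (needs allow_url_fopen):
// $html = file_get_contents('https://example.com/');
$html = '<a href="https://example.com/a">A</a>'
      . '<a href="/b">B</a>'
      . '<a>No href</a>'
      . '<a href="/b">B again</a>';

$urls = extract_urls($html);
foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

This sketch returns the href values exactly as written in the page, so relative links such as /b are not turned into full URLs.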
