Get info from any web service or page

Oscar Otero

Last update: Jan 1, 2023


Embed

PHP library to get information from any web page (using oEmbed, OpenGraph, Twitter Cards, scraping the HTML, etc). It's compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc) and has adapters for some sites (archive.org, GitHub, Facebook, etc).

Requirements:

PHP 7.4+
Curl library installed
PSR-17 implementation. By default these libraries are detected automatically:
- laminas/laminas-diactoros
- guzzlehttp/psr7 (Only the unreleased version 2.x, installed as dev-master)
- nyholm/psr7
- sunrise/http-message

If you need PHP 5.5-7.3 support, use the 3.x version

Online demo

http://oscarotero.com/embed/demo

Installation

This package is installable and autoloadable via Composer as embed/embed.

$ composer require embed/embed

Usage

use Embed\Embed;

$embed = new Embed();

//Load any url:
$info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

//Get content info

$info->title; //The page title
$info->description; //The page description
$info->url; //The canonical url
$info->keywords; //The page keywords

$info->image; //The thumbnail or main image

$info->code->html; //The code to embed the image, video, etc
$info->code->width; //The exact width of the embed code (if exists)
$info->code->height; //The exact height of the embed code (if exists)
$info->code->aspectRatio; //The aspect ratio (width/height)

$info->authorName; //The resource author
$info->authorUrl; //The author url

$info->cms; //The cms used
$info->language; //The language of the page
$info->languages; //The alternative languages

$info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc)
$info->providerUrl; //The provider url
$info->icon; //The big icon of the site
$info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

$info->publishedTime; //The published time of the resource
$info->license; //The license url of the resource
$info->feeds; //The RSS/Atom feeds

Parallel multiple requests

use Embed\Embed;

$embed = new Embed();

//Load multiple urls asynchronously:
$infos = $embed->getMulti(
    'https://www.youtube.com/watch?v=PP1xn5wHtxE',
    'https://twitter.com/carlosmeixidefl/status/1230894146220625933',
    'https://en.wikipedia.org/wiki/Tordoia',
);

foreach ($infos as $info) {
    echo $info->title;
}

Document

The document is the object that stores the HTML code of the page. You can use it to extract extra info from the HTML code:

//Get the document object
$document = $info->getDocument();

$document->link('image_src'); //Returns the href of a <link>
$document->getDocument(); //Returns the DOMDocument instance
$html = (string) $document; //Returns the html code

$document->select('.//h1'); //Search

You can perform XPath queries to select specific elements. A search always returns an instance of Embed\QueryResult:

//Search the A elements
$result = $document->select('.//a');

//Filter the results
$result->filter(fn ($node) => $node->getAttribute('href'));

$id = $result->str('id'); //Return the id of the first result as string
$text = $result->str(); //Return the content of the first result

$ids = $result->strAll('id'); //Return an array with the ids of all results as string
$texts = $result->strAll(); //Return an array with the content of all results as string

$tabindex = $result->int('tabindex'); //Return the tabindex attribute of the first result as integer
$number = $result->int(); //Return the content of the first result as integer

$href = $result->url('href'); //Return the href attribute of the first result as url (converts relative urls to absolutes)
$url = $result->url(); //Return the content of the first result as url

$node = $result->node(); //Return the first node found (DOMElement)
$nodes = $result->nodes(); //Return all nodes found
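To make the select-then-filter flow above more concrete, here is a minimal sketch that does something similar with PHP's built-in DOMDocument and DOMXPath only. It is not how Embed implements QueryResult; the function name linkedAnchorIds and the sample markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of Embed: returns the id attribute of every
// <a> element that has a non-empty href, using only the DOM extension.
function linkedAnchorIds(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // './/a' is evaluated relative to a context node; here, the root element
    $nodes = iterator_to_array($xpath->query('.//a', $dom->documentElement));

    // Keep only the anchors that actually carry an href
    $withHref = array_values(array_filter(
        $nodes,
        fn (DOMElement $node) => $node->getAttribute('href') !== ''
    ));

    return array_map(
        fn (DOMElement $node) => $node->getAttribute('id'),
        $withHref
    );
}

$html = '<html><body><h1>Title</h1>'
    . '<a id="first" href="/about">About</a>'
    . '<a id="second">No link</a>'
    . '<a id="third" href="https://example.com/">Home</a>'
    . '</body></html>';

echo implode(', ', linkedAnchorIds($html)), PHP_EOL; // prints "first, third"
```

The sketch leaves out the conversion of relative URLs to absolute ones that the url() method described above performs.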

Metas

For convenience, the Metas object stores the value of all <meta> elements located in the HTML, so you can get the values more easily. The key of every meta is taken from the name, property or itemprop attribute, and the value is taken from content.

//Get the Metas object
$metas = $info->getMetas();

$metas->all(); //Return all values
$metas->get('og:title'); //Return a key value
$metas->str('og:title'); //Return the value as string (remove html tags)
$metas->html('og:description'); //Return the value as html
$metas->int('og:video:width'); //Return the value as integer
$metas->url('og:url'); //Return the value as full url (converts relative urls to absolutes)
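The key/value rule described above (key from name, property or itemprop; value from content) can be sketched with PHP's built-in DOM extension. This is only an illustration under assumptions, not Embed's actual implementation: the function name collectMetas, the lowercasing of keys and the choice to store values as lists are all hypothetical.

```php
<?php
// Hypothetical helper, not part of Embed: collects the content of every
// <meta> element, keyed by its name, property or itemprop attribute.
function collectMetas(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $metas = [];

    foreach ($dom->getElementsByTagName('meta') as $meta) {
        $value = $meta->getAttribute('content');

        if ($value === '') {
            continue; // nothing useful to store
        }

        // A meta can be identified by any of these three attributes
        foreach (['name', 'property', 'itemprop'] as $attribute) {
            $key = strtolower(trim($meta->getAttribute($attribute)));

            if ($key !== '') {
                $metas[$key][] = $value; // a key may appear more than once
            }
        }
    }

    return $metas;
}

$html = '<html><head>'
    . '<meta property="og:title" content="Hello">'
    . '<meta name="description" content="A test page">'
    . '</head><body></body></html>';

$metas = collectMetas($html);
echo $metas['og:title'][0], PHP_EOL; // prints "Hello"
```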

OEmbed

In addition to the HTML and metas, this library uses oEmbed endpoints to get additional data. You can get this data as follows:

//Get the oEmbed object
$oembed = $info->getOEmbed();

$oembed->all(); //Return all raw data
$oembed->get('title'); //Return a key value
$oembed->str('title'); //Return the value as string (remove html tags)
$oembed->html('html'); //Return the value as html
$oembed->int('width'); //Return the value as integer
$oembed->url('url'); //Return the value as full url (converts relative urls to absolutes)

Additional oEmbed parameters (like Instagram's hidecaption) can also be provided:

$embed = new Embed();

$info = $embed->get('https://www.instagram.com/p/B_C0wheCa4V/');
$info->setSettings([
    'oembed:query_parameters' => ['hidecaption' => true]
]);
$oembed = $info->getOEmbed();

LinkedData

Another API available by default. It is used to extract info from the page's JSON-LD data.

//Get the linkedData object
$ld = $info->getLinkedData();

$ld->all(); //Return all data
$ld->get('name'); //Return a key value
$ld->str('name'); //Return the value as string (remove html tags)
$ld->html('description'); //Return the value as html
$ld->int('width'); //Return the value as integer
$ld->url('url'); //Return the value as full url (converts relative urls to absolutes)
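As a rough illustration of where this data comes from, the following sketch reads JSON-LD blocks from a page using only PHP's DOM and JSON extensions. It is an assumption-laden example, not Embed's own code: the function name collectJsonLd and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper, not part of Embed: decodes every
// <script type="application/ld+json"> block found in the HTML.
function collectJsonLd(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $blocks = [];

    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);

        // Skip blocks that are not valid JSON objects or arrays
        if (is_array($data)) {
            $blocks[] = $data;
        }
    }

    return $blocks;
}

$html = '<html><head>'
    . '<script type="application/ld+json">'
    . '{"@type":"VideoObject","name":"Demo","width":640}'
    . '</script>'
    . '</head><body></body></html>';

$blocks = collectJsonLd($html);
echo $blocks[0]['name'], PHP_EOL; // prints "Demo"
```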

Other APIs

Some sites like Wikipedia or Archive.org provide a custom API that is used to fetch more reliable data. You can get the API object with the method getApi(), but note that not all results have this method. The Api object has the same methods as oEmbed:

//Get the API object
$api = $info->getApi();

$api->all(); //Return all raw data
$api->get('title'); //Return a key value
$api->str('title'); //Return the value as string (remove html tags)
$api->html('html'); //Return the value as html
$api->int('width'); //Return the value as integer
$api->url('url'); //Return the value as full url (converts relative urls to absolutes)

Extending Embed

Depending on your needs, you may want to extend this library with extra features or change the way it performs some operations.

PSR

Embed uses some PSR standards to be as interoperable as possible:

PSR-7 Standard interfaces to represent http requests, responses and uris
PSR-17 Standard factories to create PSR-7 objects
PSR-18 Standard interface to send a http request and return a response

Embed comes with a CURL client compatible with PSR-18, but you need to install a PSR-7 / PSR-17 library. The library automatically detects 'laminas\diactoros', 'guzzleHttp\psr7', 'slim\psr7', 'nyholm\psr7' and 'sunrise\http' (in this order). If you want to use a different PSR implementation, you can do it this way:

use Embed\Embed;
use Embed\Http\Crawler;

$client = new CustomHttpClient();
$requestFactory = new CustomRequestFactory();
$uriFactory = new CustomUriFactory();

//The Crawler is responsible for performing http queries
$crawler = new Crawler($client, $requestFactory, $uriFactory);

//Create an embed instance passing the Crawler
$embed = new Embed($crawler);

Adapters

Some sites have special needs, either because they provide public APIs that allow extracting more info (like Wikipedia or Archive.org) or because the way the data is extracted has to change for that particular site. For all those cases there are adapters: classes that extend the default classes to provide extra functionality.

Before creating an adapter, you need to understand how Embed works: when you execute this code, you get an Extractor object:

//Get the Extractor with all info
$info = $embed->get($url);

//The extractor has document and oembed:
$document = $info->getDocument();
$oembed = $info->getOEmbed();

The Extractor class has many Detectors. Each detector is responsible for detecting a specific piece of info. For example, there's a detector for the title, another for the description, others for the image, code, etc.

So, an adapter is basically an extractor created specifically for a site. It can also contain custom detectors or APIs. You can see all the adapters in the src/Adapters folder.

If you create an adapter, you also need to register it in Embed, so it knows which website to use it for. To do that, there's the ExtractorFactory object, which is responsible for instantiating the right extractor for each site.

use Embed\Embed;

$embed = new Embed();

$factory = $embed->getExtractorFactory();

//Use this MySite adapter for mysite.com
$factory->addAdapter('mysite.com', MySite::class);

//Remove the adapter for pinterest.com, so it will use the default extractor
$factory->removeAdapter('pinterest.com');

//Change the default extractor
$factory->setDefault(CustomExtractor::class);

Detectors

Embed comes with several predefined detectors, but you may want to change them or add more. Just create a class extending Embed\Detectors\Detector and register it in the extractor factory. For example:

use Embed\Embed;
use Embed\Detectors\Detector;

class Robots extends Detector
{
    public function detect(): ?string
    {
        $response = $this->extractor->getResponse();
        $metas = $this->extractor->getMetas();

        return $response->getHeaderLine('x-robots-tag')
            ?: $metas->str('robots');
    }
}

//Register the detector
$embed = new Embed();
$embed->getExtractorFactory()->addDetector('robots', Robots::class);

//Use it
$info = $embed->get('http://example.com');
$robots = $info->robots;

Settings

If you need to pass settings to the CurlClient to perform http queries:

use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

$client = new CurlClient();
$client->setSettings([
    'cookies_path' => $cookies_path,
    'ignored_errors' => [18]
]);

$embed = new Embed(new Crawler($client));

If you need to pass settings to your detectors, you can add settings to the ExtractorFactory:

use Embed\Embed;

$embed = new Embed();
$embed->setSettings([
    'oembed:query_parameters' => [],  //Extra parameters sent to oembed
    'twitch:parent' => 'example.com', //Required to embed twitch videos as iframe
    'facebook:token' => '1234|5678',  //Required to embed content from Facebook
    'instagram:token' => '1234|5678', //Required to embed content from Instagram
]);
$info = $embed->get($url);

Note: The built-in detectors do not require settings. This feature is only for convenience if you create a specific detector that requires settings.

If this library is useful for you, say thanks by buying me a beer 🍺!

Comments

Giving false

When I try to fetch the embed for a YouTube URL, it returns false.

Here is my code:

    error_reporting(E_ALL);
    ini_set('display_errors', 1);

    $url = 'https://www.youtube.com/watch?v=kUylIQy-IgI';
    require_once 'inc/class/lib/Embed/src/autoloader.php';
    $embed = Embed\Embed::create($url);
    var_dump($embed);
    echo '<pre>'.print_r($embed,true).'</pre>';

Please help me with this.

opened by rohitkhatri 26

Images URLs blacklist
This PR is related to #54.

Here, the todo list:

[x] Remove empty URLs from images list

[x] Implement an imagesBlacklist options to filter images urls (see https://github.com/oscarotero/Embed/issues/54#issuecomment-85419821)

[x] Update the doc according to the new option

[x] Write some tests

Don't hesitate to review my code. :+1:
opened by soullivaneuh 13
Images blacklist

Hi, little feature idea.

What about images blacklist?

Some websites always return their logo or some other useless image in Open Graph.

Could we have a blacklist to ignore some image URLs?

Thanks.

opened by soullivaneuh 13
Facebook API October 24th changes

Hi,

Is there a branch planned to test the upcoming change to the Facebook/Instagram API endpoint?

https://developers.facebook.com/docs/graph-api/changelog/version8.0#instagram https://developers.facebook.com/docs/graph-api/changelog/version8.0#social-plugins

thanks

opened by olivM 12
Adding Facebook?

I just discovered that Facebook has its own oEmbed service: https://media.fb.com/2015/12/14/improved-embedding-tools-embedded-video-player-api-and-oembed-support/ It could be a great addition.

opened by onigetoc 12
Document vs Oembed / Instagram blocked

Hi,

I'm using v4 to resolve our embeds. Mainly (probably only) we use embeds that provide an oEmbed endpoint, so we have a big overhead from first fetching the main document of that URL.

https://github.com/oscarotero/Embed/blob/master/src/Embed.php#L23-L24

I think it would be a big benefit to first check whether the URL provides an oEmbed endpoint (detectEndpointFromProviders) and, if so, to query only that specific API endpoint.

Is there any built-in way that I'm missing?

Regards

opened by reflexxion 11
twitter embed stop working on v3

Recently Twitter has started returning 400 server error on Embed v3: https://oscarotero.com/embed3/demo/index.php?url=https%3A%2F%2Ftwitter.com%2Fdrupalassoc%2Fstatus%2F1217450156523692038

Whereas it is still working as expected on v4: https://oscarotero.com/embed/demo/index.php?url=https%3A%2F%2Ftwitter.com%2Fdrupalassoc%2Fstatus%2F1217450156523692038

Any idea what could have changed (either on Twitter's side or in how Embed does things differently in v4) so I can start digging in the right direction?

Much appreciated, and thanks for this awesome library!

opened by budikhafiKLY 10
Twitch provider

Hi! Thank you so much for this package and maintenance!

I was wondering if support for Twitch was on the horizon or if I should make a PR for the package to support it.

AFAIK, here's the oembed endpoint for it:

https://api.twitch.tv/v5/oembed?url=

And here's an example URL that you can inspect for the response: https://api.twitch.tv/v5/oembed?url=https%3A%2F%2Fwww.twitch.tv%2Friotgames%2Fv%2F72749628

Thanks!

opened by msurguy 10

Facebook event can't get data

Hi there. I tried to get some public event data, but it returns just the title, not the image: 'https://www.facebook.com/events/1822069678114212/'

When I try to crawl that link, I can't get proper event data such as the main image; I only get the title. Do I need some special permission, even though this is a public event?

I also checked on your demo page: https://oscarotero.com/embed3/demo/index.php?url=https%3A%2F%2Fwww.facebook.com%2Fevents%2F1822069678114212%2F

but I can't get images or descriptions.

Here is my code

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 50,
            'minImageHeight' => 50,
        ]
    ],
    'providers' => [
        'html' => [
            'maxImages' => 3
        ],
        'facebook' => [
            'key' => "XXXXXXXXX"
        ]
    ],
    'resolver' => [
        'config' => [
            CURLOPT_CONNECTTIMEOUT => 5,
            CURLOPT_TIMEOUT => 5,
            CURLOPT_USERAGENT => 'My Crawler',
            CURLOPT_MAXREDIRS => 3
        ]
    ]
];
$result = Embed::create($url, $config);

opened by tiger154 10

Facebook oEmbed does not work for company pages

I stumbled over a problem today, trying to embed this facebook post using the current master branch: https://www.facebook.com/TheIndependentOnline/posts/10153659888571636

If you look at the json+oembed link in the browser, the valid oembed url is returned: https://www.facebook.com/plugins/post/oembed.json/?url=https%3A%2F%2Fwww.facebook.com%2FTheIndependentOnline%2Fposts%2F10153659888571636

But if you try to fetch the same url as anonymous user (or via guzzle), facebook internally redirects you to https://www.facebook.com/TheIndependentOnline/ and the returned oEmbed-Link is wrong: https://www.facebook.com/plugins/page/oembed.json/?url=https%3A%2F%2Fwww.facebook.com%2FTheIndependentOnline%2F

To make things a bit more interesting, normal user posts are still embed-able.

So I'm not sure what the best approach to handle this problem is. I think either, facebook needs to stop this unpredictable behaviour (not sure if this will happen), or the facebook adapter needs to be re-implemented again.

opened by Nebel54 10
Extract youtube and vimeo ID

Hi, I need to store only the YouTube and Vimeo IDs. I see that oEmbed doesn't provide this information, only the full URL. I can extract it from the URL, but I wonder whether there is a way to do this with the Embed lib. Maybe using the request class? Thanks

opened by kamov 10
[Question] Vimeo with "Hide from Vimeo" setting but embeds allowed on specific domains

A client of mine uses some third party software to regularly upload video with the settings "Hide from Vimeo" and "Embed on specific domains".

This effectively means that https://vimeo.com/12345678 cannot be added, even though the oEmbed endpoint does return an oEmbed json (https://vimeo.com/api/oembed.json?url=https://vimeo.com/12345678).

As far as I can tell embed first makes a request to the original URL (https://vimeo.com/12345678) to see if it exists, and only after retrieves oEmbed info, from https://vimeo.com/api/oembed.json?url=https://vimeo.com/12345678.

Why does that first check happen? Would it not be enough to see if you get oEmbed data back from the endpoint?

I am sure I am missing something, but I would like to understand why, so I can see if there is any workaround possible.

opened by qrazi 0
Stopped working with SPOTIFY

Stopped working with Spotify.

Try this URL: https://open.spotify.com/artist/2HC52vHbLoRxuw5a12w7Oe

It also does not work on the demo: http://oscarotero.com/embed/demo

Any ideas?

opened by saschaende 1
charset 1251 problem

Hi. I encountered a problem with the encoding. Cyrillic does not display; how can I collect data in the 1251 encoding? I tried to use iconv, but got: Fatal error: Uncaught TypeError: iconv(): Argument #3 ($string) must be of type string, Embed\Extractor given

opened by sakirsa 6
Smart/Curly Quotes Problem (Plus Emojis)

I ran across this issue during my testing. I've already added a fix on my end for the quotes, but I figured I'd shed some light on it, perhaps this will help someone else out. This is happening with embed 3.x version... so maybe it isn't an issue in embed 4, but I can't test for that cause the demo page user agent isn't set to the facebook one, so Twitter doesn't return anything. Maybe the problem is Twitter, but I can't really reproduce this.

Anyway, here are some links, using the demo.php page, only adding the new user agent to the dispatcher:

https://api.parrycarry.com/test.php?url=https%3A%2F%2Ftwitter.com%2Fflexinja%2Fstatus%2F1597445652166103040

This one seems to make the smart quotes convert into вЂњ вЂќ and вЂ™, and I assume вЂ˜ is the other apostrophe version when Googling. I've already added a conversion function for this. Twitter descriptions by default all have smart quotes enclosing the actual description, so this should happen to all Tweets, but it doesn't.

https://api.parrycarry.com/test.php?url=https%3A%2F%2Ftwitter.com%2Fflexinja%2Fstatus%2F1595891254155427840

This is from the same person, using the same smart apostrophes, and it doesn't have this problem at all.

https://api.parrycarry.com/test.php?url=https%3A%2F%2Ftwitter.com%2FEevohhh%2Fstatus%2F1597470048033312771

Here is another Tweet from someone else, and it has the same problem again. It also converts the heart 💖 emoji into рџ’–. I haven't added a conversion for this, however, as I just found it.

https://api.parrycarry.com/test.php?url=https%3A%2F%2Ftwitter.com%2Fiamchrisjudge%2Fstatus%2F1597439017448243200

Which is strange cause here is another Tweet with emojis that don't do that.

So, I am not sure what could possibly cause this to happen. Sometimes things get converted, and sometimes they don't.

opened by parrycarry 1
Test cannot retrieve page

I used the test page to test the URL: https://www.miragenews.com/chia-applauds-elevation-of-housing-to-federal-792438/

Is there any way around security check? Can we set a User Agent?

opened by whitsey 1
Twitter Pages

On the latest version 4 I cannot get the code for Twitter pages, for example: https://twitter.com/pepephone

Or please let me know how I can get the oembed timeline from Twitter using version 4?

opened by anghelpw 2

Releases (v4.4.7)

v4.4.7 (Dec 12, 2022): See CHANGELOG
v4.4.6 (Oct 2, 2022): See CHANGELOG
v4.4.5 (Sep 6, 2022): See CHANGELOG
v3.4.18 (Jul 15, 2022): Fixed Embedly regression #491
v4.4.4 (Apr 13, 2022): See CHANGELOG
v4.4.3 (Mar 13, 2022): See CHANGELOG
v4.4.2 (Feb 13, 2022): See CHANGELOG
v4.4.1 (Feb 6, 2022): See CHANGELOG
v4.4.0 (Jan 8, 2022): See CHANGELOG
v4.3.5 (Oct 10, 2021): See CHANGELOG
v4.3.4 (Jun 22, 2021): See CHANGELOG
v4.3.3 (Jun 22, 2021): See CHANGELOG
3.4.17 (May 30, 2021): Fixed Iframely OEmbed adapter #449. Improvements in the Youtube cookie wall redirect #447 #446
v3.4.16 (May 7, 2021): Fixed youtube url to cookie wall #441 #442
v4.3.2 (Apr 4, 2021): See CHANGELOG
v3.4.15 (Apr 2, 2021): Fixed youtu.be urls #436
v3.4.14 (Apr 1, 2021): Fixed YouTube oembed returning consent.youtube.com link #434 #435
v4.3.1 (Mar 21, 2021): See CHANGELOG
v3.4.13 (Dec 24, 2020): Fixed Twitter error response #421 #423
v3.4.12 (Dec 16, 2020): Fix youtube embed #418
v3.4.11 (Dec 15, 2020): Fixed Instagram and Facebook #414 #416
v3.4.10 (Dec 1, 2020): Added PHP 8 to composer.json. Added support for instagram tv #401
v4.3.0 (Nov 3, 2020): See CHANGELOG
v3.4.9 (Oct 27, 2020): Added support for PHP 8 #395. Fixed Facebook/Instagram oembed #398
v4.2.7 (Sep 23, 2020): See CHANGELOG
v4.2.6 (Aug 28, 2020): See CHANGELOG
v4.2.5 (Aug 1, 2020): See CHANGELOG
v4.2.4 (Jul 6, 2020): See CHANGELOG
v3.4.8 (Jul 3, 2020): Fixed instagram urls using instagr.am domain #370 #357
v3.4.7 (Jun 29, 2020): Fixed Twitter 429 status code error #369 #337

Owner

Oscar Otero

Web designer and developer 🦄

GitHub http://oscarotero.com/embed/demo

