Packagist crawler

Overview

packagist-crawler

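Crawls packagist.org and downloads all of its package JSON files (packages.json and the files under p/). Once the download has finished, you can serve the files from a static web server to create a mirror of packagist.org.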

Requirement

  • PHP > 5.3
  • ext-curl
  • ext-hash
  • ext-json
  • ext-zlib
  • ext-PDO
  • ext-pdo_sqlite

Install

$ git clone https://github.com/hirak/packagist-crawler
$ cd packagist-crawler
$ composer install

Download!

$ php parallel.php

(...few minutes...)

$ ls cache/
p/
packages.json

Configuration

  • config.default.php
  • config.php

If either of these files exists, it changes the crawler's behavior. To customize the settings, copy config.default.php to config.php and edit config.php.


<?php
return (object)array(
    'cachedir' => __DIR__ . '/cache/',
    //'cachedir' => '/usr/share/nginx/html/',
    //'cachedir' => '/usr/local/apache2/htdocs/',
    'packagistUrl' => 'https://packagist.org',
    'maxConnections' => 4,
    'lockfile' => __DIR__ . '/cache/.lock',
    'expiredDb' => __DIR__ . '/cache/.expired.db',
);
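
The fallback between the two config files can be sketched as follows. This is a minimal illustration, not code taken from the crawler: the helper name `loadConfig` is hypothetical, and it assumes both files simply `return` a config object as in the listing above.

```php
<?php
// Hypothetical helper (not from the packagist-crawler source):
// prefer config.php and fall back to config.default.php.
function loadConfig($dir)
{
    foreach (array('config.php', 'config.default.php') as $name) {
        $file = rtrim($dir, '/') . '/' . $name;
        if (is_file($file)) {
            // Each config file returns its settings object.
            return require $file;
        }
    }
    throw new RuntimeException("No config file found in $dir");
}
```

Under these assumptions, `$config = loadConfig(__DIR__);` would give access to values such as `$config->maxConnections`.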

cachedir

The directory where the downloaded packages.json is stored.

packagistUrl

The URL of the packagist.org instance to download from. By default the crawler downloads from the origin, but you can point it at another existing mirror site instead.

maxConnections

The number of parallel downloads. A higher value downloads faster but puts more load on the origin, so keep it at a reasonable level.
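
To illustrate what a connection limit of this kind means, here is a minimal sketch of a bounded parallel download using ext-curl's multi interface. It is not the crawler's actual download code: the function name `fetchAll` and its return format are assumptions made for this example.

```php
<?php
// Hypothetical helper (not from the packagist-crawler source): fetch URLs
// with at most $maxConnections transfers in flight at any time.
// Returns array(url => body), or false for a URL whose transfer failed.
function fetchAll(array $urls, $maxConnections)
{
    $results = array();
    $queue = array_values($urls);
    $inFlight = 0;
    $mh = curl_multi_init();

    do {
        // Top up the window until the limit is reached or the queue is empty.
        while ($inFlight < $maxConnections && count($queue) > 0) {
            $url = array_shift($queue);
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FAILONERROR, true); // HTTP >= 400 counts as failure
            curl_setopt($ch, CURLOPT_PRIVATE, $url);     // remember which URL this handle is
            curl_multi_add_handle($mh, $ch);
            $inFlight++;
        }
        if ($inFlight === 0) {
            break; // nothing to do (empty input)
        }

        curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // avoid a busy loop when select cannot wait
        }

        // Collect finished transfers and free their slots.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = ($info['result'] === CURLE_OK)
                ? curl_multi_getcontent($ch)
                : false;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
        }
    } while ($inFlight > 0 || count($queue) > 0);

    return $results;
}
```

With a limit of 4, a fifth request starts only after one of the first four has finished, which is what keeps the load on the origin bounded.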

expiredDb

Records the JSON files that have become outdated because of file updates.
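
As an illustration of how such a record could be kept with the required ext-pdo_sqlite, the sketch below stores each outdated path with a timestamp. The table layout and the function names (`openExpiredDb`, `markExpired`, `expiredBefore`) are assumptions for this example; the real layout of .expired.db may differ.

```php
<?php
// Hypothetical schema and helpers (not from the packagist-crawler source).
function openExpiredDb($path)
{
    $pdo = new PDO('sqlite:' . $path);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS expired ('
        . 'path TEXT PRIMARY KEY, expired_at INTEGER NOT NULL)'
    );
    return $pdo;
}

// Record that $path became outdated at Unix time $now.
function markExpired(PDO $pdo, $path, $now)
{
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO expired (path, expired_at) VALUES (?, ?)'
    );
    $stmt->execute(array($path, $now));
}

// List paths that became outdated at or before $cutoff, sorted by path.
function expiredBefore(PDO $pdo, $cutoff)
{
    $stmt = $pdo->prepare(
        'SELECT path FROM expired WHERE expired_at <= ? ORDER BY path'
    );
    $stmt->execute(array($cutoff));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Keeping a timestamp next to each path makes it possible to select only the files that have been outdated for at least a given period.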

License

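Copyright is waived. There are no restrictions on use, and you do not need to contact the author or include a copyright notice. Copying parts of the code as snippets is also fine.

Full license text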

CC0-1.0 (No Rights Reserved)

Comments
  • Composer 2.0 support

    Hey Hirak,

    Are you planning on adding support for composer 2.0 which adds a metadata-url directive to packages.json which points composer to a p2 directory with the packagist format: /p2/%package%.json

    For example: https://packagist.org/p2/laravel/tinker.json

    Background info: https://github.com/composer/composer/blob/master/CHANGELOG.md https://github.com/composer/composer/blob/master/UPGRADE-2.0.md#metadata-url

    Regards, Levi

    opened by Levivb 1
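  • Simplify the sha256 check

    The current logic verifies the checksum of every JSON file after downloading and re-downloads any file whose download failed.

    However, checking all 47,000 JSON files puts load on the disk. With the current once-a-day crawl this is not much trouble, but the load could become a problem if the crawl frequency is increased.

    Checking only the JSON files newly downloaded during a crawl, and skipping those already checked, should make this lighter.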

    opened by hirak 1
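  • Reconsider how old files are cleaned up

    Under the current behavior, when a JSON file is updated, the old files that are no longer needed are deleted immediately.

    However, if syncing becomes more frequent, for example once a minute, some users may still be downloading the old files with composer, and deleting the files abruptly would cause trouble for them.

    Old files need to be retained for a while and deleted only after, for example, 24 hours have passed.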

    opened by hirak 1
  • Increasing disk space size

    It's possible that I'm letting something go (and wrong), I realized that the cache is growing in the last weeks and is already over 50GB, probably not normal to grow more than 1GB per day, I let something pass out?

    opened by webysther 0
Releases(0.0.1)
Owner
Hiraku NAKANO
PHP Programmer at Mercari Inc