YaCy

Article snapshot taken from [REDACTED] under a Creative Commons Attribution-ShareAlike license.
Peer-to-peer search engine
Original author(s): Michael Christen
Developer(s): YaCy community
Initial release: 2003
Stable release: 1.940_202412022212 (2 December 2024)
Repository: github.com/yacy/yacy_search_server
Written in: Java
Operating system: Cross-platform
Size: 104-113 MB
Type: Overlay network, search engine
License: GPL-2.0-or-later
Website: yacy.net/en/

YaCy (pronounced “ya see”) is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003. The engine is written in Java and, as of September 2006, ran on several hundred computers, known as YaCy-peers.

Each YaCy-peer independently crawls the Internet, analyzes and indexes the web pages it finds, and stores the results in a common database (the index), which is shared with other YaCy-peers over the peer-to-peer network. This decentralized approach removes the need for a central server and is intended to protect users' privacy.

Unlike semi-distributed search engines, the YaCy network is fully distributed: all YaCy-peers are equal and no central server exists. A peer can run either in crawling mode or as a local proxy server that indexes the web pages visited by the person running YaCy on their computer. Several mechanisms are provided to protect the user's privacy. Search functions are accessed through a locally run web server, which provides a search box for entering search terms and returns results in a format similar to that of popular search engines.
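Because search is served by a local web server, a peer can also be queried from a script. The sketch below is illustrative only: the port 8090, the yacysearch.json endpoint, the parameter names (query, maximumRecords, resource) and the channels/items response layout are assumptions that are not stated in this article and should be checked against the documentation of the installed YaCy version. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, base="http://localhost:8090", count=10, scope="global"):
    """Build a search URL for a locally running peer.

    Assumed, not taken from the article: the default port, the endpoint
    name and the parameter names.
    """
    params = {"query": query, "maximumRecords": count, "resource": scope}
    return f"{base}/yacysearch.json?{urlencode(params)}"


def extract_items(payload):
    """Return (title, link) pairs from a response with the assumed layout."""
    items = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            items.append((item.get("title", ""), item.get("link", "")))
    return items


def search(query, **kwargs):
    """Send the query to the local peer and return (title, link) pairs."""
    with urlopen(build_search_url(query, **kwargs), timeout=10) as response:
        return extract_items(json.load(response))
```

Calling search("peer to peer", scope="local") would only succeed while a peer is running on the assumed address; build_search_url and extract_items can be tried without one.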

System components

The YaCy search engine is based on four elements:

Crawler
A search robot that moves from one web page to another and analyzes their content. The crawler is responsible for fetching web pages from the Internet, and each peer in the YaCy network can crawl and index websites. The crawling process involves:
  • Discovery: Finding new web pages to index by following links.
  • Fetching: Downloading the content of web pages.
  • Parsing: Extracting relevant information such as text, metadata, and links from the downloaded pages.
Indexer
Creates a reverse word index (RWI): each word in the RWI has a list of relevant URLs and ranking information. Words are stored as word hashes.
Search and administration interface
A web interface provided by a local HTTP servlet running in a servlet engine.
Data storage
Stores the reverse word index database using a distributed hash table.
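The four elements above can be illustrated with a minimal sketch of the parse, index and storage steps. It is not YaCy's implementation: the truncated SHA-256 word hash, the in-memory dictionary used as the RWI, the word count used as ranking information, and the ring-based peer assignment are all assumptions chosen for illustration, and every name is hypothetical. Fetching is left out so the example runs without network access.

```python
import hashlib
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Parsing step: collect words and outgoing links from one HTML page.

    Simplification: text inside script and style tags is not filtered out.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def word_hash(word):
    # Hypothetical hash function; YaCy's real word hash format is not
    # described in the article.
    return hashlib.sha256(word.encode("utf-8")).hexdigest()[:12]


def index_page(rwi, url, html):
    """Add one page to the reverse word index and return discovered links.

    rwi maps word hash -> {url: occurrence count}; the count stands in
    for the ranking information.
    """
    parser = PageParser(url)
    parser.feed(html)
    for word in parser.words:
        postings = rwi.setdefault(word_hash(word), {})
        postings[url] = postings.get(url, 0) + 1
    return parser.links


def responsible_peer(key, peer_ids):
    """Assumed DHT rule: the first peer ID at or after the key on a ring."""
    ring = sorted(peer_ids)
    for peer in ring:
        if peer >= key:
            return peer
    return ring[0]  # wrap around to the start of the ring
```

The links returned by index_page correspond to the discovery step: a crawler would queue them for fetching, while responsible_peer decides which peer stores the postings for a given word hash.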

Search-engine technology

YaCy network
  • YaCy is a complete search appliance with user interface, index, administration, and monitoring.
  • YaCy harvests web pages with a web crawler. Documents are then parsed and indexed, and the search index is stored locally. If your peer is part of a peer network, your local search index is also merged into the shared index for that network.
    • When a search is started, results from the local index are combined with results from the global search index held by peers in the YaCy search network.
  • The YaCy Grid is a second-generation implementation of the YaCy peer-to-peer search. A YaCy Grid installation comprises microservices that communicate using the Master Connect Program (MCP).
  • The YaCy Parser is a microservice that can be deployed using Docker. When the Parser Component is started, it searches for and connects to an MCP. By default, the local host is searched for an MCP, but you can configure one yourself.
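The way a search combines local and global results can be sketched as follows. This is a simplified illustration under stated assumptions, not YaCy's ranking or network protocol: each index is an in-memory dictionary mapping a word to {url: score}, remote peers are represented by plain dictionaries instead of network calls, and duplicate URLs keep their highest score. The function names are hypothetical.

```python
def lookup(index, word):
    """Return the {url: score} postings for a word, or an empty dict."""
    return index.get(word, {})


def merged_search(word, local_index, peer_indexes, limit=10):
    """Combine local postings with postings reported by other peers."""
    combined = dict(lookup(local_index, word))
    for peer_index in peer_indexes:
        for url, score in lookup(peer_index, word).items():
            # Assumed merge rule: keep the best score seen for each URL.
            combined[url] = max(score, combined.get(url, 0))
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
    return [url for url, _ in ranked[:limit]]
```

A peer that is not part of a network would simply pass an empty list of peer indexes and get results from its local index only.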

YaCy platform architecture

YaCy combines several techniques for networking, administering, and maintaining the search index, including blacklisting, moderation, and communication with the community. The components are grouped as follows:

  • Community components
    1. Web forum
    2. Statistics
    3. XML API
  • Maintenance
    1. Web Server
    2. Indexing
    3. Crawler with Balancer
    4. Peer-to-Peer Server Communication
  • Content organization
    1. Blacklisting and Filtering
    2. Search interface
    3. Bookmarks
    4. Monitoring search results
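Two of the components listed above, the crawler with balancer and blacklisting, can be illustrated together. The sketch below is one possible reading, not YaCy's actual design: it assumes the balancer's job is to rotate between hosts so that no single site is fetched repeatedly in a row, and it uses shell-style wildcard patterns for the blacklist. The class name, method names and pattern syntax are all hypothetical.

```python
from collections import OrderedDict, deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse


class BalancedFrontier:
    """Queue of URLs to crawl, rotated host by host, with a blacklist."""

    def __init__(self, blacklist=()):
        self.blacklist = list(blacklist)  # e.g. "ads.example.com/*"
        self.queues = OrderedDict()       # host -> deque of pending URLs

    def is_blocked(self, url):
        parts = urlparse(url)
        target = parts.netloc + parts.path
        return any(fnmatchcase(target, pattern) for pattern in self.blacklist)

    def add(self, url):
        """Queue a URL unless it is blacklisted; report whether it was kept."""
        if self.is_blocked(url):
            return False
        host = urlparse(url).netloc
        self.queues.setdefault(host, deque()).append(url)
        return True

    def next_url(self):
        """Return the next URL, taking hosts in round-robin order."""
        if not self.queues:
            return None
        host, queue = self.queues.popitem(last=False)
        url = queue.popleft()
        if queue:
            self.queues[host] = queue  # move the host to the back of the line
        return url
```

With two URLs queued for one host and one for another, next_url alternates between the hosts before returning to the first, which spreads the load across sites.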

Distribution

YaCy is available as packages for Linux, Windows, and Macintosh, and as a Docker image. It can be installed on other operating systems by building it manually or by using a tarball. YaCy requires Java 8; OpenJDK 8 is recommended.

The Debian package can be installed from a repository hosted on a subdomain of the project's website, but it is not yet maintained in the official Debian package repository.

Further reading

YaCy at LinuxReviews
