Cleora AI

About
Cleora AI is a general-purpose open-source model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data. It can embed heterogeneous undirected graphs, heterogeneous undirected hypergraphs, text and other categorical array data, and any combination of the above.
Key competitive advantages of Cleora:
• more than 197x faster than DeepWalk
• ~4x-8x faster than PyTorch-BigGraph (depends on use case)
• quality of results outperforming or competitive with other embedding frameworks like PyTorch-BigGraph, GOSH, DeepWalk, LINE
• can embed extremely large graphs & hypergraphs on a single machine
Features
• efficient
• star decomposition of hyper-edges
• creation of pairwise graphs for all pairs of entity types
• embedding of each graph
• extreme parallelism and performance
• dim-wise independence
• cross-dataset compositionality
• stable
• updatable
• inductive
FAQs
What should I embed?
Any entities that interact with each other, co-occur or can be said to be present together in a given context.
How should I construct the input?
What works best is grouping entities that co-occur in a similar context and feeding them in as whitespace-separated lines. Using the `complex::reflexive` modifier is a good idea.
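As a rough illustration of the grouping step, the sketch below turns a log of (context, entity) pairs into whitespace-separated lines, one line per context. The data, the field names, and the `build_input_lines` helper are hypothetical. Handing the lines to Cleora with the `complex::reflexive` modifier is not shown, because the exact call is not given here.

```python
from collections import defaultdict

# Hypothetical interaction log: (context_id, entity_id) pairs,
# e.g. (customer, product). All names are made up for illustration.
interactions = [
    ("cust_1", "prod_a"),
    ("cust_1", "prod_b"),
    ("cust_2", "prod_b"),
    ("cust_2", "prod_c"),
    ("cust_2", "prod_a"),
]


def build_input_lines(pairs):
    """Group entities by shared context; emit one whitespace-separated line each."""
    groups = defaultdict(list)
    for context, entity in pairs:
        groups[context].append(entity)
    # Each line lists the entities that were seen together in one context.
    return [" ".join(entities) for entities in groups.values()]


lines = build_input_lines(interactions)
for line in lines:
    print(line)
```

Each printed line corresponds to one context (here, one customer), so entities on the same line are treated as co-occurring.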
Can I embed users and products simultaneously, to compare them with cosine similarity?
No, this is a methodologically wrong approach, stemming from outdated matrix factorization approaches. What you should do is come up with good product embeddings first, then create user embeddings from them.
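The answer above does not say how user embeddings should be derived from product embeddings. One simple option, assumed here purely for illustration, is to average the embeddings of the products a user interacted with and L2-normalize the result. The vectors and the `user_embedding` helper are hypothetical.

```python
import math

# Hypothetical, already-computed product embeddings (toy 3-dimensional vectors).
product_embeddings = {
    "prod_a": [1.0, 0.0, 0.0],
    "prod_b": [0.0, 1.0, 0.0],
    "prod_c": [0.0, 0.0, 1.0],
}


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def user_embedding(product_ids, embeddings):
    """Average the embeddings of a user's products, then L2-normalize.

    Averaging is an assumption of this sketch, not a documented Cleora step.
    """
    vectors = [embeddings[p] for p in product_ids]
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return l2_normalize(mean)


user_vec = user_embedding(["prod_a", "prod_b"], product_embeddings)
print(user_vec)
```

Other aggregations (weighted averages, recency weighting, a learned model on top) fit the same pattern: the product embeddings are computed first and stay fixed.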
What embedding dimensionality should I use?
The more, the better, but we typically work from _1024_ to _4096_. Memory is cheap and machines are powerful, so don't skimp on embedding size.
How many iterations of Markov propagation should I use?
Depends on what you want to achieve. Low iterations (3) tend to approximate the co-occurrence matrix, while high iterations (7+) tend to give contextual similarity.
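To make the role of the iteration count concrete, here is a dependency-free toy version of iterated Markov propagation. It is a sketch under assumptions, not Cleora's implementation: each input line is expanded into pairwise co-occurrences, the counts are row-normalized into a transition matrix, and the embeddings are multiplied by that matrix and L2-normalized once per iteration. All function names are hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def transition_matrix(lines):
    """Build a row-normalized co-occurrence matrix from whitespace-separated lines."""
    entities = sorted({e for line in lines for e in line.split()})
    index = {e: i for i, e in enumerate(entities)}
    n = len(entities)
    counts = [[0.0] * n for _ in range(n)]
    for line in lines:
        members = line.split()
        for a in members:
            for b in members:
                if a != b:  # pairwise expansion of the line (an assumption)
                    counts[index[a]][index[b]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return entities, matrix


def propagate(matrix, embeddings, iterations):
    """Repeat: multiply by the transition matrix, then L2-normalize each row."""
    dim = len(embeddings[0])
    for _ in range(iterations):
        embeddings = [
            l2_normalize(
                [sum(w * embeddings[j][d] for j, w in enumerate(row)) for d in range(dim)]
            )
            for row in matrix
        ]
    return embeddings


lines = ["prod_a prod_b", "prod_b prod_c prod_a", "prod_c prod_d"]
entities, matrix = transition_matrix(lines)
rng = random.Random(0)  # fixed seed so the toy run is repeatable
initial = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in entities]

few_steps = propagate(matrix, initial, iterations=3)   # closer to direct co-occurrence
many_steps = propagate(matrix, initial, iterations=7)  # mixes in wider context
```

Comparing cosine similarities in `few_steps` and `many_steps` on your own data is one way to pick the iteration count for a given task.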
How do I incorporate external information, e.g. entity metadata, images, texts into the embeddings?
Just initialize the embedding matrix with your own vectors coming from a ViT, sentence-transformers, or a random projection of your numeric features.
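For the random-projection option, a minimal sketch could look like the following. It maps each entity's numeric feature vector to the target dimensionality with a fixed Gaussian matrix. The feature values, the seed, and the `random_projection` helper are hypothetical, and how the resulting matrix is passed to Cleora is not shown.

```python
import math
import random


def random_projection(features, out_dim, seed=0):
    """Project numeric feature rows to out_dim dimensions with a fixed Gaussian matrix."""
    rng = random.Random(seed)  # fixed seed keeps the projection reproducible
    in_dim = len(features[0])
    scale = 1.0 / math.sqrt(out_dim)
    projection = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)] for _ in range(in_dim)
    ]
    return [
        [sum(row[i] * projection[i][d] for i in range(in_dim)) for d in range(out_dim)]
        for row in features
    ]


# Hypothetical per-entity numeric features, e.g. price, stock level, rating.
features = [
    [9.99, 120.0, 4.5],
    [24.50, 15.0, 3.9],
]
initial_embeddings = random_projection(features, out_dim=16)
```

In practice the features would usually be standardized first so that no single column dominates the projected vectors.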
My embeddings don't fit in memory, what do I do?
Cleora operates on dimensions independently. Initialize your embeddings with a smaller number of dimensions, run Cleora, persist to disk, then repeat.
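The chunked workflow can be sketched as below. The code covers only the linear propagation step, which treats every dimension independently; any row-wise normalization would have to be handled separately because it couples dimensions. The transition matrix, the file naming, and all helper names are hypothetical, and JSON is used only to keep the example dependency-free.

```python
import json
import os
import random
import tempfile


def init_chunk(num_entities, dims, seed):
    """Create one block of embedding dimensions (toy random initialization)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(num_entities)]


def propagate_once(matrix, embeddings):
    """One propagation step: every output column depends only on the same input column."""
    dim = len(embeddings[0])
    return [
        [sum(w * embeddings[j][d] for j, w in enumerate(row)) for d in range(dim)]
        for row in matrix
    ]


def embed_in_chunks(matrix, total_dims, chunk_dims, out_dir):
    """Initialize, propagate, and persist a few dimensions at a time."""
    paths = []
    for chunk_id, start in enumerate(range(0, total_dims, chunk_dims)):
        dims = min(chunk_dims, total_dims - start)
        chunk = propagate_once(matrix, init_chunk(len(matrix), dims, seed=chunk_id))
        path = os.path.join(out_dir, f"dims_{start:05d}.json")
        with open(path, "w") as handle:
            json.dump(chunk, handle)  # only this chunk is held in memory
        paths.append(path)
    return paths


# Toy row-stochastic transition matrix for three entities.
matrix = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]

with tempfile.TemporaryDirectory() as out_dir:
    paths = embed_in_chunks(matrix, total_dims=6, chunk_dims=2, out_dir=out_dir)
    chunks = []
    for path in paths:
        with open(path) as handle:
            chunks.append(json.load(handle))

# Concatenate the chunks column-wise to recover the full embedding matrix.
combined = [sum((chunk[i] for chunk in chunks), []) for i in range(len(matrix))]
```

Because each column is propagated on its own, `combined` matches what a single pass over all six dimensions would produce.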
Is there a minimum number of entity occurrences?
No, an entity `A` co-occurring just 1 time with some other entity `B` will get a proper embedding, i.e. `B` will be the most similar to `A`.
Are there any edge cases where Cleora can fail?
Cleora works best for relatively sparse hypergraphs. If all your hyperedges contain some very common entity `X`, e.g. a _shopping bag_, then it will degrade the quality of the embeddings.
How can Cleora be so fast and accurate at the same time?
Not using negative sampling is a great boon. By constructing the (sparse) Markov transition matrix, Cleora explicitly performs all possible random walks in a hypergraph in one big step (a single matrix multiplication).
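The claim about covering all walks in one multiplication can be written out as follows. The notation is an assumption of this note rather than something taken from Cleora's documentation: $A$ is the co-occurrence (adjacency) matrix, $D$ the diagonal matrix of its row sums, $M$ the transition matrix, and $E^{(t)}$ the embedding matrix after $t$ iterations. Per-iteration normalization is left out for clarity.

```latex
M = D^{-1} A, \qquad
E^{(t+1)} = M \, E^{(t)}, \qquad
\left( M^{k} \right)_{ij}
  = \sum_{v_1, \dots, v_{k-1}} M_{i v_1} \, M_{v_1 v_2} \cdots M_{v_{k-1} j}
```

The last identity says that entry $(i, j)$ of $M^{k}$ sums the probabilities of every length-$k$ walk from $i$ to $j$, so each multiplication by $M$ accounts for all one-step walks at once instead of sampling them.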
Pricing Plans
Free
Free Plan
• Unlimited public/private repositories
• Dependabot security and version updates
• 2,000 CI/CD minutes/month
• 500MB of Packages storage
• Issues & Projects
• Community support
• GitHub Copilot Access
• GitHub Codespaces Access
Team
USD 4.00 per user/month
• Everything included in Free, plus...
• Access to GitHub Codespaces
• Protected branches
• Multiple reviewers in pull requests
• Draft pull requests
• Code owners
• Required reviewers
• Pages and Wikis
• Environment deployment branches and secrets
• 3,000 CI/CD minutes/month (free for public repositories): Use execution minutes with GitHub Actions to automate your software development workflows. Write tasks and combine them to build, test, and deploy any code project on GitHub.
• 2GB of Packages storage (free for public repositories): Host your own software packages or use them as dependencies in other projects. Both private and public hosting available.
• Web-based support: GitHub Support can help you troubleshoot issues you run into while using GitHub.
• GitHub Secret Protection: Ensure your secrets stay secure. Mitigate risk associated with exposed secrets in your repositories, while preventing new leaks before they happen with push protection.
• GitHub Code Security: Find and fix vulnerabilities in your code before they reach production. Prioritize your Dependabot alerts with automated triage rules.
Enterprise
USD 21.00 per user/month
• Everything included in Team, plus...
• Data residency
• Enterprise Managed Users
• User provisioning through SCIM
• Enterprise Account to centrally manage multiple organizations
• Environment protection rules
• Repository rules
• Audit Log API
• SOC1, SOC2, type 2 reports annually
• FedRAMP Tailored Authority to Operate (ATO): Government users can host projects on GitHub Enterprise Cloud with the confidence that our platform meets the low impact software-as-a-service (SaaS) baseline of security standards set by our U.S. federal government partners.
• SAML single sign-on: Use an identity provider to manage the identities of GitHub users and applications.
• Advanced auditing: Quickly review the actions performed by members of your organization. Keep copies of audit log data to ensure secure IP and maintain compliance for your organization.
• GitHub Connect: Share features and workflows between your GitHub Enterprise Server instance and GitHub Enterprise Cloud.
• 50,000 CI/CD minutes/month (free for public repositories): Use execution minutes with GitHub Actions to automate your software development workflows.
• 50GB of Packages storage (free for public repositories): Host your own software packages or use them as dependencies in other projects. Both private and public hosting available.
• Premium support: With Premium, get a 30-minute SLA on Urgent tickets and 24/7 web and phone support via callback request. With Premium Plus, get everything in Premium, an assigned Customer Reliability Engineer, and more.