Talking Machines API
The Talking Machines application serves as a central store for machine learning and artificial intelligence data.
There are many data feeds in the machine learning space that can be found through public research, but the data is inconsistent, which makes it difficult to incorporate into your applications.
This application currently consumes data from around 10 of the most popular sources in the machine learning space through regular cron jobs. Once the data is in this system, our team can curate it through a robust CMS powered by Drupal 8.
Other applications on the internet need only point at this API to gain the best data from across the machine learning space.
API consumption is provided by an extension of the Drupal Feeds module.
The Feeds module creates content or entities in a system from feed data found on the internet. Source data is mapped and massaged into destination properties on entities in this system.
The Drupal 8 version of the Feeds module is currently in a development release and still has many bugs. These bugs are remediated by custom code, and the overall Feeds functionality is thoroughly extended to meet the requirements of this application.
The Talking Machines API is provided by an extension of the Drupal JSON API module.
JSON API is a specification for building clean APIs that can be understood by any other application on the internet built around the same specification. The JSON API Drupal module is a turnkey implementation of the specification built on Drupal, so it only needs to be installed to expose a starter API.
The Drupal module JSON API Extras is used to extend the API and modify it to suit our unique requirements.
The entire Talking Machines API is built as Drupal configuration without a single line of custom code. This absence of custom code allows it to scale and stay consistent with the Drupal community.
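Because the API follows the JSON API specification, consuming it requires nothing beyond a standard HTTP client. A hedged sketch of what a consumer might do (the base URL, the article content type, and the exposed field names are assumptions here, and JSON API Extras may alias the real paths):

```php
<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Illustrative base URL; the real host and path prefix may differ.
$client = new Client(['base_uri' => 'https://example.com']);

// By default the JSON API module exposes collections at
// /jsonapi/{entity_type}/{bundle}; "article" is an assumed bundle name.
$response = $client->get('/jsonapi/node/article', [
  'headers' => ['Accept' => 'application/vnd.api+json'],
  'query' => [
    // Standard JSON API query parameters: sparse fieldsets, sorting, paging.
    'fields[node--article]' => 'title,created',
    'sort' => '-created',
    'page[limit]' => 10,
  ],
]);

$document = json_decode((string) $response->getBody(), TRUE);
foreach ($document['data'] as $resource) {
  echo $resource['attributes']['title'], PHP_EOL;
}
```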
The data in the feeds consumed by this application is inconsistent. Example inconsistencies:
- Some feeds provide full article bodies and some provide only teaser descriptions.
- Most feeds don't have any images directly in the feed.
- Some feeds provide author data as comma-separated links and some provide author data as plain text.
These inconsistencies are addressed by custom code that extends the Drupal Feeds module and by a custom DOM crawler.
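For instance, a hedged sketch of how the author inconsistency might be handled with the Symfony DomCrawler (the helper name is hypothetical, not the module's actual code):

```php
<?php

require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper, for illustration only: normalize an author value
 * from a feed item into an array of plain-text names, whether the source
 * feed supplies comma-separated links or plain text.
 */
function tm_normalize_authors(string $raw): array {
  if (strip_tags($raw) !== $raw) {
    // The feed supplied authors as comma-separated links; keep the link text.
    $crawler = new Crawler($raw);
    return $crawler->filter('a')->each(
      fn (Crawler $link) => trim($link->text())
    );
  }
  // The feed supplied plain text; split on commas.
  return array_map('trim', explode(',', $raw));
}
```

Called with either `<a href="#">Jane Doe</a>, <a href="#">John Smith</a>` or `Jane Doe, John Smith`, a helper like this returns the same `['Jane Doe', 'John Smith']`.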
In Drupal 7, there were a few DOM crawling solutions that extended Feeds, such as Feeds Crawler. But because Drupal 8 is still in its early phases, there is currently no public DOM crawling solution available.
So a new module is currently under development within this application that may be offered as a public contributed module on drupal.org in the future. It is called "Talking Machines Feeds Crawler" (tm_feeds_crawler) and is built as an extension of:
- Symfony DomCrawler
- Symfony BrowserKit
- Guzzle HTTP Client
- Goutte Web Scraper
- Other areas of Drupal and the open source world
Symfony DomCrawler is used to parse data that is publicly available in the feeds and to perform custom logic per feed.
Example DOM crawling sequence (sketched in code after the list):
- Click the link associated with the feed item.
- Crawl into the article body found on the given page.
- Extract images found in the article body whose URLs match a certain pattern.
- Map the images to the proper image field on the article content type.
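A hedged sketch of that sequence using Goutte and the DomCrawler; the URL, CSS selectors, URL pattern, and field handling are illustrative assumptions, and in practice these values come from the UI configuration described below:

```php
<?php

require 'vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

$client = new Client();

// 1. Click the link associated with the feed item (illustrative URL).
$page = $client->request('GET', 'https://example.com/some-feed-item');

// 2. Crawl into the article body found on the given page (assumed selector).
$body = $page->filter('article .post-body');

// 3. Extract images whose URLs match a certain pattern (assumed pattern).
$image_urls = array_filter(
  $body->filter('img')->each(fn (Crawler $img) => $img->attr('src')),
  fn ($src) => $src && preg_match('#/uploads/#', $src)
);

// 4. Map the images to the image field on the article content type,
//    e.g. by downloading each file and attaching it to the node's
//    image field (the field name is feed-specific and omitted here).
foreach ($image_urls as $src) {
  // ... download $src and attach it to the article's image field.
}
```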
Instead of writing custom code to craft these DOM crawling sequences, a UI is provided for building these conditions and actions.
Custom code is the enemy of scale. As a global practice and whenever possible, these differences in configuration are brought into the UI so the same code can be repurposed to solve future challenges.
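Purely as a hypothetical illustration of that idea (this is not the module's actual schema), the sequence above might be captured as configuration that the crawler reads, rather than as code written per feed:

```php
<?php

// Hypothetical only: a crawl sequence stored as configuration built in
// the UI, so the same crawler code can be reused across feeds.
$sequence = [
  'feed' => 'example_podcast_feed',
  'steps' => [
    ['action' => 'follow_link', 'source' => 'feed_item_url'],
    ['action' => 'crawl', 'selector' => 'article .post-body'],
    ['action' => 'extract_images', 'url_pattern' => '#/uploads/#'],
    ['action' => 'map_field', 'target' => 'field_images'],
  ],
];
```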