Talking Machines API

The Talking Machines application serves as a central store for machine learning and artificial intelligence data.

Many data feeds in the machine learning space can be found through public research, but their data is inconsistent, which makes it difficult to incorporate into your own applications.

This application currently consumes data from around 10 of the most popular sources in the machine learning space through regular cron jobs. Once the data is in this system, our team can curate it through a robust CMS powered by Drupal 8.

Other applications on the internet need only point at this API to get curated data from across the machine learning space.

API Consumption: Drupal feeds module

API consumption is provided by an extension of the Drupal feeds module.

The feeds module creates content or other entities within a Drupal site from feed data on the internet. Source data is mapped, and transformed where needed, onto destination properties of entities in this system.
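To illustrate what "mapped and massaged" means, here is a minimal Python sketch of a mapping table applied to one feed item. It is not the feeds module's API (which is PHP and configured through the Drupal UI); it assumes the item has already been parsed into a dictionary, and the source keys, the destination names such as `field_source_url`, and the 500-character teaser length are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping rows: (source key in the parsed feed item, destination property, transform).
MappingRow = Tuple[str, str, Callable[[Any], Any]]

ARTICLE_MAPPING: List[MappingRow] = [
    ("title", "title", str.strip),
    ("link", "field_source_url", str.strip),
    # "Massage" long descriptions into a teaser-sized body (the length is an arbitrary example).
    ("description", "body", lambda text: text.strip()[:500]),
    ("pubDate", "created", str.strip),
]


def map_item(item: Dict[str, Any], mapping: List[MappingRow]) -> Dict[str, Any]:
    """Build destination properties from a feed item.

    Each mapped source value is passed through its transform; source keys that
    are missing or None are skipped instead of raising an error.
    """
    entity: Dict[str, Any] = {}
    for source_key, target_property, transform in mapping:
        value = item.get(source_key)
        if value is not None:
            entity[target_property] = transform(value)
    return entity


if __name__ == "__main__":
    print(map_item({"title": "  A paper  ", "link": "https://example.org/a"}, ARTICLE_MAPPING))
    # {'title': 'A paper', 'field_source_url': 'https://example.org/a'}
```

Keeping the mapping as data rather than code mirrors the idea behind the module: a new feed needs a new mapping, not a new function.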

The Drupal 8 version of the feeds module is still a development release, so it has many bugs. Custom code works around these bugs, and the overall feeds functionality is thoroughly extended to meet the requirements of this application.

API Publishing: Drupal JSON API module

The Talking Machines API is provided by an extension of the Drupal JSON API module.

JSON API is a specification for building clean APIs in JSON that any application built around the same specification can understand, wherever it runs. The JSON API Drupal module is a turnkey implementation of the specification for Drupal, so the module only needs to be installed to expose a starter API.
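As a rough illustration of what a client built around the specification works with, the Python sketch below reads a JSON API document. Per the specification, primary data sits under a top-level `data` member as resource objects carrying `type`, `id`, and `attributes`. The sample payload, the `node--article` style type names (Drupal's default naming for node bundles), and the helper `titles_by_type` are assumptions for illustration, not output captured from this API.

```python
import json
from typing import Any, Dict, List, Optional

# Invented sample document in JSON API shape; not captured from this API.
SAMPLE = """
{
  "data": [
    {"type": "node--article",
     "id": "00000000-0000-0000-0000-000000000001",
     "attributes": {"title": "First article"}},
    {"type": "node--podcast",
     "id": "00000000-0000-0000-0000-000000000002",
     "attributes": {"title": "Episode 1"}}
  ]
}
"""


def titles_by_type(document: Dict[str, Any]) -> Dict[str, List[Optional[str]]]:
    """Group resource titles by resource type.

    Primary data lives under the top-level "data" member: a list of resource
    objects for collections, a single object for one resource, or null.
    """
    data = document.get("data") or []
    if isinstance(data, dict):
        data = [data]
    grouped: Dict[str, List[Optional[str]]] = {}
    for resource in data:
        attributes = resource.get("attributes") or {}
        grouped.setdefault(resource["type"], []).append(attributes.get("title"))
    return grouped


if __name__ == "__main__":
    print(titles_by_type(json.loads(SAMPLE)))
```

Because every compliant response has the same envelope, the same client code works for any content type the API exposes.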

The Drupal module JSON API Extras is used to extend the API and modify it to suit our unique requirements.

The entire Talking Machines API is built as a configuration of Drupal, without a single line of custom code. This absence of custom code helps the API scale and keeps it consistent with the Drupal community.

Symfony DomCrawler extension

The data in the feeds consumed by this application is inconsistent. Example inconsistencies:

  • Some feeds provide full article bodies and some have teaser descriptions.
  • Most feeds don't have any images directly in the feed.
  • Some feeds provide author data as comma-separated links and some provide it as plain text.

These inconsistencies are addressed by custom code that extends the Drupal feeds module and by a custom DOM crawler.
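The author inconsistency, for example, can be normalized by accepting both shapes. This Python sketch is an assumption about the approach, not the module's code: it returns the text of each `<a>` element when the value contains links, and otherwise splits plain text on commas. `normalize_authors` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List


class _LinkTextParser(HTMLParser):
    """Collects the visible text of every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []
        self._inside_link = False
        self._link_text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_link = True
            self._link_text_parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_link:
            self._inside_link = False
            name = "".join(self._link_text_parts).strip()
            if name:
                self.names.append(name)

    def handle_data(self, data):
        if self._inside_link:
            self._link_text_parts.append(data)


def normalize_authors(raw: str) -> List[str]:
    """Return author names from either comma-separated links or plain text."""
    parser = _LinkTextParser()
    parser.feed(raw)
    parser.close()
    if parser.names:
        return parser.names
    # No links found: treat the value as plain, comma-separated text.
    return [part.strip() for part in raw.split(",") if part.strip()]
```

One design caveat: splitting plain text on commas would break names written as "Lastname, Firstname", so a real feed may need its own separator rule.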

In Drupal 7, a few DOM crawling solutions, such as Feeds Crawler, extended feeds. But because Drupal 8 is still in its early phases, no public DOM crawling solution is currently available for it.

So a new module is under development for this application, and it may be offered as a public contributed module on drupal.org in the future. It is called "Talking Machines Feeds Crawler" (tm_feeds_crawler) and is built as an extension of:

  1. Symfony DomCrawler
  2. Symfony BrowserKit
  3. Guzzle HTTP Client
  4. Goutte Web Scraper
  5. Other areas of Drupal and the open source world

Symfony DomCrawler is used to parse publicly available data from the feeds and to perform custom logic per feed.

Example DOM crawling sequence:

  1. Follow the link associated with the feed item.
  2. Crawl into the article body found on the linked page.
  3. Extract images in the article body whose URLs match a certain pattern.
  4. Map the images to the proper image field on the Article content type.
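Steps 2 to 4 can be sketched in Python with only the standard library, assuming step 1 (following the link, presumably handled by the HTTP client libraries listed above) has already returned the page HTML. The sketch assumes the article body is the page's `<article>` element; the URL pattern and the `field_image` field name are hypothetical and would differ per feed.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Pattern


class ArticleImageParser(HTMLParser):
    """Collects <img src> values inside <article> whose URL matches a pattern."""

    def __init__(self, url_pattern: Pattern[str]) -> None:
        super().__init__()
        self.url_pattern = url_pattern
        self.article_depth = 0  # how many <article> elements we are currently inside
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "img" and self.article_depth > 0:
            src = dict(attrs).get("src")
            if src and self.url_pattern.search(src):
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def crawl_article_images(html: str, url_pattern: str,
                         field_name: str = "field_image") -> Dict[str, List[str]]:
    """Steps 2-4: crawl into the article body, extract matching images, map them to a field."""
    parser = ArticleImageParser(re.compile(url_pattern))
    parser.feed(html)
    parser.close()
    return {field_name: parser.images}
```

Images outside the article body, such as site logos, are ignored, which is the point of crawling into the body before matching.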

Instead of requiring custom code for each DOM crawling sequence, the module provides a UI for building these conditions and actions.

Custom code is the enemy of scale. As a global practice, and whenever possible, these per-feed differences are moved into configuration in the UI, so the same code can be repurposed to solve future challenges.
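One way to keep those differences in configuration is to store each crawl rule as data, with a condition and a list of named actions, and let one generic routine interpret it. The Python sketch below assumes that shape; the rule format, the action names `set_field` and `truncate`, and the feed identifiers are hypothetical, not the module's actual configuration schema.

```python
import re
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]
ActionFn = Callable[[Item, Dict[str, Any]], None]

# Registry of reusable actions; rules refer to them by name, so a new feed
# needs new configuration rather than new code.
ACTIONS: Dict[str, ActionFn] = {}


def action(name: str) -> Callable[[ActionFn], ActionFn]:
    def register(fn: ActionFn) -> ActionFn:
        ACTIONS[name] = fn
        return fn
    return register


@action("set_field")
def set_field(item: Item, params: Dict[str, Any]) -> None:
    item[params["field"]] = params["value"]


@action("truncate")
def truncate(item: Item, params: Dict[str, Any]) -> None:
    field = params["field"]
    if isinstance(item.get(field), str):
        item[field] = item[field][: params["length"]]


def matches(item: Item, condition: Dict[str, str]) -> bool:
    """True when every key's value matches its regex (an empty condition always matches)."""
    return all(re.search(pattern, str(item.get(key, ""))) for key, pattern in condition.items())


def apply_rules(item: Item, rules: List[Dict[str, Any]]) -> Item:
    """Run the actions of every rule whose condition matches, in order, mutating item."""
    for rule in rules:
        if matches(item, rule["when"]):
            for step in rule["then"]:
                ACTIONS[step["action"]](item, step.get("params", {}))
    return item


# Hypothetical rules, as they might be saved from a UI.
RULES: List[Dict[str, Any]] = [
    {"when": {"feed": r"^arxiv_"},
     "then": [{"action": "set_field", "params": {"field": "type", "value": "article"}}]},
    {"when": {"link": r"microsoft\.com"},
     "then": [{"action": "truncate", "params": {"field": "body", "length": 20}}]},
]
```

Adding support for another feed then means adding an entry to `RULES`, and only a genuinely new kind of action requires new code.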

Feed: API Overview

Title                            Items imported  Last import         Next import
(title missing)                  2598            11/29/2024 - 18:31  11/29/2024 - 18:31
arXiv: Corr                      4082            11/29/2024 - 18:30  11/29/2024 - 18:30
(title missing)                  1115            11/29/2024 - 18:31  11/29/2024 - 18:31
Machine Learning Masonry         82              11/29/2024 - 18:31  11/29/2024 - 18:31
Machine Learning Weekly          11              11/07/2019 - 16:45  11/07/2019 - 16:45
MIT Machine Learning             358             11/29/2024 - 18:31  11/29/2024 - 18:31
Microsoft News Machine Learning  41              11/29/2024 - 18:30  11/29/2024 - 18:30
Stats and Articles               32              11/29/2024 - 18:31  11/29/2024 - 18:31
FastML                           21              11/29/2024 - 18:30  11/29/2024 - 18:30
arXiv: Stat                      1247            11/29/2024 - 18:30  11/29/2024 - 18:30
Term: Tag: AITopics: Concept     84              04/18/2018 - 17:26  04/18/2018 - 18:26
(title missing)                  14              04/18/2018 - 17:27  04/18/2018 - 18:27
(title missing)                  29              04/18/2018 - 17:27  04/18/2018 - 18:27
(title missing)                  7               11/29/2024 - 18:30  11/29/2024 - 18:30
A Cast TM Podcast                110             11/29/2024 - 18:30  11/29/2024 - 18:30

Importing status for A Cast TM Podcast: On

Normalized API Endpoints

Title          Content Types                 API Format  API Access
JSON Combined  Article, Event, Job, Podcast  RSS         Public
JSON Job       Job                           JSON        Public
RSS Job        Job                           RSS         Public
JSON Podcast   Podcast                       JSON        Public
RSS Podcast    Podcast                       RSS         Public
JSON Event     Event                         JSON        Public
RSS Event      Event                         RSS         Public
JSON Article   Article                       JSON        Public
RSS Article    Article                       RSS         Public
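For a consumer, the RSS-format endpoints can be read with any standard RSS parser. This Python sketch assumes an RSS 2.0 document whose `<item>` elements carry `<title>` and `<link>`; the sample payload is invented, and because the endpoint URLs are not listed here, fetching is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented RSS 2.0 payload; a real endpoint's channel title and items will differ.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example job feed</title>
    <item><title>ML Engineer</title><link>https://example.org/jobs/1</link></item>
    <item><title>Research Scientist</title><link>https://example.org/jobs/2</link></item>
  </channel>
</rss>"""


def rss_items(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```

Only item-level titles are returned; the channel's own `<title>` is skipped because `root.iter("item")` visits item elements alone.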