Import.io is the best tool for scraping structured data off the web. Alteryx is the best tool for self-service data analysis. It doesn’t take long to realize that we should put the two together.
Using the Import.io API, the InterWorks Import.io connector allows you to read structured data from the web and bring it directly into your Alteryx workflow.
Note: If you want to achieve similar results with Tableau, my colleague Robert Rouse built a handy Import.io web connector of his own.
Setup
To install the InterWorks macro, unzip the attached folder to the location on your computer where you want the macros to be permanently saved. I suggest Documents\My Alteryx Macros\InterWorks Macros. Then run the installer wizard, Install.yxwz, and choose Install. Once you restart Alteryx, the macros will be available on your toolbar just like any other tool:
You will also need to set up a free account at http://import.io. Once you’ve signed up, the account page will show your API Key:
Automatically Extract Web Data
Import.io provides easy access to web data through their Magic API. On the homepage, enter any URL and the Magic API will attempt to extract structured data. As an example, try using the InterWorks People page:
Import.io will return a structured table of the InterWorks employee directory. To do the same thing in Alteryx, use the InterWorks connector, enter your API Key, select Magic API in the tool configuration and enter the URL. The tool will then return the JSON data extracted from the webpage right into your Alteryx module:
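If you want to inspect the returned JSON outside of Alteryx, the idea of turning it into a table can be sketched in a few lines of Python. This is only an illustration: the "results" key, the field names and the sample records below are assumptions made for the example, not Import.io's documented response format, so adjust them to match what the tool actually returns.

```python
import json


def rows_from_payload(text, key="results"):
    """Flatten a JSON payload into (columns, rows).

    Assumes the payload is an object whose `key` entry holds a list of
    flat records. That shape is hypothetical, not a documented format.
    """
    payload = json.loads(text)
    records = payload.get(key, [])

    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)

    # Build one row per record, using None where a field is missing.
    rows = [[record.get(name) for name in columns] for record in records]
    return columns, rows


# Made-up sample payload, for illustration only.
sample = json.dumps({
    "results": [
        {"name": "Ada", "title": "Analyst"},
        {"name": "Bo", "office": "Tulsa"},
    ]
})

columns, rows = rows_from_payload(sample)
print(columns)
for row in rows:
    print(row)
```

Records that lack a field get an empty value rather than being dropped, which loosely mirrors how a table with null cells looks once it lands in a workflow.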
Using the Connector API in Alteryx
The Magic API works wonders, but sometimes it is unable to find exactly what you are looking for. In this case, you may need to train an extractor or connector. An extractor allows you to build a custom tool to scrape data from similarly structured web pages. A connector is an extractor with a macro attached, which allows you to record actions such as running a page search before extracting data. Building a connector is easy, but for this example we will use one that has already been created.
Import.io has provided an open connector that allows users to extract data from the California Superior Court after searching for a particular case title and party:
To use this connector in Alteryx, enter your API key, the Connector ID and the input variables (one name:value pair per row):
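The one-pair-per-row layout is easy to reason about as a simple lookup of input names to values. As a rough sketch in Python (the helper parse_input_rows and the input names case_title and party are hypothetical, chosen only for this example and not taken from the connector itself):

```python
def parse_input_rows(rows):
    """Turn 'name:value' strings into a dict, one pair per row."""
    inputs = {}
    for row in rows:
        row = row.strip()
        if not row:
            continue  # skip blank rows
        # Split on the first colon only, so values that contain
        # colons (such as URLs) are kept intact.
        name, sep, value = row.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'name:value', got {row!r}")
        inputs[name.strip()] = value.strip()
    return inputs


# Hypothetical input names, for illustration only.
rows = ["case_title: Smith", "party: Jones"]
print(parse_input_rows(rows))
```

Splitting on the first colon only is a deliberate choice here: it lets a value carry its own colons without being truncated.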
Run the module to bring the search results into your Alteryx workflow.
Try It Out
We’re excited to bring together Import.io’s simple data scraping capabilities and our favorite data analytics tools. InterWorks is always looking for feedback and ways that we can improve our tools to better help our clients. Create a connector, try out the macro and tell us what you think!