Content Transforms

Content Transform plugins modify the format of, or extract data from, content pulled by a Content Source. A transform accepts content as a text string or as an array and must follow a Content Source plugin that is compatible with the data the transform accepts.

Content Transform plugins then provide content as a text string or as an array and must be followed by a Content Display plugin that is compatible with the data the transform provides.

UCP will warn in the edit dialog and provide error information if a transform is incompatible with the source or display.

Current content transforms fall broadly into four categories:

  1. Translating or modifying a text string.
  2. Extracting an array of data from a text string.
  3. Filtering and modifying arrays of data.
  4. Null transform, pass on whatever is provided.


[ Universal Content Puller ]

Content Transform Plugins

Content Transform Plugins are used to modify or transform content between source and display.

Functionality can be extended by adding plugin classes for additional Content Transform Plugins. Content Transform Plugins are simple classes that provide the functionality to transform content in different ways. They should inherit from ContentTransformPluginBase. Details are provided by comments in the code.

Plugins can be added by placing the plugin classes at packages/anyPackageName/src/JtF/UCP/ContentTransforms/Plugins/PluginName or application/src/JtF/UCP/ContentTransforms/Plugins/PluginName. Plugins can also be similarly placed beneath the plugin type's namespace declared in a package controller's AutoloaderRegistries.

[ Universal Content Puller ]

Array Hacker

Manipulate array data.

Facilitates slicing, transposing and flattening array data from any source that provides content as an array.
Accepts content as an array. Provides content as an array.
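
The three operations named above can be sketched in a few lines. This is only an illustration in Python, not the plugin's code; the function names are hypothetical and a list of rows stands in for the array.

```python
def slice_rows(table, start, length):
    """Keep `length` rows beginning at zero-based row `start`."""
    return table[start:start + length]

def transpose(table):
    """Swap rows and columns (assumes every row has the same length)."""
    return [list(column) for column in zip(*table)]

def flatten(table):
    """Collapse a two-level array into a single list of values."""
    return [cell for row in table for cell in row]

table = [["a", 1], ["b", 2], ["c", 3]]
sliced = slice_rows(table, 1, 2)
transposed = transpose(table)
flat = flatten(table)
```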

[ Universal Content Puller ]

Cache With Transform

Cache the content coming into the transform.

Cache any string or array data from the content source or from the previous stage of a transform pipeline.
Accepts content as any of string or array. Provides content as accepted.

Cached content is matched to page requests by a cache key. Data such as user groups, routing and query parameters can be built into the key and hence used to differentiate cache returns for factors such as content filtering, pagination and user (group) permissions.

In general, the Cache With Transform key should be left as permissive as possible so as to achieve higher cache hit rates.
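
As a rough illustration of how a key differentiates cache returns, the sketch below builds a key from only the factors that are passed in. It is written in Python purely to show the idea; the helper name, its parameters and the hashing scheme are assumptions, not UCP's implementation.

```python
import hashlib
import json

def build_cache_key(block_id, user_groups=None, query_params=None):
    """Hypothetical helper: hash only the factors that should split the cache."""
    parts = {"block": block_id}
    if user_groups is not None:
        parts["groups"] = sorted(user_groups)        # order of groups is irrelevant
    if query_params is not None:
        parts["query"] = sorted(query_params.items())
    raw = json.dumps(parts, sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# A permissive key: every visitor and every query string shares one cache entry.
permissive_a = build_cache_key(42)
permissive_b = build_cache_key(42)

# A stricter key: each page of a paginated listing gets its own cache entry.
page_1 = build_cache_key(42, query_params={"page": "1"})
page_2 = build_cache_key(42, query_params={"page": "2"})
```

Every factor added to the key splits the cache further, which is why a permissive key tends to give a higher hit rate.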

Consider that caching could actually slow things down. It is best applied where a complex content source does not provide its own cache or when a pipeline of transforms involves particularly expensive processing. In most other situations, saving and then fetching the cached data can easily consume more time than simply re-processing it for each page request.

Also consider that the Concrete CMS full page cache will usually be more effective than caching within Universal Content Puller.

[ Universal Content Puller ]

Convert Encoding

Convert text encoding.

Sometimes a source provides text in an unwanted encoding. Text can be converted from and to any encoding supported by the PHP installation. In general, the destination encoding should be UTF-8 (the default) or a subset such as ASCII. If you don't know the source encoding, the 'auto' option can make a best guess.
Accepts content as any of string or array. Provides content as accepted.

Encoding conversion is usually best made on a string of text before other transforms. Before PHP 7.2, encoding can only be converted for strings. From PHP 7.2 onward, the encoding of arrays can also be converted. Where encoding cannot be converted, the original is returned.
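
The behaviour described above (convert strings, convert each element of an array, return the original where conversion fails) can be sketched as follows. This is a Python illustration under stated assumptions, not the plugin's PHP code: the function name is hypothetical, raw bytes stand in for encoded text, and the 'auto' option is not modelled.

```python
def convert_encoding(data, source="latin-1", target="utf-8"):
    """Re-encode bytes, or every element of a list; return the original on failure."""
    if isinstance(data, list):
        return [convert_encoding(item, source, target) for item in data]
    if not isinstance(data, bytes):
        return data                      # nothing to convert
    try:
        return data.decode(source).encode(target)
    except (UnicodeError, LookupError):  # undecodable bytes or unknown encoding name
        return data

latin1_text = "café".encode("latin-1")
converted = convert_encoding(latin1_text)
converted_list = convert_encoding([latin1_text, b"plain"])
unchanged = convert_encoding(latin1_text, source="ascii")
```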

[ Universal Content Puller ]

First Row to Keys

Use the first row of an array to create keys for subsequent rows.

Facilitates subsequent transform or display of array data by assigning the first row as keys to items in subsequent rows.
Accepts content as an array. Provides content as an array.
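
A minimal Python sketch of the idea, with a hypothetical function name and a list of rows standing in for the array:

```python
def first_row_to_keys(table):
    """Use the first row as keys for every following row."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]

table = [["name", "qty"], ["apple", 3], ["pear", 5]]
keyed = first_row_to_keys(table)
```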

[ Universal Content Puller ]

HTML Repair

Repair HTML by applying htmLawed.

Sometimes a source provides broken HTML. htmLawed can repair some kinds of broken HTML, such as missing closing tags or mismatched tags.
Accepts content as a string. Provides content as a string.

[ Universal Content Puller ]

Key Filter

Remove picked columns from an array.

At any level of an array, keys are used to pick the keys or indexes to remove. Level is the level of the array index, counting in from the outermost index. 1 for outer index. There is no automatic level. Within a filtered level, elements for keys filtered will be removed from the array.
Accepts content as an array. Provides content as an array.

[ Universal Content Puller ]

Key Mapper

Map the column keys in arrays.

At any level of an array, existing keys are replaced with the mapped keys. For multi-dimensional arrays, Level is the level of the array index, counting in from the outermost index. 1 for outer index. A level of 0 applies to any level, where any matching key at any level will be replaced.
Accepts content as an array. Provides content as an array.
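
The Level setting can be illustrated with a short Python sketch. It is not the plugin's code: the function name is hypothetical and nested dictionaries stand in for a multi-dimensional array.

```python
def map_keys(data, mapping, level, depth=1):
    """Rename keys at `level` (1 = outermost index); level 0 renames at every level."""
    if not isinstance(data, dict):
        return data
    apply_here = level == 0 or level == depth
    result = {}
    for key, value in data.items():
        new_key = mapping.get(key, key) if apply_here else key
        result[new_key] = map_keys(value, mapping, level, depth + 1)
    return result

data = {"id": {"id": 1, "title": "A"}}
mapping = {"id": "ref"}
outer_only = map_keys(data, mapping, 1)
inner_only = map_keys(data, mapping, 2)
any_level = map_keys(data, mapping, 0)
```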

[ Universal Content Puller ]

Key Picker

Pick columns from an array.

At any level of an array, keys are used to pick the keys or indexes to return in the sequence required. Level is the level of the array index, counting in from the outermost index. 1 for outer index. There is no automatic level. Within a picked level, elements for keys not picked will be removed from the array.
Accepts content as an array. Provides content as an array.
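
As a sketch of picking at a given level, the Python below keeps only the picked keys, in the order they were picked, and leaves other levels untouched. The function name is hypothetical and nested dictionaries stand in for the array.

```python
def pick_keys(data, keys, level, depth=1):
    """At `level` (1 = outermost index) keep only `keys`, in the order given."""
    if not isinstance(data, dict):
        return data
    if depth == level:
        return {key: data[key] for key in keys if key in data}
    return {key: pick_keys(value, keys, level, depth + 1) for key, value in data.items()}

rows = {
    "r1": {"id": 1, "name": "A", "price": 3},
    "r2": {"id": 2, "name": "B", "price": 4},
}
picked = pick_keys(rows, ["price", "id"], 2)
```

Here level 2 addresses the columns within each row, so 'name' is dropped and 'price' now precedes 'id'.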

[ Universal Content Puller ]

Key Regex

Apply a regex to modify keys in arrays.

At any level of an array, the regex is applied to existing keys and replacements made. For multi-dimensional arrays, Level is the level of the array index, counting in from the outermost index. 1 for outer index. A level of 0 applies to any level, where any matching key at any level will be replaced.
Accepts content as an array. Provides content as an array.

If applying this transform to a large content source, it is highly recommended that you enable paginate with source or precede this transform in a Pipeline with a transform that provides paginate with transform such as Array Hacker.

[ Universal Content Puller ]

List Selector

Extract a list from HTML, XML or JSON using css selectors.

Extracts from HTML or XML using two levels of selector to provide a list. If you need more complex data, consider using Multi Level Selector.
Accepts content as any of string or array. Provides content as an array. When given an array of source, applies the selectors and slicing to each element of the array.

The outer selector identifies an area of the source to work with. If empty, the entire source will be used. The remove selector optionally identifies parts to remove before proceeding. The item selector picks a list of elements within the outer selector. If empty, the list will comprise all child elements of the outer selector.

[ Universal Content Puller ]

Markdown

Format Markdown into HTML.

Markdown is a plain text markup syntax that can be converted into HTML. Markdown Extra provides an extended syntax.
Accepts content as a string. Provides content as a string.

[ Universal Content Puller ]

Multi Selector

Advanced extraction from HTML or XML using multiple css selectors.

Extracts from HTML or XML using layers of css selectors to provide an array (multi level list) of the selected items. The remove selector optionally identifies parts to remove before proceeding.
Accepts content as any of string or array. Provides content as an array. When given an array of source, applies the selectors to each element of the array.

Use this transform to extract tables or more deeply nested data from HTML, XML or RSS feeds. It also handles JSON sources by internally translating them via XML.

[ Universal Content Puller ]

NL2BR

Format by converting new lines in the text to HTML line breaks.

Applies the php nl2br() function.
Accepts content as a string. Provides content as a string.

[ Universal Content Puller ]

Pass Through

Null transform.

No Transform. The content provided by the source is passed through without any modification or transformation.
Accepts content as any of string or array. Provides content as accepted.

[ Universal Content Puller ]

Pipeline

Build a complex transform as a sequence of up to 6 other transforms.

Read the full help before attempting to use this content transform.
Accepts and provides content according to the assembled transform pipeline.

Before using this plugin to build a pipeline of transforms, first consider whether you can achieve the functionality you require with a single transform. A single transform is more efficient and does not suffer from the current limitations of this pipeline transform.

Limitations are:

  1. Validation of provide/accept compatibility is not as thorough as with the simple source/transform/display workflow.
  2. Multiple transforms are slower to execute.
  3. Import from JSON settings containing a pipeline transform does not fully work.
  4. Transforms in the pipeline cannot be sorted.

Nevertheless, there may be situations where the practical solution is to use this Pipeline transform. On-page rendering of the Pipeline works fully as long as you take care of provides/accepts compatibility. In edit mode, proceed with caution and do not rely on importing JSON that contains a pipeline.

Each transform in the pipeline will be evaluated in sequence. The first transform accepts content from the Content Source. Content is then modified and passed on by each successive transform. The last transform in the pipeline provides content to the Content Display.
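
The sequential evaluation can be pictured as simple function chaining. The Python below is a sketch only: the stage functions and run_pipeline are hypothetical stand-ins for transforms, and it does not attempt the accepts/provides validation that UCP performs.

```python
MAX_STAGES = 6  # the Pipeline transform allows up to 6 transforms

def run_pipeline(content, transforms):
    """Pass content through each transform in order and return the final result."""
    if len(transforms) > MAX_STAGES:
        raise ValueError("a pipeline holds at most 6 transforms")
    for transform in transforms:
        content = transform(content)
    return content

# Hypothetical stages: string -> array, array -> array, array -> array.
def split_lines(text):
    return text.splitlines()

def drop_empty(rows):
    return [row for row in rows if row.strip()]

def upper_case(rows):
    return [row.upper() for row in rows]

result = run_pipeline("a\n\nb", [split_lines, drop_empty, upper_case])
```

Each stage must accept what the previous stage provides: here the first stage accepts a string and provides an array, and the later stages accept and provide arrays.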

[ Universal Content Puller ]

Remove Duplicate Values

Find and remove duplicate array values.

Removing items from an array can skew the display of tables.

Remove duplicate values from an array. When duplicates are removed, the key of the first value will be preserved. For multi-dimensional arrays, Level is the level of the array index, counting in from the outermost index. 1 for outer index. A level of 0 applies to any level, where any duplicate value at any level will be removed.
Accepts content as an array. Provides content as an array.
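
A single-level Python sketch of keeping the key of the first occurrence. The function name is hypothetical, a dictionary stands in for the array, values are assumed to be hashable, and the Level setting is not modelled.

```python
def remove_duplicate_values(data):
    """Drop repeated values, keeping the key of the first occurrence."""
    seen = set()
    result = {}
    for key, value in data.items():
        if value in seen:
            continue
        seen.add(value)
        result[key] = value
    return result

deduplicated = remove_duplicate_values({"a": "red", "b": "blue", "c": "red"})
```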

[ Universal Content Puller ]

Selector

Extract from HTML using a css selector.

Extracts from HTML or XML using a css selector to provide just the selected part of the source content.
Accepts content as any of string or array. Provides content as accepted. When given an array of source, applies the selectors to each element of the array.

Sometimes pulled content has unwanted classes that interfere with ideal rendering. For example, the classes 'container' and/or 'row' from a theme grid may be unwanted when pulled into an area that already has these classes. To fix such glitches you can list classes to remove. The HTML elements remain; only the listed classes are removed from their class attributes.

[ Universal Content Puller ]

Table From CSV

Parse CSV into a table.

Parse text on the premise that it is CSV data for a table.
Accepts content as a string. Provides content as an array.

If the content display plugin supports pagination, enabling Paginate with transform will allow Table From CSV to manage pagination before the content is passed to the display. Paginate at source takes priority. Do not check both!
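
To illustrate the string-to-array step and pagination before display, here is a Python sketch using the standard csv module. The function names, the delimiter default and the one-based page number are assumptions for the example, not the plugin's settings.

```python
import csv
import io

def table_from_csv(text, delimiter=","):
    """Parse CSV text into a list of rows, honouring quoted fields."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]

def paginate(rows, page, page_size):
    """Return the rows for a one-based page number."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]

text = 'name,qty\n"Smith, J",2\nLee,5\n'
table = table_from_csv(text)
second_page = paginate(table[1:], page=2, page_size=1)  # skip the header row
```

Note that every parsed value is a string; converting '2' to a number would be a separate step.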

[ Universal Content Puller ]

Table From HTML

Parse HTML into a table.

Parse text on the premise that it is HTML and contains one or more tables. When there is more than one table, the required table can be selected by index starting at 1.
Accepts content as a string. Provides content as an array.

[ Universal Content Puller ]

Table From JSON

Parse JSON into a table.

Decode JSON text on the premise that it provides a table.
Accepts content as a string. Provides content as an array.

[ Universal Content Puller ]

Table From Text Lines

Parse lines of text into a table.

Split text into a table based solely on line breaks. A multiple new line starts a new row. A single new line starts a new column.
Accepts content as a string. Provides content as an array.
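
The splitting rule can be sketched in Python as follows. The function name is hypothetical, and treating any run of two or more new lines as one row break is an assumption based on the description above.

```python
import re

def table_from_text_lines(text):
    """Blank-line-separated blocks become rows; lines within a block become columns."""
    text = text.replace("\r\n", "\n").strip("\n")
    blocks = re.split(r"\n{2,}", text)
    return [block.split("\n") for block in blocks if block]

table = table_from_text_lines("Ann\nLeeds\n\nBob\nYork")
```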

[ Universal Content Puller ]

Table Sorter

Sort any table of data.

Sort a table by selected row or column.
Accepts content as an array. Provides content as an array.
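
One reading of sorting "by selected row or column" is sketched below in Python: sorting by a column reorders the rows, and sorting by a row reorders the columns. The function names are hypothetical and the plugin's actual options may differ.

```python
def sort_by_column(table, column, reverse=False):
    """Reorder rows by the value each row holds in `column`."""
    return sorted(table, key=lambda row: row[column], reverse=reverse)

def sort_by_row(table, row_index, reverse=False):
    """Reorder columns by the values found in the chosen row."""
    chosen = table[row_index]
    order = sorted(range(len(chosen)), key=lambda i: chosen[i], reverse=reverse)
    return [[row[i] for i in order] for row in table]

by_column = sort_by_column([["b", 2], ["a", 3], ["c", 1]], column=1)
by_row = sort_by_row([[3, 1, 2], ["x", "y", "z"]], row_index=0)
```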

[ Universal Content Puller ]

Value Filter

Find and remove array items.

Removing items from an array can skew the display of tables.

Existing values are matched and the entry removed. For multi-dimensional arrays, Level is the level of the array index, counting in from the outermost index. 1 for outer index. A level of 0 applies to any level, where any matching value at any level will be removed.
Accepts content as any of string or array. Provides content as accepted.

Match mode can be a simple match, a case-insensitive match or a regular expression. In regular expression mode you need to supply the full pattern, including wrapping characters and any modifiers, such as /word/i or {word}u. To remove empty values, simply leave the Find column empty.

[ Universal Content Puller ]

Value Replace

Find and replace in strings and in array values.

For strings and at any level of an array, existing values are searched for matches and the matched text replaced. For multi-dimensional arrays, Level is the level of the array index, counting in from the outermost index. 1 for outer index. A level of 0 applies to any level, where any matching value at any level will be replaced.
Accepts content as any of string or array. Provides content as accepted.

Replacement mode can be a simple match, a case-insensitive match or a regular expression. In regular expression mode you need to supply the full pattern, including wrapping characters and any modifiers, such as /word/i or {word}u.
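
The three replacement modes can be compared with a small Python sketch. It is an illustration only: the function and mode names are hypothetical, it behaves like a level of 0 (every level), and Python patterns are written without the wrapping characters that the plugin's regular expression mode requires, so /word/i corresponds here to the pattern 'word' with an ignore-case flag.

```python
import re

def replace_value(text, find, replacement, mode="simple"):
    """Replace matched text in one string according to the chosen mode."""
    if mode == "simple":
        return text.replace(find, replacement)
    if mode == "case_insensitive":
        # re.escape treats `find` literally; the lambda keeps `replacement` literal too.
        return re.sub(re.escape(find), lambda m: replacement, text, flags=re.IGNORECASE)
    if mode == "regex":
        return re.sub(find, replacement, text)
    raise ValueError("unknown mode: " + mode)

def replace_in(data, find, replacement, mode="simple"):
    """Apply the replacement to a string or to every string in a nested array."""
    if isinstance(data, str):
        return replace_value(data, find, replacement, mode)
    if isinstance(data, dict):
        return {k: replace_in(v, find, replacement, mode) for k, v in data.items()}
    if isinstance(data, list):
        return [replace_in(v, find, replacement, mode) for v in data]
    return data

simple = replace_in("Red red", "red", "blue")
nested = replace_in({"a": "Red car", "b": ["red", 5]}, "red", "blue", "case_insensitive")
by_regex = replace_in("cat, cot, cut", r"c[ao]t", "dog", "regex")
```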

[ Universal Content Puller XX Sources ]

SQL Extract

Extract data from table using an SQL expression.

This transform creates a SQLite database on-the-fly, populates it, and then runs the SQL expression. For large volumes of data this becomes increasingly inefficient. Where speed is an issue, a better solution is to source the data from a database.

Incoming data should be an array with named columns. Un-named columns will be assigned the names 'Column_1', 'Column_2' etc for the purpose of referencing in the SQL expression.
Provides content as an array.

The SQL expression is processed by SQLite and should conform to the SQLite syntax. It should SELECT from a table named `Data`. For example SELECT * FROM `Data` ...

If the content display plugin supports pagination, enabling Paginate with Transform will allow the transform to manage pagination before the content is passed to the display.

To facilitate this, the query string should include the SQL 'LIMIT {{limit}} OFFSET {{offset}}'. UCP will then fill in the offset and limit with the relevant values. Behind the scenes, the value inserted in '{{limit}}' will be slightly larger than the page size to facilitate detection of the last page and orphans.

The query for the total should use COUNT(...) to count the maximum number of rows. It will usually include the same parameters as the main query, but should not include pagination placeholders '{{limit}}' or '{{offset}}'.

Take care: the syntax for counting distinct items in SQLite differs from that of MySQL.
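
The workflow described above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions rather than the plugin's code: the sample rows, the fill_pagination helper and the choice of page size plus one for '{{limit}}' are all illustrative (the text above only says the value is slightly larger than the page size).

```python
import sqlite3

# Build an in-memory SQLite table named `Data` from an array with named columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `Data` (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO `Data` VALUES (?, ?)",
    [("Ann", "Leeds"), ("Bob", "York"), ("Cat", "Leeds"), ("Dan", "Hull"), ("Eve", "York")],
)

query = "SELECT name, city FROM `Data` ORDER BY name LIMIT {{limit}} OFFSET {{offset}}"
count_query = "SELECT COUNT(*) FROM `Data`"  # no pagination placeholders here

def fill_pagination(sql, limit, offset):
    """Hypothetical helper: substitute the pagination placeholders with integers."""
    return sql.replace("{{limit}}", str(int(limit))).replace("{{offset}}", str(int(offset)))

page_size = 2
fetched = conn.execute(fill_pagination(query, page_size + 1, 0)).fetchall()
has_more = len(fetched) > page_size      # the extra row reveals a following page
page = fetched[:page_size]
total = conn.execute(count_query).fetchone()[0]

# SQLite accepts only one expression inside COUNT(DISTINCT ...).
distinct_cities = conn.execute("SELECT COUNT(DISTINCT city) FROM `Data`").fetchone()[0]
# Counting distinct combinations of columns needs a subquery in SQLite.
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, city FROM `Data`)"
).fetchone()[0]
```

MySQL allows several expressions inside COUNT(DISTINCT ...), which is probably the difference the warning above refers to.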

[ Universal Content Puller XX Sources ]

SQL Extract with Form

Extract data from table using an SQL expression making use of GET or POST parameters from the request or page or user attributes.

This transform is an extension of the SQL Extract transform with the addition of further parameters pulled in from an on-page form or other provision of GET or POST parameters, or the values of page or user attributes.
Provides content as an array.

The SQL expression is processed by SQLite and should conform to the SQLite syntax. It should SELECT from a table named `Data`. For example SELECT * FROM `Data` ...

If the content display plugin supports pagination, enabling Paginate with Transform will allow the transform to manage pagination before the content is passed to the display.

To facilitate this, the query string should include the SQL 'LIMIT {{limit}} OFFSET {{offset}}'. UCP will then fill in the offset and limit with the relevant values. Behind the scenes, the value inserted in '{{limit}}' will be slightly larger than the page size to facilitate detection of the last page and orphans.

The query for the total should use COUNT(...) to count the maximum number of rows. It will usually include the same parameters as the main query, but should not include pagination placeholders '{{limit}}' or '{{offset}}'.

Take care: the syntax for counting distinct items in SQLite differs from that of MySQL.

In addition to the pagination placeholders 'LIMIT {{limit}} OFFSET {{offset}}', the select statement to retrieve data should also use '?' placeholders for parameters to the query. These placeholders are then mapped to query string or post data values. Because the values are bound through placeholders, user-entered values are fully escaped to prevent SQL injection. If pagination with transform is configured, the query to count the total should use the same '?' placeholders.

Parameters may be any form input name or url parameter name. In addition, special parameters are:

  • CCM_CID - The cID of the current page
  • CCM_URL - The URL of the current page
  • CCM_CATTR_attribute_handle - A page attribute that must evaluate to a scalar value
  • CCM_UID - The uID of the current user
  • CCM_EMAIL - The email address of the current user
  • CCM_UATTR_attribute_handle - A user attribute that must evaluate to a scalar value
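
The effect of binding request values to '?' placeholders can be demonstrated with Python's sqlite3 module. The sketch below is illustrative only: the sample data, the placeholder_params list and the run helper are hypothetical, and a plain dictionary stands in for GET or POST data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `Data` (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO `Data` VALUES (?, ?)",
    [("Ann", "Leeds"), ("Bob", "York"), ("Cat", "Leeds")],
)

query = "SELECT name FROM `Data` WHERE city = ? ORDER BY name LIMIT {{limit}} OFFSET {{offset}}"
count_query = "SELECT COUNT(*) FROM `Data` WHERE city = ?"  # same '?' placeholders
placeholder_params = ["city"]  # hypothetical mapping: one request parameter per '?'

def run(sql, request, limit=None, offset=0):
    """Fill pagination placeholders, then bind request values to the '?' placeholders."""
    if limit is not None:
        sql = sql.replace("{{limit}}", str(int(limit))).replace("{{offset}}", str(int(offset)))
    values = [request.get(name, "") for name in placeholder_params]
    return conn.execute(sql, values).fetchall()

matching = run(query, {"city": "Leeds"}, limit=10)
hostile = run(query, {"city": "Leeds' OR '1'='1"}, limit=10)
total = run(count_query, {"city": "Leeds"})[0][0]
```

Because the hostile value is bound as data rather than spliced into the SQL text, it is compared literally with the city column and matches nothing.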

[ Universal Content Importer ]

Block Extractor

Extract blocks from HTML.

Extracts a list of blocks from HTML using a mixture of css selectors and heuristics, transforming into CIF XML format.
Accepts content as any of string or array. Provides content as accepted. When given an array of source, applies the extraction to each element of the array.

Extraction of blocks and CIF data generation can be sensitive to the Block Extractor settings. The 'Provides data as' setting will normally be set to CIF XML string. Use the other options for 'Provides data as' with the Serialize Paginate display to preview intermediate stages of data extraction and gather diagnostics/feedback to fine-tune the settings.

[ Universal Content Importer ]

Document File Extractor

Extract a list of document files.

Uses a selector and a remove selector to home in on part of the HTML or XML source, then lists document files found. Use this to list .pdf, .doc and other files linked directly from the source.
Accepts content as any of string or array. Provides content as an array. When given an array of source, applies the selectors and slicing to each element of the array.

The selector identifies an area of the source to work with. If empty, the entire source will be used. The remove selector optionally identifies parts to remove before proceeding.

[ Universal Content Importer ]

Image Extractor

Extract a list of images.

Uses a selector and a remove selector to home in on part of the HTML or XML source, then lists images found.
Accepts content as any of string or array. Provides content as an array. When given an array of source, applies the selectors and slicing to each element of the array.

The selector identifies an area of the source to work with. If empty, the entire source will be used. The remove selector optionally identifies parts to remove before proceeding.


About this Sidebar

Creating a sidebar for a group of pages without messing about with stacks is an easy use-case for Universal Content Puller.

This sidebar is edited once, within the main addon page for Universal Content Puller.

It is then pulled into all UCP sub-pages using a UCP block.

The Content Source is Parent Page, set to pull the Sidebar area from 2 pages from the top. The Content Transform is Selector, set to remove container and row classes that, when unnecessarily nested, could mess up the Bootstrap grid. The Content Display is Plain, which just outputs the transformed text.

In the advanced settings, sanitization is disabled as we trust the source page and don't want to strip out any formatting or functionality from the pulled sidebar.