Apache Stanbol Enhancer
Enhancement Chain: default (all 3 engines available)
- tika ( optional , TikaEngine)
- langdetect ( required , LanguageDetectionEnhancementEngine)
- tagme ( required , TagmeEngine)
You can enable, disable and deploy new engines using the OSGi console.
Stateless REST analysis
This stateless interface allows the caller to submit content to the Stanbol enhancer engines and get the resulting enhancements formatted as RDF at once without storing anything on the server-side.
The content to analyze should be sent in a POST request with the mimetype specified in the Content-type header. The response will hold the RDF enhancement serialized in the format specified in the Accept header:
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." \
     http://wit.istc.cnr.it:9090/engines
The list of mimetypes accepted as inputs depends on the deployed engines. By default only text/plain content will be analyzed.
Stanbol enhancer is able to serialize the response in the following RDF formats:
- application/json (JSON-LD)
- application/rdf+xml (RDF/XML)
- application/rdf+json (RDF/JSON)
- text/turtle (Turtle)
- text/rdf+nt (N-TRIPLES)
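As a rough Python counterpart to the curl call above, the following sketch builds (but does not send) the same POST request with the standard library only. The helper name build_enhance_request and the ENDPOINT constant are illustrative assumptions and not part of Stanbol; the set of formats is copied from the list above.

```python
import urllib.request

# Endpoint taken from the curl example; adjust it to your own installation.
ENDPOINT = "http://wit.istc.cnr.it:9090/engines"

# RDF serializations listed in this document.
RDF_FORMATS = {
    "application/json",      # JSON-LD
    "application/rdf+xml",   # RDF/XML
    "application/rdf+json",  # RDF/JSON
    "text/turtle",           # Turtle
    "text/rdf+nt",           # N-TRIPLES
}


def build_enhance_request(text, accept="text/turtle", endpoint=ENDPOINT):
    """Build a stateless enhancement request (hypothetical helper)."""
    if accept not in RDF_FORMATS:
        raise ValueError("unsupported RDF format: " + accept)
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={"Accept": accept, "Content-type": "text/plain"},
        method="POST",
    )


request = build_enhance_request("Paris and Bob Marley.")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here because it needs a reachable Stanbol server.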
Additional supported query parameters:
uri={content-item-uri}
: By default the URI of the content item being enhanced is a local, non-dereferenceable URI automatically built out of a hash digest of the binary content. Sometimes it is helpful to provide the URI of the content item to be used in the enhancement RDF graph; this can be done with the uri request parameter.
executionmetadata=true/false
: Allows the inclusion of execution metadata in the response. Such data include the ExecutionPlan as provided by the enhancement chain as well as information about the actual execution of that plan. The default value is false.
Example
The following example shows how to send an enhancement request with a custom content item URI that will include the execution metadata in the response.
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." \
     "http://wit.istc.cnr.it:9090/engines?uri=urn:fise-example-content-item&executionmetadata=true"
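When the request URL is assembled in code, the query string should be URL-encoded. The sketch below shows one way to do that in Python; the function name enhancer_url is an assumption made for illustration, while the parameter names uri and executionmetadata come from the description above.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example; adjust it to your own installation.
ENDPOINT = "http://wit.istc.cnr.it:9090/engines"


def enhancer_url(endpoint=ENDPOINT, uri=None, executionmetadata=False):
    """Append the optional query parameters to the endpoint (hypothetical helper)."""
    params = []
    if uri is not None:
        params.append(("uri", uri))
    if executionmetadata:
        # false is the documented default, so the parameter is only sent when true.
        params.append(("executionmetadata", "true"))
    if not params:
        return endpoint
    return endpoint + "?" + urlencode(params)


print(enhancer_url(uri="urn:fise-example-content-item", executionmetadata=True))
# prints http://wit.istc.cnr.it:9090/engines?uri=urn%3Afise-example-content-item&executionmetadata=true
```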
MultiPart ContentItem support
This extension adds support for MultiPart ContentItems to the RESTful API of the Stanbol Enhancer. (see also STANBOL-481)
outputContent=[mediaType]
: Specifies the mimetypes of the content to include in the response of the Stanbol Enhancer. This parameter supports wildcards (e.g. '*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version) and can be used multiple times.
Responses to requests with this parameter will be encoded as multipart/form-data. If the "Accept" header of the request is not compatible with multipart/form-data, the request is treated as a 400 BAD_REQUEST. The selected content variants will be included in a content part with the name "content" and the mimetype multipart/alternate.
omitParsed=[true/false]
: Only makes sense in combination with the outputContent parameter. It excludes all content that was included in the request from the response. A typical combination is outputContent=*/*&omitParsed=true. The default value of this parameter is false.
outputContentPart=[uri/'*']
: Explicitly includes content parts with a specific URI in the response. Currently this only supports ContentParts that are stored as RDF graphs.
See the developer documentation for ContentItems for more information about ContentParts.
Responses to requests with this parameter will be encoded as multipart/form-data. If the "Accept" header of the request is not compatible with multipart/form-data, the request is treated as a 400 BAD_REQUEST. The selected content parts will be included as MIME parts, with the URI of each part used as its name. Such parts will be added after the "metadata" and the "content" parts (if present).
omitMetadata=[true/false]
: Enables or disables the inclusion of the metadata in the response. The default is false.
Typically omitMetadata=true is used when users want to use the Stanbol Enhancer just to get one or more ContentParts as a response. Note that requests that use an Accept: {mimeType} header AND omitMetadata=true will directly return the content version of {mimeType} and NOT wrap the result as multipart/form-data.
rdfFormat=[rdfMimeType]
: Specifies the RDF serialization format used in multipart/form-data encoded responses. Supported formats and defaults are the same as for normal Enhancer requests.
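Because outputContent can be given several times, a query builder has to emit repeated keys rather than a single value. The following Python sketch illustrates this under stated assumptions: multipart_query and its keyword arguments are hypothetical names, and only the query parameter names themselves are taken from the list above.

```python
from urllib.parse import urlencode


def multipart_query(output_content=(), omit_parsed=False,
                    output_content_part=(), omit_metadata=False,
                    rdf_format=None):
    """Build the query string for the MultiPart options (hypothetical helper)."""
    # One key/value pair per requested media type, so the key may repeat.
    params = [("outputContent", media_type) for media_type in output_content]
    if omit_parsed:
        params.append(("omitParsed", "true"))
    params += [("outputContentPart", uri) for uri in output_content_part]
    if omit_metadata:
        params.append(("omitMetadata", "true"))
    if rdf_format is not None:
        params.append(("rdfFormat", rdf_format))
    # urlencode percent-encodes the values, e.g. the "+" in application/rdf+xml.
    return urlencode(params)


print(multipart_query(output_content=["text/plain", "text/html"],
                      omit_parsed=True,
                      rdf_format="application/rdf+xml"))
```

Flags are only emitted when they differ from the documented default of false, which keeps the resulting URL short.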
multipart/form-data can also be used as the Content-Type of requests that pass multiple content variants or pre-existing metadata (such as user tags). See the documentation provided by STANBOL-481 for details on how to represent content items as Multipart MIME.
Examples
The following examples show some typical usages of the MultiPart ContentItem RESTful API. For better readability the values of the query parameters are not URLEncoded.
Return Metadata and transformed Content versions
curl -v -X POST -H "Accept: multipart/form-data" \
     -H "Content-type: text/html; charset=UTF-8" \
     --data "<html><body><p>The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.</p></body></html>" \
     "http://wit.istc.cnr.it:9090/engines?outputContent=*/*&omitParsed=true&rdfFormat=application/rdf%2Bxml"
This will result in a response with the header "Content-Type: multipart/form-data; charset=UTF-8; boundary=contentItem" that contains the metadata as well as the plain text version of the submitted HTML document as content.
--contentItem
Content-Disposition: form-data; name="metadata"; filename="urn:content-item-sha1-76e44d4b51c626bbed38ce88370be88702de9341"
Content-Type: application/rdf+xml; charset=UTF-8;
Content-Transfer-Encoding: 8bit

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
[..the metadata formatted as RDF+XML..]
</rdf:RDF>

--contentItem
Content-Disposition: form-data; name="content"
Content-Type: multipart/alternate; boundary=contentParts; charset=UTF-8
Content-Transfer-Encoding: 8bit

--contentParts
Content-Disposition: form-data; name="urn:metaxa:plain-text:2daba9dc-21f6-7ea1-70dd-a2b0d5c6cd08"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

John Smith was born in London.
--contentParts--

--contentItem--
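A client has to split such a response into its named parts. The sketch below does this with Python's standard email parser, which also descends into the nested "content" part. It is only one possible approach: split_content_item is a hypothetical helper, and SAMPLE is a shortened stand-in for the response above with placeholder URIs and RDF.

```python
from email import message_from_bytes


def split_content_item(content_type, body):
    """Map each leaf part's name to (mime type, decoded text).

    Hypothetical helper: content_type is the value of the response's
    Content-Type header and body holds the raw response bytes.
    """
    # The email parser expects headers, a blank line, then the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    parts = {}
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # skip containers such as the nested "content" part
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset() or "utf-8"
        text = part.get_payload(decode=True).decode(charset)
        parts[name] = (part.get_content_type(), text)
    return parts


# Shortened stand-in for the response above (placeholder URIs and RDF).
SAMPLE = "\r\n".join([
    "--contentItem",
    'Content-Disposition: form-data; name="metadata"; filename="urn:example:item"',
    "Content-Type: application/rdf+xml; charset=UTF-8",
    "",
    "<rdf:RDF/>",
    "--contentItem",
    'Content-Disposition: form-data; name="content"',
    "Content-Type: multipart/alternate; boundary=contentParts; charset=UTF-8",
    "",
    "--contentParts",
    'Content-Disposition: form-data; name="urn:example:plain-text"',
    "Content-Type: text/plain; charset=UTF-8",
    "",
    "John Smith was born in London.",
    "--contentParts--",
    "--contentItem--",
    "",
]).encode("utf-8")

parts = split_content_item("multipart/form-data; boundary=contentItem", SAMPLE)
for name, (mime_type, text) in parts.items():
    print(name, mime_type, text)
```

The RDF metadata ends up under the key "metadata", and each content variant under the URI used as its part name.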
This request will directly return the text/plain version
curl -v -X POST -H "Accept: text/plain" \
     -H "Content-type: text/html; charset=UTF-8" \
     --data "<html><body><p>The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.</p></body></html>" \
     "http://wit.istc.cnr.it:9090/engines?omitMetadata=true"
The response will be of type text/plain and return the string "John Smith was born in London.".
Execution Plan
The ExecutionPlan formally describes how ContentItems passed to the Stanbol Enhancer are processed by an Enhancement Chain. This information is also included in enhancement results as part of the ExecutionMetadata (see also the executionmetadata=true/false parameter).
Users that need to retrieve the ExecutionPlan used by an enhancement endpoint can do this by sending a GET request with an accept header of any supported RDF serialisation to "{enhancement-endpoint}/ep":
curl -H "Accept: application/rdf+xml" http://wit.istc.cnr.it:9090/engines/ep
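The same GET request can be prepared in Python. This is a minimal sketch that only builds the request object; execution_plan_request is an assumed helper name, and the "/ep" suffix follows the "{enhancement-endpoint}/ep" pattern described above.

```python
import urllib.request


def execution_plan_request(endpoint, accept="application/rdf+xml"):
    """Build a GET request for the ExecutionPlan of an endpoint (hypothetical helper)."""
    # Avoid a double slash when the endpoint already ends with "/".
    url = endpoint.rstrip("/") + "/ep"
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


request = execution_plan_request("http://wit.istc.cnr.it:9090/engines")
print(request.get_method(), request.full_url)
```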