Wednesday, March 6, 2013

Alfresco's Auditing System

Preamble

I recently had to think about monitoring Alfresco. Questions like 'Which user logged in how often?' or 'Which document was opened how often?' needed answers. My first idea was to build the following system:
  • Alfresco Share communicates with the repository layer via Web Scripts, so every action should result in an HTTP request.
  • A proxy in front of the Alfresco repository filters the relevant requests (e.g. calls to the AuthenticationService).
  • Each matching request is logged to a database, so the database contains just the HTTP request text and the request parameters.
  • The request log can then be used to create specific reports.
While investigating the requirements further, the question came up whether Alfresco doesn't already provide such functionality. That is why this article sheds some light on Alfresco's auditing feature. At first glance, this feature covers the idea described above: I can define extractors and generators based on path filtering (where the path seems to refer to the RESTful service being used). Further investigation should show whether auditing is suitable for the requirements above. The main points so far:
  • Auditing needs to be enabled.
  • Filters can be configured.
  • There are DataProducers, DataExtractors and DataGenerators.
  • It is possible to define custom AuditApplications.
  • The AuditService is used to retrieve the audit data.
  • The RecordValue element refers to a DataExtractor and specifies which data trigger and which data source to use.
So now let's take a closer look.

Configuration
  • To enable auditing, set 'audit.enabled=true' in the alfresco-global.properties file. The web script at '/api/audit/control' then reports the current state. To enable a specific audit application (see the section below), set 'audit.${application id}.enabled=true'.
  • To see what the audit component does, set the log level in audit-log.properties: org.alfresco.repo.audit.AuditComponentImpl=DEBUG
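The two switches can be illustrated with a small Java model. This is not Alfresco code. `AuditSwitches` and `isApplicationEnabled` are hypothetical names, and treating a missing key as false is an assumption of the sketch; Alfresco ships its own defaults, which may differ. The model only shows the idea that an application records data when both the global flag and its own flag are set.

```java
import java.util.Properties;

// Hypothetical model (not Alfresco code) of how the global switch and the
// per-application switch from alfresco-global.properties combine.
class AuditSwitches {

    // Reads a boolean property; in this sketch a missing key counts as false.
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    // True only when both audit.enabled and audit.<appId>.enabled are "true".
    static boolean isApplicationEnabled(Properties props, String appId) {
        return flag(props, "audit.enabled")
                && flag(props, "audit." + appId + ".enabled");
    }
}
```

With audit.enabled=true and audit.alfresco-access.enabled=true, `isApplicationEnabled(props, "alfresco-access")` returns true. Any application without its own flag stays disabled.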
Filters

Filters are applied to events. A DataProducer is identified by a root path. It calls a recordAuditValues method and passes it the root path and an audit map. The map contains the information that is relevant for auditing. So if the root path is "/alfresco-access/transactio", then the map contains the values 'action' (e.g. MOVE), 'node' (the target node of the action), 'move/from/node', 'move/to/node', 'sub-actions' and so on.
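To make the root path and the audit map more concrete, here is a minimal Java sketch of how relative map keys might combine with the root path into full audit paths. `AuditPaths.toFullPaths` is a hypothetical helper and does not use the signature of recordAuditValues. The sketch assumes that each key is simply appended to the root path with a '/'.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of what a DataProducer hands over: a root path plus a
// map whose keys are paths relative to that root. Not the Alfresco API.
class AuditPaths {

    // Joins the root path and each relative key into one full audit path.
    static Map<String, Serializable> toFullPaths(String rootPath, Map<String, Serializable> values) {
        // Drop a trailing '/' on the root so we never produce "//".
        String root = rootPath.endsWith("/") ? rootPath.substring(0, rootPath.length() - 1) : rootPath;
        Map<String, Serializable> result = new LinkedHashMap<>();
        for (Map.Entry<String, Serializable> e : values.entrySet()) {
            // Drop a leading '/' on the key for the same reason.
            String key = e.getKey().startsWith("/") ? e.getKey().substring(1) : e.getKey();
            result.put(root + "/" + key, e.getValue());
        }
        return result;
    }
}
```

The keys 'action' and 'move/from/node' under a root such as "/alfresco-access/login" become "/alfresco-access/login/action" and "/alfresco-access/login/move/from/node". The values stay unchanged.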

Filters can be defined in the Alfresco configuration file. The format is:
  • audit.filter.${application part of the root path}.${sub path of the root path}.${property in audit map} = ${;-separated list of regular expressions for values that should match}
For example, to audit every login of the user jblogs and of every user whose user id begins with 'd':
  •  audit.filter.alfresco-access.login.user=jblogs; d.*
Additionally, the filter itself has to be enabled:
  • audit.filter.alfresco-access.login.enabled=true
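A filter value is a ';'-separated list of regular expressions. The following Java sketch models that rule. Based on the jblogs example, it assumes that a value passes if it fully matches any one of the expressions and that spaces after ';' are ignored. `AuditFilter` is a hypothetical class, not Alfresco's filter implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of an audit filter value such as "jblogs; d.*":
// a ';'-separated list of regular expressions, any of which may match.
class AuditFilter {
    private final List<Pattern> patterns = new ArrayList<>();

    AuditFilter(String propertyValue) {
        for (String part : propertyValue.split(";")) {
            String regex = part.trim(); // tolerate spaces after ';'
            if (!regex.isEmpty()) {
                patterns.add(Pattern.compile(regex));
            }
        }
    }

    // The whole value must match one expression (matches(), not find()).
    boolean accepts(String value) {
        for (Pattern p : patterns) {
            if (p.matcher(value).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the example value, `new AuditFilter("jblogs; d.*")` accepts "jblogs" and "dave" but rejects "admin".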
DataProducers

Several data producers are available out of the box. According to the documentation, 'org.alfresco.repo.audit.access.AccessAuditor' does not resolve events in detail (preview and download are one single event), whereas the 'AuditMethodInterceptor' producer records separate events. The configuration property 'audit.alfresco-access.sub-actions.enabled' seems to tell Alfresco which DataProducer should be used.

DataGenerators

A data generator produces output without any input, so data is produced when a specific path becomes active. The AuthenticatedUserDataGenerator, for example, generates data as soon as a person is authenticated. So the data generator is responsible for generating data in response to specific events. Such a generator is not the same as a DataProducer. It seems that the producer answers 'Which events should produce data?' and the DataGenerator answers 'Which data should be generated?'.

A data generator has either a registered name or a fully qualified class name. In the first case you reference it via its Spring bean id (see audit-services-context.xml). In the second case you reference it directly via its class name. When defining it in the application configuration, these are the 'registeredName' and 'class' properties.
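The two ways of referencing a generator can be sketched in Java as a lookup that prefers a registered name and otherwise instantiates a class by its fully qualified name. `ValueGenerator`, `FixedUserGenerator` and `GeneratorRegistry` are hypothetical stand-ins. In Alfresco the registered names come from the Spring context, which this sketch replaces with a plain map.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a data generator: produces a value without input.
interface ValueGenerator {
    Serializable getData();
}

// Example generator that returns a fixed user name (illustration only).
class FixedUserGenerator implements ValueGenerator {
    public Serializable getData() {
        return "admin";
    }
}

// Resolves a generator either by registered name or by class name.
class GeneratorRegistry {
    private final Map<String, ValueGenerator> registered = new HashMap<>();

    void register(String name, ValueGenerator generator) {
        registered.put(name, generator);
    }

    // A registered name wins if given; otherwise the class is instantiated reflectively.
    ValueGenerator resolve(String registeredName, String className) {
        if (registeredName != null) {
            ValueGenerator g = registered.get(registeredName);
            if (g == null) {
                throw new IllegalArgumentException("Unknown generator: " + registeredName);
            }
            return g;
        }
        try {
            Class<?> type = Class.forName(className);
            return (ValueGenerator) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Using a name keeps the configuration decoupled from the implementation class. Using the class name avoids a separate registration step.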

According to the documentation, the 'AccessAuditor' generator writes entries like:

${application part of the root path}.${sub path of the root path}.${property in audit map}= ${value in audit map}.
 
DataExtractors

A DataExtractor is a component that uses input data to produce some output. Like a DataGenerator, it can be defined in your application configuration via its registered name or fully qualified class name. Alfresco provides the SimpleValueDataExtractor (org.alfresco.repo.audit.extractor.SimpleValueDataExtractor). This default extractor just returns its input without any transformation. Another example is the NodeNameDataExtractor, which extracts the cm:name value of a node. In summary, the extractor answers 'How should the previously generated data be stored?'
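Here is a minimal Java model of the two extractors mentioned above. `ValueExtractor` and `Extractors` are hypothetical. The sketch represents the node as a plain property map with a 'cm:name' key. This is an assumption made only to keep the sketch free of Alfresco dependencies.

```java
import java.io.Serializable;
import java.util.Map;

// Hypothetical stand-in for a data extractor: turns an input value into the
// value that is stored. Not the Alfresco interface.
interface ValueExtractor {
    Serializable extract(Serializable input);
}

class Extractors {
    // Counterpart of a "simple value" extractor: returns the input unchanged.
    static final ValueExtractor SIMPLE_VALUE = input -> input;

    // Counterpart of a node-name extractor. A node is modelled here as a
    // plain property map (an assumption); the name sits under "cm:name".
    static final ValueExtractor NODE_NAME = input -> {
        if (input instanceof Map) {
            Object name = ((Map<?, ?>) input).get("cm:name");
            return name instanceof Serializable ? (Serializable) name : null;
        }
        return null; // not a node: nothing to extract
    };
}
```

The same input can therefore be stored either as is or reduced to one property, depending on which extractor the configuration names.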

Path mappings

We have already mentioned the root path, and we know that the audit entry map contains paths as part of its keys. Path mappings can be used to rewrite these paths. Suppose you want '/ecgaudit/login' as the path in your entry map instead of '/alfresco-api/post/AuthenticationService/authenticate'. Then you can define the following path mapping:

<PathMappings>
  <PathMap source="/alfresco-api/post/AuthenticationService/authenticate" target="/ecgaudit/login"/>
</PathMappings>

To follow the path conventions, keep in mind that the first part of the path is the application id.
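The effect of a PathMap entry can be modelled as a prefix rewrite. The sketch below assumes two things. First, a path that equals the source, or lies below it, gets the source prefix replaced by the target. Second, unmapped paths are dropped. `PathMapper` is a hypothetical class, and the exact Alfresco semantics should be checked against the documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of <PathMap source="..." target="..."/>: a path that
// equals the source, or lies below it, is rewritten to start with the target.
class PathMapper {
    private final Map<String, String> mappings = new LinkedHashMap<>();

    void addPathMap(String source, String target) {
        mappings.put(source, target);
    }

    // Returns the rewritten path, or null if no mapping applies
    // (in this sketch, unmapped paths are simply not audited).
    String map(String path) {
        for (Map.Entry<String, String> m : mappings.entrySet()) {
            String source = m.getKey();
            if (path.equals(source)) {
                return m.getValue();
            }
            // Require a '/' boundary so "/a/bc" is not treated as below "/a/b".
            if (path.startsWith(source + "/")) {
                return m.getValue() + path.substring(source.length());
            }
        }
        return null;
    }
}
```

With the mapping above, `map("/alfresco-api/post/AuthenticationService/authenticate/args/userName")` yields "/ecgaudit/login/args/userName".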

Audit Applications

How exactly auditing behaves depends on the audit application. There is one application provided by Alfresco which is named 'alfresco-access'.

You can add new audit application configurations to the <tomcat>/shared/classes/alfresco/extension/audit directory. Just create an XML file named ${application id}.xml inside it.

Here is an example application from the Alfresco Wiki:

<Application name="DOD5015" key="DOD5015">
    <AuditPath key="login">
        <AuditPath key="args">
            <AuditPath key="userName">
                <RecordValue key="value" dataExtractor="simpleValue"/>
            </AuditPath>
        </AuditPath>
        <AuditPath key="no-error">
            <GenerateValue key="fullName" dataGenerator="personFullName"/>
        </AuditPath>
        <AuditPath key="error">
            <RecordValue key="value" dataExtractor="nullValue"/>
        </AuditPath>
    </AuditPath>
</Application>
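The nesting of AuditPath elements determines the path under which each value lands. The following Java sketch assumes that the keys are simply concatenated with '/', starting from the application key. `AuditPathNode` is hypothetical and only mirrors the structure of the XML above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of nested <AuditPath> keys: each RecordValue or
// GenerateValue ends up under the concatenation of its enclosing keys.
class AuditPathNode {
    final String key;
    final List<AuditPathNode> children = new ArrayList<>();
    final List<String> valueKeys = new ArrayList<>(); // RecordValue/GenerateValue keys

    AuditPathNode(String key) {
        this.key = key;
    }

    // Adds a nested <AuditPath> and returns it for further nesting.
    AuditPathNode child(String childKey) {
        AuditPathNode c = new AuditPathNode(childKey);
        children.add(c);
        return c;
    }

    // Adds a RecordValue/GenerateValue key at this level.
    AuditPathNode value(String valueKey) {
        valueKeys.add(valueKey);
        return this;
    }

    // Depth-first walk that collects the full path of every value.
    void collect(String prefix, List<String> out) {
        String here = prefix + "/" + key;
        for (String v : valueKeys) {
            out.add(here + "/" + v);
        }
        for (AuditPathNode c : children) {
            c.collect(here, out);
        }
    }
}
```

The DOD5015 tree is built with child("login"), then child("args").child("userName").value("value"), and so on. Calling collect("", paths) on it gives /DOD5015/login/args/userName/value, /DOD5015/login/no-error/fullName and /DOD5015/login/error/value.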

Audit trail

As mentioned before, there is the 'alfresco-access' application, so the default entries in the trail come from this application. The audit trail is stored in database tables. The following columns are visible in the trail: 'user name', 'application', 'method' (similar to 'action'), 'timestamp' and 'entry'. As explained before, such an entry contains a map of values, and what it looks like depends on the data generation and extraction.
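Since the trail is just rows with these columns, a report like 'Which user logged in how often?' boils down to counting. Here is a small Java sketch over in-memory rows. `TrailEntry`, `TrailReports` and the method value used for logins are hypothetical, and the entry map is left out. Real rows would come from the AuditService or the REST query rather than from a list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row of the audit trail with the columns listed above
// (the entry map is omitted for brevity).
final class TrailEntry {
    final String userName;
    final String application;
    final String method;
    final long timestamp;

    TrailEntry(String userName, String application, String method, long timestamp) {
        this.userName = userName;
        this.application = application;
        this.method = method;
        this.timestamp = timestamp;
    }
}

class TrailReports {
    // Counts entries per user for one application and method,
    // e.g. "how often did each user log in?".
    static Map<String, Integer> countPerUser(List<TrailEntry> trail, String application, String method) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by user name
        for (TrailEntry e : trail) {
            if (e.application.equals(application) && e.method.equals(method)) {
                counts.merge(e.userName, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The same loop with a different method value answers 'Which document was opened how often?' if the entry map is grouped by node instead of user.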

AuditService

The audit service implements the following interface: http://dev.alfresco.com/resource/docs/java/repository/org/alfresco/service/cmr/audit/AuditService.html . What stands out is that you have to specify an audit query. A query is handled by an AuditQueryCallback (http://dev.alfresco.com/resource/docs/java/repository/org/alfresco/service/cmr/audit/AuditService.AuditQueryCallback.html). The callback has to implement the 'handleAuditEntry' method, which receives the following parameters:

  • entryId
  • applicationName
  • user
  • time
  • the entry map
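The callback style can be tried out without a repository. The Java sketch below defines a local `EntryCallback` interface that only mirrors the five parameters listed above. It is not the Alfresco AuditQueryCallback, which may declare further methods. The boolean return value is assumed to mean 'continue with the next entry'. `FakeAuditQuery.replay` is a hypothetical driver standing in for the audit service, and the value key it uses is made up.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in that mirrors the parameters of handleAuditEntry; it is not
// the Alfresco interface and has no dependency on the Alfresco jars.
interface EntryCallback {
    boolean handleAuditEntry(Long entryId, String applicationName, String user,
                             long time, Map<String, Serializable> values);
}

// Collects "user@time" strings and stops after a maximum number of entries.
class CollectingCallback implements EntryCallback {
    final List<String> seen = new ArrayList<>();
    private final int max;

    CollectingCallback(int max) {
        this.max = max;
    }

    @Override
    public boolean handleAuditEntry(Long entryId, String applicationName, String user,
                                    long time, Map<String, Serializable> values) {
        seen.add(user + "@" + time);
        return seen.size() < max; // false asks the caller to stop (assumed semantics)
    }
}

// Hypothetical driver playing the role of the audit service.
class FakeAuditQuery {
    // Feeds one entry per user to the callback; returns how many were delivered.
    static int replay(EntryCallback cb, String app, String[] users) {
        int delivered = 0;
        for (int i = 0; i < users.length; i++) {
            delivered++;
            Map<String, Serializable> values = new HashMap<>();
            values.put("/" + app + "/login/user", users[i]); // made-up value key
            if (!cb.handleAuditEntry((long) (i + 1), app, users[i], 1000L * i, values)) {
                break;
            }
        }
        return delivered;
    }
}
```

Replaying three users with `new CollectingCallback(2)` delivers two entries, "jblogs@0" and "dave@1000", and then stops.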
Additionally, it seems to be possible to access the applications defined above via REST. If I understood it correctly, the audit query is just the following HTTP call:
  • http://localhost:8080/alfresco/service/api/audit/query/${Application id}?${Parameters}
The documentation mentions the following parameters:

  • verbose = true | false
  • limit = ${The number of last n values to return}
  • forward = true | false
  • toId = ${Which ids should be included. This is not the node id but the entry id}
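The query URL can be assembled from the parameters listed above. This Java sketch only builds the string. The base URL is a placeholder for a local installation, and `AuditQueryUrl.build` is a hypothetical helper. Authentication and the actual HTTP call are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Builds the audit query URL from an application id and query parameters.
// Base URL and parameter values are placeholders (illustration only).
class AuditQueryUrl {
    static String build(String baseUrl, String applicationId, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/service/api/audit/query/")
                .append(applicationId);
        if (!params.isEmpty()) {
            // Joins "key=value" pairs with '&' and prefixes the result with '?'.
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> p : params.entrySet()) {
                query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            }
            url.append(query);
        }
        return url.toString();
    }
}
```

Suppose the base URL is http://localhost:8080/alfresco, the application is 'alfresco-access', and the parameters are verbose=true, limit=10 and forward=false, passed in that order via a LinkedHashMap. The result is then http://localhost:8080/alfresco/service/api/audit/query/alfresco-access?verbose=true&limit=10&forward=false.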
Summary

In summary, I think the auditing feature can meet my initial requirements. It is highly customizable, so it should be possible to extend it with custom producers, generators and extractors based on specific requirements. It does provide a suitable framework, but at the cost of some added complexity. This complexity may affect system performance negatively, which needs further evaluation. I will start with the default audit application to get my hands on it.