The following example rule applies redact-number to values selected by the XPath expression //balance. The matched values will be replaced by decimal values in the range 0.0 to 100000.00, with two digits after the decimal point. The rule generates replacement values such as 3.55, 19.79, 82.96.
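The following is a small plain-JavaScript model of the behavior described above: a random decimal in a range, formatted with two digits after the decimal point. It is not MarkLogic's redact-number implementation and calls no MarkLogic API. randomDecimal is a hypothetical helper name.

```javascript
// Plain-JavaScript model of random number masking (NOT MarkLogic's code):
// draw a value in [min, max) and format it with a fixed number of decimals.
function randomDecimal(min, max, digits) {
  const value = min + Math.random() * (max - min);
  // toFixed may round up to exactly max (e.g. "100000.00"), which stays in range.
  return value.toFixed(digits);
}

// Example: a masked balance such as "19.79" or "82961.04".
console.log(randomDecimal(0, 100000, 2));
```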
The mask-deterministic function supports applying a salt to masking value generation via the following options. You can use them individually or together.
Install your implementation in the modules database associated with your App Server using normal document insertion methods, such as the xdmp:document-insert XQuery function, the xdmp.documentInsert Server-Side JavaScript function, or any of the document insertion features of the Node.js, Java, or REST Client APIs.
MarkLogic uses rule-based redaction. A redaction rule tells MarkLogic how to locate the content within a document that will be redacted and how to modify that portion. A rule expresses the business logic, independent of the documents to be redacted.
The procedure in this section demonstrates how to use Query Console and XQuery to install a module in the modules database. You can also use Server-Side JavaScript and the Java, Node.js, and REST Client APIs for this task.
The following example rule applies random masking to nodes selected by the XPath expression //name. The replacement value will be 10 characters long because of the length option.
Use this built-in to replace a value with a random masking value. A given input produces different output each time it is applied. The original value is not derivable from the masked value. Random masking can be useful for obscuring relationships across records.
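The sketch below models random masking in plain JavaScript to show why the original value cannot be recovered: the output ignores the input and uses fresh random characters on every call. It is not MarkLogic's mask-random code. The CHARSETS names are hypothetical and are not the actual values of the character option.

```javascript
const crypto = require("crypto");

// Hypothetical character sets for illustration only.
const CHARSETS = {
  numeric: "0123456789",
  alphanumeric: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
};

// The input is intentionally unused: the masking value does not depend on it,
// so repeated calls with the same input produce different output.
function maskRandom(_input, length = 10, charset = CHARSETS.alphanumeric) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += charset[crypto.randomInt(charset.length)];
  }
  return out;
}
```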
Redacting the username portion of an email address replaces the username with NAME. Redacting the domain portion of an email address replaces the domain name with DOMAIN. Thus, full redaction on the email address jsmith@example.com produces the replacement value NAME@DOMAIN.
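Here is a plain-JavaScript model of the three email redaction outcomes described above. The level names "full", "name", and "domain" are assumptions for this sketch; consult the redact-email reference for the real option values. This is not MarkLogic's implementation.

```javascript
// Model of email redaction levels (NOT MarkLogic's redact-email code).
function redactEmail(address, level = "full") {
  const at = address.lastIndexOf("@");
  if (at < 0) throw new Error(`not an email address: ${address}`);
  const user = address.slice(0, at);
  const domain = address.slice(at + 1);
  switch (level) {
    case "name":   return `NAME@${domain}`;  // hide only the username
    case "domain": return `${user}@DOMAIN`;  // hide only the domain
    case "full":   return "NAME@DOMAIN";     // hide both parts
    default: throw new Error(`unknown level: ${level}`);
  }
}
```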
Redaction does not secure your documents within the database. For example, even if you redact a document when it is read, applications can still search or modify the content unless you properly secure the content with features such as document permissions and Element Level Security.
The following table illustrates the effect of applying redact-us-phone with various input values and configuration parameters. For a complete example, see Example: Using the Built-In Redaction Functions.
Redaction rules applied to JSON documents have no such restrictions. However, if you apply rules to a mix of XML and JSON documents, limit your rules to the supported XPath subset.
Use the following parameters to configure the behavior of this function. Set parameters in the options section of a rule.
Note that you could theoretically write the function to expect the parent object as input and have the redaction rule use an XPath expression such as /name/parent::node(). However, such a rule path is invalid if the rule is ever applied to an XML document, so we traverse up to the parent node inside the redaction function instead of in the rule. For more details, see Limitations on XPath Expressions in Redaction Rules.
The results include both documents modified by the redaction rules and unmodified documents that did not match any rules or were not changed by the redaction functions.
If you run the script again, the values for the street names will not change because they are redacted using mask-deterministic. The values for the countries will change with each run since they are redacted using mask-random.
Use the following procedure to install the custom function into the Modules database with the URI /redaction/redact-xml-name.xqy. These instructions use XQuery and Query Console, but you can use any document insertion interface.
Regular expression patterns can contain characters that require escaping in your rule definitions. The following contains a few examples of problem characters. This is not an exhaustive list.
Before you can use a redaction rule, it must be installed as a document in the schemas database associated with the database containing the documents to be redacted.
Use the rdt:redact XQuery library function to create redacted in-memory copies of documents on MarkLogic Server. This function is best suited for testing and debugging your rules or for redacting a small number of documents. To extract large sets of redacted documents from MarkLogic, use the mlcp command line tool instead.
Redaction is the process of eliminating or obscuring portions of a document as you read it from the database. For example, you can use redaction to eliminate or mask sensitive personal information such as credit card numbers, phone numbers, or email addresses from documents. This chapter describes redaction features you can use when reading a document from the database.
This is necessary because the rdt.redact function returns a Sequence of in-memory document nodes. To save the redacted content in the expected form, we access the first node in the Sequence with fn.head, and then dereference it using the .root property so that match.document again contains the root node under the document node.
The following example applies the redaction rules in the collections with URIs pii-rules and hipaa-rules to the documents in the collection personnel:
Use this built-in to mask values that conform to one of the following patterns. These patterns correspond to typical representations for US Social Security Numbers (SSNs). The character N in these patterns represents a single digit in the range 0 - 9.
Before you can apply a rule, you must install it in the Schemas database as part of a rule collection. For details, see Installing Redaction Rules.
The table below describes some of the techniques you can use to redact your content. The details of what to redact and what techniques to apply depend on the requirements of your application. For details, see Choosing a Redaction Strategy.
Use this built-in to mask values that represent a dateTime value. You can use this function to mask dateTime values in one of two ways:
To install the XML rules, copy the following script into Query Console and run it against the Schemas database. For a detailed example of installing rules with Query Console, see Example: Getting Started With Redaction.
The Java Client API, Node.js Client API, and REST Client API include the capability to install modules in the modules database. See one of the following topics for details on how to install a module using one of the Client APIs.
MarkLogic provides several built-in redaction functions for use in your redaction rules. To use one of these functions, create a rule with a method child XML element or JSON property of the following form.
For more details, see Built-in Redaction Function Reference. For a complete example, see Example: Dictionary-Based Masking.
When you apply redact-datetime with a picture option, the content selected by your rule path must serialize to text whose leading characters conform to the picture string. If there are other leading characters in the serialized content, redaction fails with an error.
The rules are inserted with a URI of the following form, where name is the XML element local name or JSON property name of the node selected by the rule. (The URI suffix depends on the rule format you install.)
These requirements are not specific to working with the root object node. Any time you have a node as input and want to modify it as a native JavaScript type, you need to use toObject. Similarly, you must always return a node, not a native JavaScript value.
You can apply the example custom redaction rule with mlcp by running a command similar to the following. The command exports the redacted documents to ./mlcp-output. This directory must not already exist.
Follow these steps to install the example rules in XML format using XQuery. If you prefer to use JSON rules, see Install the JSON Rules. For a detailed example of installing rules with Query Console, see Example: Getting Started With Redaction.
The custom function expects to receive a node as input and options that include a new-name key specifying the replacement name value.
An XPath expression such as /name/parent::node() would select the anonymous parent object, but it will cause an error if the rule is ever applied to an XML document. Since we have a mixed XML and JSON document set, we chose to write the rule and the custom function to use the name property as the redaction target.
When you use the mask-deterministic built-in redaction function without a salt, two rules with equivalent options always produce the same output for the same input. You can use a salt to introduce masking value variance across rules, rule sets, or clusters. When you use a salt, each masking value is still deterministic in that the same input produces the same output. However, the same input with different salts produces different output.
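To illustrate the salting property in isolation, the sketch below uses HMAC-SHA-256 from Node's crypto module as a stand-in. This is not the algorithm mask-deterministic uses; it only demonstrates the behavior described above: the same input and salt always give the same output, while a different salt gives a different output. The 20-character length mirrors the example masking values in this chapter and is otherwise arbitrary.

```javascript
const crypto = require("crypto");

// Stand-in for salted deterministic masking (NOT MarkLogic's algorithm).
// Without a salt, a plain SHA-256 digest is used; with a salt, an HMAC keyed
// by the salt is used, so rules with different salts diverge.
function maskDeterministic(input, salt, length = 20) {
  const h = salt === undefined
    ? crypto.createHash("sha256")
    : crypto.createHmac("sha256", salt);
  return h.update(input, "utf8").digest("hex").slice(0, length);
}

// Two rules with equivalent options and no salt agree on every input.
console.log(maskDeterministic("John Smith"));
// Salted rules still agree with themselves, but not with each other.
console.log(maskDeterministic("John Smith", "rule-a"));
console.log(maskDeterministic("John Smith", "rule-b"));
```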
The following rule specifies that values in an id XML element or JSON property that match the pattern will be replaced with the text NN-NNNNNNN. Notice the escaped characters in the pattern.
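The escaping issue is easiest to see when a rule is built as a JavaScript object and serialized to JSON. The sketch below follows the rule structure this chapter describes (path, method, and options); the regular expression \d{2}-\d{7}, the description text, and the pattern and replacement option names are assumptions for illustration, so check the redact-regex reference for the exact option names. Each backslash in the regex appears twice in the JSON text.

```javascript
// Hypothetical redact-regex rule, modeled on the documented rule layout.
const rule = {
  rule: {
    description: "Mask ids of the form NN-NNNNNNN",
    path: "//id",
    method: { function: "redact-regex" },
    options: {
      pattern: "\\d{2}-\\d{7}",   // the regex itself is \d{2}-\d{7}
      replacement: "NN-NNNNNNN",
    },
  },
};

// In the serialized JSON, each regex backslash is escaped as \\.
const json = JSON.stringify(rule, null, 2);
console.log(json);
```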
The following diagram illustrates high level redaction flow and the separation of responsibilities between the rule administrator and the rule user:
The XPath expression in the path XML element or JSON property of a rule is restricted to the subset of XPath supported by XSLT when the rule is applied to XML documents. Therefore, you must restrict your rule paths when redacting a mixture of XML and JSON content. For more details, see Limitations on XPath Expressions in Redaction Rules.
If you apply these rules to the following documents, both produce the same masking value by default for the input John Smith:
In addition, the final redacted result for a given node reflects the result of at most one rule. If you have multiple rules that select the same node, they all run, but the final document produced by redaction reflects the result of at most one of these rules.
Use a command line similar to the following to export the redacted documents from the Documents database. Both dictionary-based rules will be applied to the sample documents.
Progress and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings. Any other trademarks contained herein are the property of their respective owners.
Every rule is inserted into two collections, an all collection and a collection that identifies the built-in used by the rule. For example, /rules/redact-alias.json, which uses the mask-random built-in, is inserted in the collections all and random. This enables you to apply the rules together or selectively.
Redaction functions must return a node, not a simple value. In this case, we need to return a JSON text node that will replace the original input node. You cannot construct a text node from a native JavaScript object, so the function uses a NodeBuilder to construct the return node.
The expected result of applying this rule is that any text in the value of a node named summary that matches the pattern of a US phone number will be replaced. The replacement value uses the # character to replace all but the last 4 digits. For example, a value such as 123-456-7890 is redacted to ###-###-7890. For more details, see redact-us-phone.
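Partial phone redaction can be modeled in plain JavaScript as shown below: every digit except the last four becomes the masking character, and separators are kept. This is not MarkLogic's redact-us-phone code, and the phone shapes it recognizes (NNN-NNN-NNNN, NNN.NNN.NNNN, and (NNN) NNN-NNNN) are assumptions for this sketch.

```javascript
// Model of partial US phone redaction (NOT MarkLogic's implementation).
function redactUsPhonePartial(text, ch = "#") {
  // Group 1: area code and exchange with separators; group 2: last 4 digits.
  const phone = /(\(?\d{3}\)?[-. ]?\d{3}[-. ]?)(\d{4})/g;
  return text.replace(phone, (_match, head, last4) => head.replace(/\d/g, ch) + last4);
}
```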
Any redaction function that can receive input from both XML and JSON must be prepared to handle multiple node types. For example, the same XPath expression might select an element node in XML, but an object node in JSON.
This section walks you through a simple example of defining, installing, and applying a redaction rule. The example uses the built-in redaction functions redact-email and redact-us-phone.
The pre-defined redaction functions that support dictionary-based masking do so through a dictionary option that accepts a dictionary URI as its value.
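The sketch below shows the general idea behind deterministic dictionary-based masking in plain JavaScript: hash the input and use the hash to choose a dictionary entry, so a given input always maps to the same replacement. MarkLogic's actual entry-selection algorithm is not described in this chapter. The function name and the sample street dictionary are hypothetical.

```javascript
const crypto = require("crypto");

// Illustration only: pick a dictionary entry from a hash of the input.
function maskFromDictionary(input, entries) {
  if (entries.length === 0) throw new Error("empty dictionary");
  const digest = crypto.createHash("sha256").update(input, "utf8").digest();
  const index = digest.readUInt32BE(0) % entries.length;
  return entries[index];
}

// Hypothetical dictionary contents.
const streets = ["Main St", "Oak Ave", "Elm Rd", "Pine Ln"];
console.log(maskFromDictionary("1 Infinite Loop", streets));
```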
Follow this procedure to apply the example custom redaction function using Query Console and rdt.redact. Make sure you have already installed the custom redaction module, rule, and sample documents.
This section discusses applying redaction rules once rule collections have been installed on MarkLogic. The following topics are covered:
If the built-in accepts configuration parameters, specify them in the options child XML element or JSON property of the rule. For syntax, see Defining Redaction Rules. For parameter specifics and examples, see the reference section for each built-in.
The user who applies the rules must have read permission on the source documents, the rule documents, and the rule collection. For more details, see Security Considerations.
However, notice the //alias example above, which selects individual alias array items in the JSON example, rather than the entire array. If you want to redact the entire array value, you need a rule with a JSON-specific path selector. For example, a rule path such as //array-node('alias') selects the entire array in the JSON documents, resulting in a value such as the following for the alias property:
Redaction is a kind of read transformation, intended for use when exporting documents from the database. Redaction does not secure your content within the database. For example, users with sufficient document permissions can still search, read, and update documents containing the information you wish to redact. Use security features such as Element Level Security, document permissions, and URI privileges for real-time security. For more details, see the Security Guide.
If you use the sample documents from Preparing to Run the Examples, running the script will have the following effect on the search result matches:
Placing rule documents in a protected collection enables you to control who can add documents to or remove documents from the collection. Rule administrators will usually have update permissions on a protected rule collection. Rule users will not have any special permissions on a protected rule collection. A protected collection must be explicitly created before you can add documents to it. To learn more about protected collections, see Collections and Security in the Search Developer's Guide.
The following table illustrates the effect of applying redact-ipv4 with various configuration options. For a complete example, see Example: Using the Built-In Redaction Functions.
The following table shows the result of redacting the XML sample document. Notice that the telephone number in the summary element has been partially redacted by the redact-us-phone function. Also, the id element has been completely hidden by the conceal function. The affected parts of the content are highlighted in the table.
You can also create your own XQuery or Server-Side JavaScript redaction functions and define rules that apply them. A user-defined function is identified in the method XML element or JSON property by function name, URI of the implementing module, and the module namespace URI (if your function is implemented in XQuery). For details, see User-Defined Redaction Functions.
The following table illustrates the effect of applying mask-deterministic to several different types of nodes. For an end-to-end example, see Example: Using the Built-In Redaction Functions.
Use either of the following procedures to install rules that exercise the dictionaries. One rule is defined using XML, and the other rule is defined using JSON.
Before you can use a redaction dictionary, you must install it in the schemas database associated with the database that contains the content to be redacted. This must be the same database in which you install your redaction rules.
The salt and extend-salt options introduce rule and/or cluster-specific randomness to the generated masking values. Each masking value is still deterministic when salted: The same input produces the same output. However, the same input with different salts produces different output. For details, see Salting Masking Values for Added Security.
You must use a require statement to bring the redaction functions into scope in your application. These functions are implemented by the XQuery library module /MarkLogic/redaction.xqy. For example:
Document permissions enable you to control who can read, create, or update rule documents and redaction dictionaries. A rule administrator will usually have read and update permissions on such documents. Rule users will usually only have read permissions on rule documents and redaction dictionaries. To learn more about document permissions, see Protecting Documents in the Security Guide.
A redaction function implements the logic of a given redaction rule, such as determining whether or not a node needs to be modified, generating a replacement value, or hiding a value or node. You can use one of the built-in redaction functions or create a user-defined redaction function.
Note that the replacement pattern can contain back references to portions of the matched text. A back reference enables you to capture portions of the matched text and re-use them in the replacement value. See the example at the end of this section.
(Note, if you installed both the XQuery/XML and JavaScript/JSON custom redaction examples, /personnel/person1.xml will also be redacted to display John Doe.)
Use the following procedure to install the rule in the schemas database associated with your content database. Some discussion of the rule follows the procedure.
The order in which rules are applied is undefined. You cannot rely on the order in which rules within a rule collection are run, nor on the ordering of rules across multiple rule collections used in the same redaction operation.
Then the masking values generated by the two rules differ as shown below. An attacker cannot deduce the relationship between the redacted value (89d7499b154a8b81c17f) and the input value (John Smith) without also knowing the salt.
The following table illustrates the effect of applying mask-random to several different types of nodes. For an end-to-end example, see Example: Using the Built-In Redaction Functions.
The input documents have the following structure. The birthdate property is used to determine whether or not to redact the name property.
Regardless of the redaction method you use, you select a set of documents to be redacted and one or more rule collections to apply to those documents.
The following example rule redacts values using the random method. The format option specifies that the masking value be of the form YYYY-MM-DD, and that the masking values be in the year range 1900 to 1999, inclusive. The format of the value to be redacted does not matter.
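For illustration, here is a plain-JavaScript model of random date masking matching that description: a random day from January 1, 1900 through December 31, 1999, formatted as YYYY-MM-DD regardless of the input. It is not MarkLogic's redact-datetime code. randomDate is a hypothetical helper.

```javascript
// Model of random date masking (NOT MarkLogic's implementation).
function randomDate(fromYear = 1900, toYear = 1999) {
  const dayMs = 24 * 60 * 60 * 1000;
  const start = Date.UTC(fromYear, 0, 1);      // Jan 1 of fromYear
  const end = Date.UTC(toYear, 11, 31);        // Dec 31 of toYear
  const days = Math.floor((end - start) / dayMs);
  const offset = Math.floor(Math.random() * (days + 1)); // inclusive range
  return new Date(start + offset * dayMs).toISOString().slice(0, 10);
}
```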
Follow these steps to apply the example rules using XQuery and Query Console. All the rules will be applied to the sample documents.
If the built-in functions do not meet the needs of your application, you can create your own redaction function using XQuery or Server-Side JavaScript. For example, you might need a user-defined function to implement conditional redaction such as redact the name if the customer is a minor. For more details, see User-Defined Redaction Functions.
Use the rdt.redact JavaScript function to create redacted in-memory copies of documents on MarkLogic Server. This function is best suited for testing and debugging your rules or for redacting a small number of documents. To extract large sets of redacted documents from MarkLogic, use the mlcp command line tool instead.
If any of the rule collections passed to rdt.redact is empty, an RDT-NORULE exception is thrown. This protects you from accidentally failing to apply any rules, leading to unredacted content. An exception is also thrown if any of the rule collections contain non-rule documents, if any of the rules are invalid, or if the path expression for a rule selects something other than a node.
If you apply both rule collections to a set of documents, you cannot know or rely on the order in which ruleA1, ruleA2, and ruleB1 are applied to any selected id node. In addition, the output reflects the changes to //id made by only one of ruleA1, ruleA2, and ruleB1.
Change the example command line as needed to match your environment. The output directory (./results) must not already exist.
A protected collection cannot be used to control who can read or modify the contents of documents in the collection; you must rely on document permissions for this control. Protected collections also cannot be used to control who can see which documents are in the collection.
Use the procedure in this section to install the sample documents into the Documents database using XQuery and Query Console. Though this example uses XQuery, you do not need to be familiar with XQuery to successfully complete the exercise.
The following requirements apply. If these requirements are not met, you will get an RDT-INVALIDDICTIONARY error when you use the dictionary.
The following example is a trivial dictionary containing four entries of various types. For a complete example, see Example: Dictionary-Based Masking.
The following example redacts SSNs selected by the path expression //id. The parameters specify that the last 4 digits of the SSN are preserved and the remaining digits are replaced with the character X.
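The following plain-JavaScript model shows the effect of such a rule: it keeps the last four digits and replaces the rest with X while preserving hyphens. It is not MarkLogic's redact-us-ssn code, and the accepted shapes (NNN-NN-NNNN and NNNNNNNNN) are assumptions for this sketch.

```javascript
// Model of SSN masking that keeps the last 4 digits (NOT MarkLogic's code).
function redactSsnKeepLast4(text, ch = "X") {
  return text.replace(/\b(\d{3}-?\d{2}-?)(\d{4})\b/g,
    (_match, head, last4) => head.replace(/\d/g, ch) + last4);
}
```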
When you complete the steps in this section, your Documents database will contain the following documents. The collection names are shown in parentheses after the URI in the following list.
The following procedure installs two rules, each of which uses one of the dictionaries installed in Install the Dictionaries:
It is important that you design and implement security policies that properly protect your rules, as well as your content.
You can install rules using any document insert technique. This example uses XQuery and Query Console. You do not need to be familiar with XQuery to complete this exercise. For other rule installation options, see Installing Redaction Rules.
The following rule uses a back reference in the pattern to leave the first 2 digits of the id intact. The pattern in the previous example has been modified to put parentheses around the sub-expression for the first block of digits, giving (\d{2}). The parentheses capture that block of text in a variable that is referenced in the replacement string as $1.
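JavaScript regular expressions use the same $1 notation, so the effect of the back reference can be tried outside MarkLogic with a plain string replace. The id value 12-3456789 and the \d{7} second block are assumptions for this example.

```javascript
// (\d{2}) captures the first two digits; $1 re-inserts them in the output.
const pattern = /(\d{2})-\d{7}/g;
const masked = "id: 12-3456789".replace(pattern, "$1-NNNNNNN");
console.log(masked); // id: 12-NNNNNNN
```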
The value range defined by a redact-number rule must be valid for the data type. For example, the following set of options is invalid because the specified range does not express a meaningful integer range from which to generate values:
You can use the following parameters to configure the behavior of this function. Set parameters in the options section of a rule.
Redaction rules applied to XML documents are restricted to the subset of XPath supported by XSLT. For example, you cannot use backward axes such as parent::*. The supported subset is defined in https://www.w3.org/TR/xslt#patterns.
In most cases, the entire value of the node is replaced by the redacted value, even if the original contents are complex, such as the //address example, above.
The redacted documents will be displayed in Query Console. For a discussion of the expected results, see Review the Results.
The rest of this section demonstrates some of the XML and JSON document model differences to be aware of. For a more detailed discussion of XPath over JSON, see Traversing JSON Documents Using XPath.
The mlcp command line tool will provide the highest throughput, but you may find rdt:redact or rdt.redact convenient when developing and debugging rules.
Use this built-in to mask a value with a consistent masked value. That is, with deterministic masking, a given input always produces the same output. The original value is not derivable from the masked value.
To illustrate the effects of the various character option settings, assume a length option of 10 and the following input targeted for redaction:
You can apply the example custom redaction rule with mlcp by running a command similar to the one below. The command exports the redacted documents to ./mlcp-output. This directory must not already exist.
If all the rules in the input rule collections are valid, the validation function returns the URIs of all validated rules. Otherwise, an exception is thrown when the first validation error is encountered.
The redacted IP address is normalized to contain characters for the maximum number of digits. That is, an IP address such as 123.4.56.7 is masked as ###.###.###.###.
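This normalization can be modeled in plain JavaScript by replacing each matched address with four groups of three masking characters, whatever the original octet lengths. This is not MarkLogic's redact-ipv4 implementation. redactIpv4 is a hypothetical name.

```javascript
// Model of IPv4 masking with normalized width (NOT MarkLogic's code).
function redactIpv4(text, ch = "#") {
  const mask = Array(4).fill(ch.repeat(3)).join("."); // e.g. ###.###.###.###
  return text.replace(/\b\d{1,3}(?:\.\d{1,3}){3}\b/g, mask);
}
```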
The following table illustrates the effect of applying redact-email with various levels of redaction. For a complete example, see Example: Using the Built-In Redaction Functions.
Use this built-in to mask values that match a regular expression. The regular expression and the replacement text are configurable.
An attacker could use a similar salt-less rule to generate a lookup table that indicates John Smith redacts to 6c50dad68163a7a079db. That knowledge can be used to reverse engineer redacted output.
When a pattern match is found, every redacted digit is replaced with the same character. For example, a value such as 123-45-6789 might become XXX-XX-XXXX, depending on the rule configuration.
If any of the rule collections passed to rdt:redact is empty, an RDT-NORULE exception is thrown. This protects you from accidentally failing to apply any rules, leading to unredacted content.
Define your function in an XQuery or JavaScript library module. Install the module in the modules database associated with the App Server through which redaction will be applied. For details, see Installing a User-Defined Redaction Function.
This example only uses XQuery and XML. You can write a custom function to handle both XML and JSON, but you might find it more convenient to use XQuery for XML and Server-Side JavaScript for JSON. For an equivalent JavaScript/JSON example, see Example: Custom Redaction Using JavaScript.
Once you install one or more rule documents in the Schemas database and assign them to a collection, you can redact documents in the following ways:
(Note, if you installed both the XQuery/XML and JavaScript/JSON custom redaction examples, /personnel/person3.json will also be redacted to display Jane Doe.)
You can use rdt.ruleValidate to test the validity of your rules before calling rdt.redact. For details, see Validating Redaction Rules.
For more details on using mlcp with Redaction, see Redacting Content During Export or Copy Operations in the mlcp User Guide.
The options element contains a single element, new-name, that is used as the replacement value for any redacted name elements:
However, note that a path such as //alias, above, conceals each array item in the JSON sample, rather than concealing the entire array. This is because the alias path step matches each array item individually; for details, see Defining Rules Usable on Multiple Document Formats and Traversing JSON Documents Using XPath.
For example, when you redact John Smith, must the resulting value be two words or one? Must the word length of the original input be preserved, or must it be normalized to something such as FIRSTNAME LASTNAME?
Note the presence of rule/@xml:lang. The @lang value zxx is not a valid language. Rather, zxx is a special value that tells MarkLogic not to tokenize, stem, and index this element. Though you are not required to include this setting in your rules, it is strongly recommended that you do so because rules are configuration information and not meant to be searchable.
The following table shows the result of redacting the JSON sample document. Notice that the telephone number in the summary property has been partially redacted by the redact-us-phone function. Also, the id property has been completely hidden by the conceal function. The affected parts of the content are highlighted in the table.
The following example rule redacts dateTime values using the parsed method. The picture option specifies that only input values of the form YYYY-MM-DD are redacted. The format option specifies that the masking value is of the form MM-DD-YYYY, with the day portion replaced by the literal value NN.
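The plain-JavaScript model below reproduces that transformation: it requires the value to start with YYYY-MM-DD and produces MM-NN-YYYY. Like the built-in, it rejects values whose leading characters do not match the picture. It is not MarkLogic's redact-datetime code, and it handles only this one picture and format pair.

```javascript
// Model of parsed date masking for picture YYYY-MM-DD and format MM-NN-YYYY.
function redactDateParsed(value) {
  const m = /^(\d{4})-(\d{2})-(\d{2})/.exec(value);
  if (!m) {
    // Non-conforming leading text is an error, as with the built-in.
    throw new Error(`value does not match picture YYYY-MM-DD: ${value}`);
  }
  const [, year, month] = m;      // the day (m[3]) is discarded
  return `${month}-NN-${year}`;
}
```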
In most cases, the entire selected node is concealed, even if the original contents are complex, such as the //address example, above.
Use any of the MarkLogic document insertion APIs to insert rules into the schemas database, such as the xdmp:document-insert XQuery function, the xdmp.documentInsert Server-Side JavaScript function, or the document creation features of the Node.js, Java, or REST Client APIs. You can assign rules to a collection at insertion time or as a separate operation.
Use the following procedure to install the custom function into the Modules database with the URI /redaction/redact-xml-name.sjs. These instructions use Server-Side JavaScript and Query Console, but you can use any document insertion interface. Discussion of the function follows the procedure.
Set permissions on your rule documents to constrain who can access or modify the rules. For more details, see Security Considerations.
If the built-in redaction functions do not address the needs of your application, you can implement a user-defined redaction function in XQuery or Server-Side JavaScript. Follow these steps to deploy and apply a user-defined function:
Unless otherwise noted, the examples in this chapter are based on the same set of source documents. The source document set consists of two XML documents and two JSON documents with similar structure. They include some complex element and property values, such as child XML elements or JSON objects, and JSON arrays.
The redacted documents will be exported to ./dict-results. The //street and //country values will reflect values from the street and country dictionaries, respectively.
The input node parameter is the node selected by the XPath expression in a rule using your function. The options parameter can be used to pass user-defined data from the rule into your function. Your function will return a node (redacted or not) or nothing.
MarkLogic predefines a redaction-user role. This role (or equivalent privileges) is required to validate rules and redact documents. That is, you must have this role to use the XQuery functions rdt:redact and rdt:rule-validate, the JavaScript functions rdt.redact and rdt.ruleValidate, or the -redaction option of mlcp.
The rdt:redact and rdt.redact functions are suitable for debugging redaction rules or redacting small sets of documents.
The rdt.redact function expects a document node as input, whereas match.document is the root node under the document node, such as a JSON object-node or XML element node. In the context of DocumentsSearch.map, the node in match.document is an in-database node, not an in-memory construct, so we can access the enclosing document node using fn.root, as shown above.
Deterministic masking can be useful for preserving relationships across records. For example, you could mask the names in a social network, yet still be able to trace relationships between people (X knows Y, and Z knows Y).
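To make the idea concrete, here is a standalone sketch in plain JavaScript (it runs outside MarkLogic) that derives a masking value from a hash of the input plus a salt. The hash (32-bit FNV-1a), the output alphabet, and the names `fnv1a` and `maskDeterministic` are illustrative assumptions. This is not the algorithm used by mask-deterministic. A production implementation would use a keyed cryptographic hash, because a fast unkeyed hash makes lookup tables easy to build.

```javascript
// Standalone sketch of deterministic masking; NOT MarkLogic's algorithm.
// Same input + same salt => same masking value, every time.
const MASK_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// 32-bit FNV-1a hash (fast and non-cryptographic; illustration only).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Build a masking value of `length` characters from the input and a salt.
function maskDeterministic(value, length, salt = "") {
  let out = "";
  for (let i = 0; i < length; i++) {
    // Mix in the position so each output character has its own hash.
    const h = fnv1a(salt + "|" + value + "|" + i);
    out += MASK_CHARS[h % MASK_CHARS.length];
  }
  return out;
}

// Mask "X knows Y" and "Z knows Y": Y masks identically in both edges.
const edges = [["Xavier", "Yolanda"], ["Zoe", "Yolanda"]];
const masked = edges.map(([a, b]) => [
  maskDeterministic(a, 10),
  maskDeterministic(b, 10),
]);
```

Because "Yolanda" always produces the same masked string, the two masked edges still share a node, so the relationship survives even though the names do not. Changing the salt changes every masked value, which is the property the salt and extend-salt options rely on.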
The options property contains a single child, newName. This value is used as the replacement value for any redacted name elements:
The function uses the birthdate element to compute the age. If the age is less than 18, then the text in the name element is redacted.
The custom function is identified in the rule by its exported function name and the URI of the implementation installed in the modules database:
A rule definition can include additional data, such as a description or options. For details, see XML Rule Syntax Reference or JSON Rule Syntax Reference.
If you apply these rules to example documents from Preparing to Run the Examples, you will see the ssn XML element and JSON property values such as the following:
Similarly, setting extend-salt to collection means that an attacker who has access to one rule set cannot generate a lookup table that can be used to reverse engineer redacted values generated by a different rule set.
The output is a Sequence of document nodes, where each document is the result of applying the rules in the rule collections. A Sequence is an Iterable. For example, you can process your results with a for-of loop similar to the following:
Redaction is best suited for granular data hiding when you're exporting content from the database. For granular, real-time, in-application information hiding, use Element Level Security; for more details, see Element Level Security in the Security Guide. For document-level access control, use security features such as document permissions and URI privileges. For more details on these and other security features in MarkLogic, see the Security Guide.
The output is a sequence of document nodes, where each document is the result of applying the rules in the rule collections. The results include both documents modified by the redaction rules and unmodified documents that did not match any rules or were not changed by the redaction functions.
This example operates on JSON documents that include personal profile data such as name, address, and date of birth. A custom Server-Side JavaScript redaction function is used to redact the name if the person is less than 18 years old. A rule-specific option value controls the replacement text.
Use this built-in to mask values that conform to the pattern of an email address. The function assumes an email has the form name@domain.
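The sketch below restates that behavior in plain JavaScript: the username becomes NAME, the domain becomes DOMAIN, and full redaction of jsmith@example.com yields NAME@DOMAIN. The `redactEmail` name and the `level` values "full", "name", and "domain" are assumptions made for this illustration; check the redact-email parameter list for the option names a rule actually accepts.

```javascript
// Sketch of email redaction (not MarkLogic's implementation).
// Values that do not look like name@domain are returned unchanged.
function redactEmail(value, level = "full") {
  const match = /^([^@\s]+)@([^@\s]+)$/.exec(value);
  if (!match) return value;
  const [, name, domain] = match;
  switch (level) {
    case "name":   // hide only the username
      return "NAME@" + domain;
    case "domain": // hide only the domain
      return name + "@DOMAIN";
    case "full":   // hide both parts
      return "NAME@DOMAIN";
    default:
      throw new Error("unknown level: " + level);
  }
}
```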
The following table contains module templates suitable for defining your own conforming module. For a complete example, see Example: Custom Redaction Using JavaScript or Example: Custom Redaction Using XQuery.
Applying all the example rules redacts most XML elements and JSON properties of the sample documents. Recall that the following rules are applied to each element or property:
An exception is also thrown if any of the rule collections contain non-rule documents, if any of the rules are invalid, or if the path expression for a rule selects something other than a node. You can use rdt:rule-validate to test the validity of your rules before calling rdt:redact.
A redaction rule expressed in XML has the following form. All rule elements must be in the default namespace http://marklogic.com/xdmp/redaction and must not use namespace prefixes. For JSON syntax, see JSON Rule Syntax Reference.
Follow the steps in this section to apply the rules in the collection gs-rules to the sample documents. This example applies the rules using Query Console. You can also use the mlcp command line tool to apply rules; for more details, see Applying Redaction Rules.
Rule validation does not check the rule path for conformance to this limitation because it cannot know if the rule will ever be applied to an XML document. If you apply a rule to an XML document with an invalid path, the exception RDT-INVALIDRULEPATH is raised.
If you have not already done so, install the sample documents from Preparing to Run the Examples. This example assumes they are installed in the Documents database.
By default, the extend-salt option is set to cluster-id and the salt option is empty. This means that equivalent rules applied on the same cluster will generate the same output, but the same values would not be generated on a different cluster.
This section contains an example that demonstrates how to install a redaction dictionary and use it with built-in redaction functions. The example rules perform the following redactions:
For more details on using mlcp with redaction, see Redacting Content During Export or Copy Operations in the mlcp User Guide.
(Note: If you installed both the XQuery/XML and JavaScript/JSON custom redaction examples, person3.json will also be redacted to display Jane Doe.)
The following built-in redaction functions are installed with MarkLogic. These functions meet the needs of most applications. These functions are discussed in detail in Built-in Redaction Function Reference. Examples are included with each function description.
A user-defined function can be implemented in XQuery or Server-Side JavaScript. Your implementation must conform to one of the following interfaces:
When you use dictionary-based masking, a given input will always map to the same redaction dictionary entry. If you modify the dictionary, then the dictionary mapping will also change.
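A simple way to picture this behavior is to hash the input and use the hash to pick a dictionary entry. The standalone JavaScript sketch below does exactly that; the FNV-1a hash, the names `hashString` and `maskFromDictionary`, and the sample country list are illustrative assumptions, not MarkLogic's dictionary lookup.

```javascript
// Sketch of dictionary-based deterministic masking (illustrative only).
// A hash of the input selects an entry, so one input always maps to one
// entry for a given dictionary. Changing the dictionary changes the mapping.
function hashString(text) {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime
  }
  return hash >>> 0;
}

function maskFromDictionary(value, dictionary) {
  if (dictionary.length === 0) throw new Error("dictionary has no entries");
  return dictionary[hashString(value) % dictionary.length];
}

const countries = ["France", "Japan", "Kenya", "Peru"]; // hypothetical dictionary
const first = maskFromDictionary("Canada", countries);
const second = maskFromDictionary("Canada", countries);
// Adding an entry changes the modulus, so "Canada" may now map elsewhere.
const afterEdit = maskFromDictionary("Canada", [...countries, "Chile"]);
```

The last line illustrates why editing a dictionary can change previously stable replacement values: the input is unchanged, but the set it indexes into is not.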
Use either of the following procedures to install example dictionaries. The procedure installs two dictionaries: a dictionary of country names, defined in XML, and a dictionary of street addresses, defined in JSON.
For a similar JavaScript/JSON example of defining and installing a rule that uses a custom function, see Example: Custom Redaction Using JavaScript.
Once you determine the privacy requirements of your application, you can select an appropriate built-in redaction function or create one of your own.
MarkLogic supports redaction through the mlcp command line tool and an XQuery library module in the rdt namespace. You can also use the library module with Server-Side JavaScript.
The mlcp command line tool is the recommended interface because it can efficiently apply redaction to large numbers of documents when you export them from the database or copy them between databases. To learn more about mlcp, see the mlcp User Guide.
For example, the mask-deterministic and mask-random built-in redaction functions support a dictionary option, so you can draw values from a dictionary with a rule similar to the following:
If you run the script again, the values for the street names will not change because they are redacted using mask-deterministic. The values for the countries will change with each run since they are redacted using mask-random.
These instructions assume you will use the pre-installed App Server on localhost:8000 and the Documents database, which is configured to use the Schemas database. This example uses XQuery and Query Console to install the rule, but you can use any document insertion interface.
In most cases, the entire value of the node is replaced by the redacted value, even if the original contents are complex, such as the //address example, above.
The redaction workflow enables you to protect the business logic captured in a redaction rule independent of the documents to be redacted. For example, the user who generates redacted documents need not have privileges to modify or create rules, and the user who creates and administers rules need not have privileges to read or modify the content to be redacted.
Change the example command line as needed to match your environment. The output directory (./dict-results) must not already exist.
You can use the following parameters to configure the behavior of this function. Set parameters in the options section of a rule.
Required. The specification of the redaction function to apply to content matching path. This element must have one of the forms shown below.
When you complete these steps, your Documents database will contain the following documents. The documents are also inserted in a collection named gs-samples for easy reference.
The following table illustrates the effect of applying conceal to several different types of nodes. For an end-to-end example, see Example: Using the Built-In Redaction Functions.
When you do not use a dictionary, the replacement value is either a randomly generated or repeating set of characters, depending on whether you choose random or deterministic masking. A redaction dictionary enables you to source replacement values from a pre-defined set of values instead.
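For contrast with the deterministic case, random masking draws each character independently, so repeated runs give different values. The sketch below is plain JavaScript; the `maskRandom` name and the character-class names are hypothetical stand-ins for the kind of control a `character` option provides. `Math.random` is not cryptographically secure, so a real implementation would use a secure random source.

```javascript
// Sketch of random masking without a dictionary (illustrative only).
const CHARACTER_SETS = {
  numeric: "0123456789",
  alphabetic: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
  alphanumeric:
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
};

function maskRandom(length, character = "alphanumeric") {
  const chars = CHARACTER_SETS[character];
  if (!chars) throw new Error("unknown character class: " + character);
  let out = "";
  for (let i = 0; i < length; i++) {
    // Each position is chosen independently; the input value plays no part.
    out += chars[Math.floor(Math.random() * chars.length)];
  }
  return out;
}
```

Because the input never influences the output, the original value cannot be recovered, and equal inputs in different records do not produce matching outputs.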
Follow these steps to apply the example rules using Server-Side JavaScript and Query Console. All the rules will be applied to the sample documents.
When applied to a JSON document, the node replaced by redaction can be either a text node or a number node, depending on whether or not you use the format option. With no explicit formatting, redaction produces a number node for JSON. With explicit formatting, redaction produces a text node. For example, redact-number might affect the value of a JSON property named key as follows:
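The difference between the two output types can be sketched in plain JavaScript: without formatting the replacement is a number, and with formatting it is a string. The `redactNumber` function and its `decimals` and `asText` parameters are hypothetical stand-ins for the rule's range and format configuration, not the actual redact-number option names.

```javascript
// Sketch of redact-number-style output (not MarkLogic's implementation).
// Assumes min and max have at most `decimals` digits after the point.
function redactNumber({ min = 0, max = 100000, decimals = 0, asText = false } = {}) {
  const factor = 10 ** decimals;
  // Pick an integer count of the smallest unit so the result stays in range.
  const lo = Math.round(min * factor);
  const hi = Math.round(max * factor);
  const units = lo + Math.floor(Math.random() * (hi - lo + 1));
  const value = units / factor;
  // Formatted output is a string (text node); unformatted is a number.
  return asText ? value.toFixed(decimals) : value;
}

const doc = { key: 12345.67 };
const unformatted = { ...doc, key: redactNumber({ max: 100000, decimals: 2 }) };
const formatted = { ...doc, key: redactNumber({ max: 100000, decimals: 2, asText: true }) };
```

Serialized as JSON, `unformatted` holds a bare number such as `{"key":3.55}`, while `formatted` holds a quoted string such as `{"key":"3.55"}`.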
The following table summarizes the built-in redaction functions and expected input parameters. Refer to the section on each function for more details and examples.
The redaction feature covered in this chapter is a read transformation you can apply to XML and JSON documents. A redacted document usually has selected portions removed, replaced, or obscured when it is read from the database. For example, you might use redaction to eliminate email addresses or obscure all but the last 4 digits of credit card numbers when exporting a document from MarkLogic.
A rule document can contain only one rule and must not contain any non-rule data. A rule collection can contain multiple rule documents but must not contain any non-rule documents. Every rule document must belong to at least one collection because redaction operations identify rules by collection.
Follow these steps to install the example rules in JSON format using Server-Side JavaScript. If you prefer to use XML rules, see Install the XML Rules. For a detailed example of installing rules with Query Console, see Example: Getting Started With Redaction.
For security purposes, use document permissions to carefully control who can read or modify your dictionary. For more details, see Security Considerations.
Use a command line similar to the following to export the redacted documents from the Documents database. All the rules will be applied to the sample documents.
The following example command applies the rules in the collections with URIs pii-rules and hipaa-rules to documents in the database directory /employees/ on export.
Some pre-defined redaction functions that mask content can extract the masking value from a redaction dictionary. This section covers the following topics related to using a dictionary for a masking source:
Required. The specification of the redaction function to apply to content matching path. The function child element is required. The module and module-namespace child elements are used only to specify a user-defined redaction function, as shown below.
You can use the following parameters to configure the behavior of this function. Set parameters in the options section of a rule.
The path expression in the rule selects the name property for redaction. Since the custom function uses the birthdate sibling property of name to control the redaction, it would be more natural in some ways to apply the rule to the parent object. However, the parent object is anonymous, so it cannot be addressed by name in an XPath expression.
The custom function expects to receive a JSON node corresponding to the node that is a candidate for redaction. This node must be a child of an object that also has a birthdate property. This code snippet implements this check:
The procedure outlined here makes the following assumptions. You will need to modify the procedure and example code to match your environment and application requirements.
The input documents have the following structure. The birthdate element is used to determine whether or not to redact the name element.
The redaction feature includes built-in redaction functions for common redaction tasks such as obscuring social security numbers and telephone numbers. You can also plug in your own redaction functions.
You can define redaction rules in XML or JSON. The format of a rule (XML or JSON) has no effect on the type of document to which it can be applied.
The following table outlines the impact of various salt and extend-salt option combinations, assuming all other options are the same.
For simplicity, this example uses only JavaScript and JSON. You can also write a custom function to handle both XML and JSON. For a similar XQuery/XML example, see Example: Custom Redaction Using XQuery.
When you complete this exercise, your schemas database will contain one rule defined in XML and one rule defined in JSON. The rules are inserted in a collection named gs-rules. The XML rule uses the redact-us-phone built-in redaction function. The JSON rule uses the conceal built-in redaction function.
Validation confirms that your rule(s) and rule collection(s) conform to the expected structure and do not rely on any non-existent code, such as an undefined redaction function.
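rdt.ruleValidate remains the authoritative check. As a complement, the plain JavaScript sketch below catches obvious shape mistakes in a rule object before you insert it. The layout it expects (`{rule: {path, method: {function}, options}}`) and the `checkRuleShape` name are assumptions for illustration; confirm the field names against the JSON Rule Syntax Reference.

```javascript
// Structural pre-check for a JSON-style rule object (sketch only).
// Assumed layout: { rule: { path, method: { function }, options? } }.
// It cannot confirm that the named function exists; rdt.ruleValidate does that.
function checkRuleShape(doc) {
  const rule = doc && doc.rule;
  if (!rule || typeof rule !== "object") {
    return ["missing top-level 'rule' object"];
  }
  const problems = [];
  if (typeof rule.path !== "string" || rule.path.trim() === "") {
    problems.push("'path' must be a non-empty string");
  }
  if (!rule.method || typeof rule.method.function !== "string") {
    problems.push("'method.function' is required");
  }
  if ("options" in rule && (typeof rule.options !== "object" || rule.options === null)) {
    problems.push("'options' must be an object when present");
  }
  return problems;
}

const ssnRule = {
  rule: { path: "//ssn", method: { function: "redact-us-ssn" }, options: { level: "partial" } },
};
const brokenRule = { rule: { path: "//ssn" } }; // no redaction function
```

A passing shape check says nothing about runtime behavior; as noted below, even a rule that passes rdt.ruleValidate can still fail at run time.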
You must understand the interactions between XPath and the document model to ensure proper selection of nodes by a redaction rule. The XML and JSON document models differ in ways that can be surprising if you are not familiar with the models. For example, a simple path expression such as //id might match an element in an XML document, but all the items in an array value in JSON.
The pattern and replacement text are applied to the input values as if by calling the fn:replace XQuery function or the fn.replace Server-Side JavaScript function.
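The plain JavaScript sketch below shows the same idea with `String.prototype.replace`: every match of the pattern is replaced, and `$1`, `$2` refer to capture groups, as they do in fn:replace. The `redactRegex` name is hypothetical. JavaScript regular expressions differ in some details from the XPath (XML Schema based) dialect that fn:replace uses, so test patterns in MarkLogic before relying on them.

```javascript
// Apply a rule's pattern and replacement to a text value, replacing every
// match, analogous to fn:replace. "$N" refers to capture group N.
function redactRegex(value, pattern, replacement) {
  return value.replace(new RegExp(pattern, "g"), replacement);
}

// Replace every digit.
const allDigits = redactRegex("ID 123-45 and 678-90", "\\d", "X");
// Keep only the last group of an account number.
const keepTail = redactRegex("acct-4411-0097", "(\\d{4})-(\\d{4})$", "XXXX-$2");
```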
The following table illustrates the effect of applying redact-us-ssn with various input values and configuration parameters. For a complete example, see Example: Using the Built-In Redaction Functions.
The following table lists some common tasks around administering and using redaction rules, the actor who usually performs this task, and the relevant security features available in MarkLogic. The security features are discussed in more detail below the table.
To illustrate the effects of the various character option settings, assume a length option of 10 and the following input targeted for redaction:
Note that a successfully validated rule can still cause runtime errors. For example, rule validation does not include dictionary validation if your rule uses dictionary-based masking. Similarly, validation does not verify that the XPath expression in a rule conforms to the limitations described in Limitations on XPath Expressions in Redaction Rules.
Each rule in this example exercises a different built-in redaction function. Each rule also operates on a different XML element or JSON property value of the sample documents to prevent overlap among the rules. Never apply a collection of rules that act on the same document components.
Rule documents and rule collections are potentially sensitive information. Carefully consider the access controls and security requirements applicable to your redaction rules and rule collections.
Follow these steps to apply the example rules using XQuery and Query Console. All the rules will be applied to the sample documents.
The following example specifies that the user-defined redaction function redact-name will be applied to nodes matching the XPath expression //name. For more details and examples, see User-Defined Redaction Functions.
The following example rule applies deterministic masking to nodes selected by the XPath expression //name. The replacement value will be 10 characters long because of the length option.
The following example rule redacts IP addresses selected by the path expression //ip. The character parameter specifies that the digits of the redacted IP address are replaced with X.
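The effect of that rule can be sketched in plain JavaScript: a value shaped like a dotted IPv4 address has each digit replaced with the chosen character, and anything else passes through. The `redactIpv4` name and the pattern (four dot-separated groups of one to three digits, with no check that each octet is at most 255) are assumptions for this illustration.

```javascript
// Four dot-separated groups of 1 to 3 digits (assumed pattern).
const IPV4 = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

function redactIpv4(value, character = "X") {
  // Only IPv4-shaped values are redacted; anything else (including IPv6)
  // is returned unchanged. Dots are preserved.
  return IPV4.test(value) ? value.replace(/\d/g, character) : value;
}
```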
The built-in redaction functions compensate for differences in the JSON and XML document models in most cases, so they behave in a consistent way regardless of document type. If you write your own redaction functions, you might need to make similar adjustments.
This example exercises all the built-in redaction functions using the sample documents from Preparing to Run the Examples. You can choose to work with either an XML rule set or a JSON rule set. The rules are equivalent in both rule sets.
The following table illustrates the effect of applying redact-number with various option combinations. For an end-to-end example, see Example: Using the Built-In Redaction Functions.
The documents are inserted into collections so they can easily be selected for redaction. The personnel collection contains all the samples. The xml-people collection includes only the XML samples. The json-people collection includes only the JSON samples.
Your redaction operation will fail if any of the rule collections contain an invalid rule or no rules. You can use the rdt:rule-validate XQuery function or the rdt.ruleValidate JavaScript function to verify your rule collections before applying them. For details, see Validating Redaction Rules.
If you run one of the following examples in Query Console using your schemas database as the context database, a rule document is inserted into the database and assigned to two collections, pii-rules and security-rules.
The following procedure installs two rules, each of which uses one of the dictionaries installed in Install the Dictionaries:
The expected result of applying this rule is to remove nodes named id. For example, if //id selects an XML element or JSON property, the element or property does not appear in the redacted output. Note that, if //id selects array items in JSON, the items are eliminated, but the id property might remain, depending on the structure of the document. For more details, see conceal.
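The difference between removing a property and removing the array items it holds can be sketched on plain JavaScript objects standing in for JSON documents. The functions `concealProperty` and `concealArrayItems` are hypothetical; in this sketch the property remains as an empty array when only its items are concealed, whereas in MarkLogic the outcome depends on the document's structure.

```javascript
// Sketch of conceal semantics on plain objects (not MarkLogic's code).
// Case 1: the path selects the property itself, so it disappears.
function concealProperty(obj, key) {
  const copy = { ...obj };
  delete copy[key];
  return copy;
}

// Case 2: the path selects the array items, so only the items disappear.
function concealArrayItems(obj, key) {
  const copy = { ...obj };
  if (Array.isArray(copy[key])) copy[key] = [];
  return copy;
}

const scalarDoc = { id: 7, name: "Jo" };
const arrayDoc = { id: [1, 2], name: "Jo" };
```

Here `JSON.stringify(concealProperty(scalarDoc, "id"))` gives `{"name":"Jo"}`, while `JSON.stringify(concealArrayItems(arrayDoc, "id"))` gives `{"id":[],"name":"Jo"}`.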
If you use the sample documents from Preparing to Run the Examples, running the script will create 4 files in the directory ./mlcp-output.
The redaction function uses the birthdate element to compute the age. If the age is less than 18, then the text in the name element is redacted. The value of the newName property in the options object is used as the replacement text.
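The decision logic can be tried outside MarkLogic with the plain JavaScript sketch below. A plain object with `name` and `birthdate` (YYYY-MM-DD) stands in for the person object that contains the selected name node, and the function returns an object rather than a node. The names `ageOn` and `redactMinorName` and the `today` parameter are hypothetical; the real Server-Side JavaScript function receives a node and an options object and returns a node or nothing.

```javascript
// Whole years between birthdate (YYYY-MM-DD) and `today`, in UTC.
function ageOn(birthdate, today) {
  const b = new Date(birthdate + "T00:00:00Z");
  let age = today.getUTCFullYear() - b.getUTCFullYear();
  const beforeBirthday =
    today.getUTCMonth() < b.getUTCMonth() ||
    (today.getUTCMonth() === b.getUTCMonth() && today.getUTCDate() < b.getUTCDate());
  if (beforeBirthday) age -= 1;
  return age;
}

// Replace the name with options.newName when the person is under 18.
function redactMinorName(person, options, today = new Date()) {
  if (typeof person.birthdate !== "string") return person; // no basis to decide
  if (ageOn(person.birthdate, today) >= 18) return person;
  return { ...person, name: options.newName };
}

const opts = { newName: "REDACTED" }; // hypothetical replacement text
const asOf = new Date(Date.UTC(2024, 5, 1)); // June 1, 2024
```

Passing `today` explicitly keeps the result reproducible; otherwise the same document could redact differently depending on the day it is exported.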
When a pattern match is found, every redacted digit is replaced with the same character. For example, a value such as 123-456-7890 might become XXX-XXX-XXXX, depending on the configuration of the rule.
The following example masks telephone numbers selected by the path expression //ph. The parameters specify that the last 4 digits of the telephone number are preserved and the remaining digits are replaced with the character X.
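The masking itself can be sketched in plain JavaScript: punctuation is kept, every masked digit becomes the same character, and a partial level keeps the last 4 digits. The `redactUsPhone` name, the `level` and `character` defaults, and the restriction to the NNN-NNN-NNNN and (NNN)NNN-NNNN forms are assumptions for this illustration, not the full set of patterns redact-us-phone recognizes.

```javascript
// Sketch of US phone masking (not MarkLogic's implementation).
// Recognizes only NNN-NNN-NNNN and (NNN)NNN-NNNN (an assumed subset).
const US_PHONE = /^(?:\d{3}-|\(\d{3}\))\d{3}-\d{4}$/;

function redactUsPhone(value, { level = "full", character = "X" } = {}) {
  if (!US_PHONE.test(value)) return value; // not a phone number: leave as-is
  let position = 0; // 1-based index of the current digit (1..10)
  return value.replace(/\d/g, (digit) => {
    position += 1;
    // "partial" keeps digits 7-10, i.e. the last 4.
    return level === "partial" && position > 6 ? digit : character;
  });
}
```

For example, `redactUsPhone("123-456-7890")` returns `XXX-XXX-XXXX`, and `redactUsPhone("123-456-7890", { level: "partial" })` returns `XXX-XXX-7890`.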
Use this built-in to mask values that conform to the pattern of an IP address. This function only redacts IPv4 addresses. That is, a value is redacted if it conforms to the following pattern, where N represents a decimal digit (0-9).
These instructions assume you will use the pre-installed App Server on localhost:8000 and the Documents database, which is configured to use the Schemas database. This example uses Server-Side JavaScript and Query Console to install the rule, but you can use any document insertion interface.
Rules must be installed in the schemas database associated with your content database. Rules must also be part of a collection before you can use them. This section installs rules in the Schemas database, which is the default schemas database associated with the Documents database.
The following example redacts text which has one of the following forms, where N represents a single digit in the range 0-9.
You can apply redaction rules when using the mlcp export and copy commands. Use the -redaction option to specify one or more rule collections to apply to the documents as they are read from the source database. The redaction is performed by MarkLogic on the source host.
Follow this procedure to apply the example custom redaction function using Query Console and rdt:redact. Make sure you have already installed the custom redaction module, rule, and sample documents.
This example walks you through installing and applying a custom redaction function. Two versions of the example are available: one that is JSON/JavaScript-centric and another that is XML/XQuery-centric. This artificial split is made to keep the example simple. You can mix XML and JSON freely with both XQuery and Server-Side JavaScript.
Deterministic masking can preserve relationships between values and facilitate searches, which can be either beneficial or undesirable, depending on the application.
A key component of a redaction rule is a redaction function specification. This function is what modifies the input nodes selected by the rule. MarkLogic provides several built-in redaction functions that you can use in your rules. For example, there are built-in redaction functions for redacting Social Security numbers, telephone numbers, and email addresses. You can also define your own redaction functions.
You can write a single XPath expression that selects nodes in both XML and JSON documents, but if you do not understand the document models thoroughly, it might not select the nodes you expect. Keep the following tips in mind:
You can use the rdt:rule-validate XQuery function or the rdt.ruleValidate Server-Side JavaScript function to test your rule collections for validity before using them. Validate your rules before deploying them to production because an invalid rule or an empty rule collection will cause a redaction operation to fail.
If you need to use namespace prefixes in the path XPath expression, define the namespace prefix binding by adding a namespaces component to your rule. For example, the following rule snippet uses an emp namespace prefix in its path value, and then defines a binding between the emp prefix and the namespace URI http://my/employees.
In this example, rules are installed and applied using Query Console. For a similar example based on mlcp, see Example: Using mlcp for Redaction in the mlcp User Guide.
The following example rule specifies that the built-in redaction function redact-us-ssn will be applied to nodes matching the XPath expression //ssn. The redact-us-ssn function accepts a level parameter that specifies how much of the SSN to mask (full or partial). Use the options section of the rule definition to specify the level.
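To see what the two levels mean for a value, here is a plain JavaScript sketch: full masks every digit, and partial keeps the last 4. The `redactUsSsn` name, the NNN-NN-NNNN pattern, and the default character `#` are assumptions for this illustration; the rule's options section is where the real level (and any character setting) is specified.

```javascript
// Sketch of SSN masking (not MarkLogic's implementation).
// Assumed input form: NNN-NN-NNNN. Other values pass through unchanged.
function redactUsSsn(value, { level = "full", character = "#" } = {}) {
  const m = /^(\d{3})-(\d{2})-(\d{4})$/.exec(value);
  if (!m) return value;
  const mask = (digits) => character.repeat(digits.length);
  // "partial" keeps the last 4 digits; "full" masks them too.
  const last4 = level === "partial" ? m[3] : mask(m[3]);
  return `${mask(m[1])}-${mask(m[2])}-${last4}`;
}
```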
The procedure in this section demonstrates how to use Query Console and Server-Side JavaScript to install a module in the modules database. You can also use XQuery or the Java, Node.js, and REST Client APIs for this task.
This example operates on XML documents that include personal profile data such as name, address, and date of birth. A custom XQuery redaction function is used to redact the name if the person is less than 18 years old. A rule-specific option value controls the replacement text.
The procedure outlined here makes the following assumptions. You will need to modify the procedure and example code to match your environment and application requirements.
Use this built-in to mask values that conform to one of the following patterns. These patterns correspond to typical representations for US telephone numbers. The character N in these patterns represents a single digit in the range 0 - 9.
The redacted street values will be the same each time you export the documents because they are redacted using mask-deterministic. The redacted country values will change each time you export the documents because they are redacted using mask-random.
This function differs from the mask-random function in that it provides finer control over the masking value. Also, mask-random always generates a text node, while redact-number generates either a number node or a text node, depending on the configuration.
If you use the sample documents from Preparing to Run the Examples, running the script will create 4 files in the directory ./mlcp-output. These files will reflect the following effects relative to the input documents:
The following table illustrates the effect on the sample document /redact-ex/person1.xml. The redacted values you observe will differ from those shown if the rule generates a value, rather than masking an existing value.
For a similar XQuery/XML example of defining and installing a rule that uses a custom function, see Example: Custom Redaction Using XQuery.
However, notice the //alias example above, which selects individual alias array items in the JSON example, rather than the entire array. If you want to redact the entire array value, you need a rule with a JSON-specific path selector. For example, a rule path such as //array-node('alias') selects the entire array in the JSON documents, resulting in a value such as the following for the alias property:
Recall that the sample documents are rooted at a person element, so the rule selects the entire contents by using /person as the path value. This enables the redaction function to easily examine /person/birthdate, as well as modify /person/name.
If you want to redact the entire array value, you need a rule with a JSON-specific path selector, such as //array-node('alias'). For more details, see Defining Rules Usable on Multiple Document Formats.