In feature representation calculation, the relative importance of each feature determines its assigned weight. Features deemed more important receive higher weights, significantly influencing the final feature representation. Since these weights are computed based on a comprehensive analysis of the entire input feature set, applying the weight matrix \(\textbf{A}\) to the feature matrix \(\textbf{VX}\) results in an integrated feature representation. This representation not only preserves information from individual features but also incorporates insights from other features, capturing broader contextual dependencies on a more macro level.
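The attention computation described here can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper names, the toy input, and the \(1/\sqrt{d}\) scaling of the scores are assumptions, since the exact score function is not reproduced in this section.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(z, w_q, w_k, w_v):
    """Return the attention weights A and the integrated features A @ V."""
    q = matmul(z, w_q)  # plays the role of QX
    k = matmul(z, w_k)  # plays the role of KX
    v = matmul(z, w_v)  # plays the role of VX
    scale = math.sqrt(len(q[0]))  # assumed scaling, common in dot-product attention
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    a = softmax_rows(scores)
    return a, matmul(a, v)

# Toy input: three positions with two features each (hypothetical values).
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = self_attention(z, identity, identity, identity)
```

Each row of `weights` sums to one, so every output row in `fused` is a weighted mixture of all rows of the value matrix. This is how a single position comes to carry context from the others.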

The client trains on local image pairs to detect specular highlights from visual data, which (i) efficiently avoids overestimation and improves representational ability; (ii) provides improved detection accuracy compared to existing methods.

The numerical evaluation is conducted via a TensorFlow framework, on a desktop equipped with a 12 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @2.50GHz, 40GB RAM, and Nvidia GPU RTX 2080 Ti.

He, C. et al. Camouflaged object detection with feature decomposition and edge reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 22046–22055 (2023).

We numerically evaluate our FL-AttGAN method and its four competitors using the SD1, SD2 and RD datasets 13. The results show that FL-AttGAN not only displays superior highlight removal visually without introducing shadow interference like its four competitors, but it also excels in quantitative analyses across multiple metrics, including PSNR, SSIM, recall, and precision.

Liu, G. & Guo, J. Bidirectional lstm with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338 (2019).

Specular highlight removal ensures the acquisition of high-quality images, which has important applications in stereo matching, text recognition and image segmentation. In order to prevent the leakage of images containing personal information, such as identification card (ID) photos, clients often train specular highlight removal models using only local data, resulting in a lack of precision and generalization of the trained model. To address this challenge, this paper introduces a new method to remove highlights in images using federated learning (FL) and an attention generative adversarial network (AttGAN). Specifically, the former builds a global model in the central server and updates the global model by aggregating the model parameters of clients. This process does not involve the transmission of image data, which enhances the privacy of clients; the latter, combining attention mechanisms and a generative adversarial network, aims to improve the quality of highlight removal by focusing on key image regions, resulting in more realistic and visually pleasing results. The proposed FL-AttGAN method is numerically evaluated using the SD1, SD2 and RD datasets. The results show that the proposed FL-AttGAN outperforms existing methods.

Suo, J. et al. Fast and high quality highlight removal from a single image. IEEE Trans. Image Process. 25, 5441–5454 (2016).

where i and j represent the size of the image, \(\phi\) denotes the feature maps of the pre-trained VGG network, and \(\psi (\cdot )=\phi (\cdot ){\phi (\cdot )}^\mathsf{{T}}\).
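As a rough illustration of \(\psi (\cdot )\), the sketch below computes the Gram matrix of a feature map. It assumes the map has already been flattened to C rows (channels) of H·W activations; the function name and toy values are hypothetical, and no VGG network is actually run.

```python
def gram_matrix(features):
    """Return F F^T for a feature map flattened to C rows of H*W activations."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]

# Two channels with two spatial positions each (hypothetical activations).
toy_features = [[1.0, 2.0], [3.0, 4.0]]
gram = gram_matrix(toy_features)
```

The result is a C × C matrix of channel correlations. It is independent of the spatial layout, which is why such terms are commonly used to compare texture statistics between a generated image and its target.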

The data that support the findings of this study are openly available in the public data repository at the website of SD1 (https://drive.google.com/file/d/1xpb7TbUC5JOSUhtOWdKmr2hG-VJLMIC3/view), SD2 (https://drive.google.com/file/d/1RuCd2MA_pVdQMu0NtzDCu1zCPNMsxDys/view) and RD (https://drive.google.com/file/d/1m1Ab8RMc1Booyizbyvgw8y8B709vfYeu/view).

With the rapid development of neural networks, researchers have used them to detect highlight areas in images containing different kinds of information. The work17 trains a residual convolutional neural network (CNN) and uses weakly labeled data to locate and remove specular highlights in endoscopic images. In recent years, GANs have been well developed and used in many fields. For example, drawing inspiration from the prey–predator dynamic, He et al. develop an adversarial training framework to address the issue of limited accuracy in camouflaged object detection18. This framework focuses on the dynamic confrontation between the generator and the detector. As the generator continuously introduces new challenges, the detector should enhance its performance to meet the challenges posed by the generator. Zhu et al. propose a method to remove highlights in facial images19. Following the structure of conditional GAN, they take face images with specular highlights as conditions and predict the corresponding images without highlights. At the same time, a novel mask loss is introduced through highlight detection, aiming to make the network focus more on highlight areas. These methods usually remove specular highlights from medical images and specific object images, but cannot handle images with text. To solve this challenging problem, the work13 utilizes a dual-network structure, namely a highlight detection network and a highlight removal network, to remove highlights from images with text information.

Zhu, T. et al. Highlight removal in facial images. In Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16–18, 2020, Proceedings, Part I 3 422–433 (Springer, 2020).

For each client, we integrate an attention mechanism into the GAN-based framework, enabling the model to focus on highlighted areas and their surrounding detailed features, thereby enhancing the model’s representational capability. This effectively removes highlights while preserving image details.

Zhou, F., Ye, Y. & Song, Y. Image segmentation of rectal tumor based on improved u-net model with deep learning. J. Signal Process. Syst. 94, 1145–1157 (2022).

Mothukuri, V. et al. A survey on security and privacy of federated learning. Future Gener. Comput. Syst. 115, 619–640 (2021).

Furthermore, to simulate the multi-user highlight removal model training task via federated learning, we distributed the training samples across 10 devices. Each device was allocated a different number of images with highlights.
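One way to simulate such an uneven allocation is sketched below. The paper does not state how the per-device counts were chosen, so the random cut-point scheme, the function name, and the sample count of 1000 are all assumptions made for illustration.

```python
import random

def partition_unequal(num_samples, num_clients, seed=0):
    """Shuffle sample indices and cut them into non-empty shards of random size."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Distinct cut points guarantee that every client receives at least one sample.
    cuts = sorted(rng.sample(range(1, num_samples), num_clients - 1))
    bounds = [0] + cuts + [num_samples]
    return [indices[a:b] for a, b in zip(bounds, bounds[1:])]

# Hypothetical pool of 1000 training samples spread over 10 devices.
shards = partition_unequal(1000, 10)
shard_sizes = [len(s) for s in shards]
```

Because the cut points are drawn at random, the shard sizes are generally unequal, although two devices can occasionally receive the same count.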

For the highlight removal network, we employ an encoder–decoder architecture as the core framework, with skip connections integrated between the encoder and the decoder. We add a self-attention mechanism in the feature extraction stage of the encoder to help the model focus on highlight areas, which is particularly important to maintain the accuracy and authenticity of the generated images. The encoder is responsible for capturing the information of the image, gradually reducing the spatial dimension and increasing the feature depth through a series of convolutional layers and pooling layers. The decoder increases the spatial dimensionality of the encoded image features and gradually constructs a highlight-free image.


Although experiments have proven the feasibility of these methods, they still have problems that need to be solved: (i) current methods lack adequate characterization capabilities. That is to say, they fall short in fully removing highlights, leading to the omission of crucial details from the original image. (ii) the training of deep learning networks usually requires a large amount of data, but the data collected by each client is limited, resulting in the lack of generalization capabilities of the local model; (iii) there is a risk that images containing private information may be disclosed during the collection and processing.

Cai, S. et al. Gan-based image-to-friction generation for tactile simulation of fabric material. Comput. Graph. 102, 460–473 (2022).

(3) discriminator: Based on the PatchGAN structure, our discriminator \({\text{G}}\) focuses on discriminating local regions (“patches”) of the image rather than the entire image. This approach allows the discriminator to more precisely distinguish the difference between the generated highlight-free image \(\hat{\textbf{{I}}}{^{n}}\) and the real highlight-free image \({\textbf{{I}}}{^{n}}\), enhancing the model’s sensitivity to details.

In our local model, AttGAN, we employ attention mechanisms and adversarial strategies to enhance the precision of highlight removal. Specifically, the attention mechanism focuses more precisely on highlighted areas, effectively removes highlights while preserving the original image details; the adversarial strategy centers on refining the generator’s output through a consistent feedback loop from the discriminator, which allows the generator to learn from its mistakes and progressively enhance its ability to produce visually convincing and realistic images, thereby achieving high-performance highlight removal.

Song, J. & Ye, J. C. Federated cyclegan for privacy-preserving image-to-image translation. Preprint at http://arxiv.org/abs/2106.09246 (2021).

In addition to privacy concerns, the accuracy of the highlight removal model on each client also needs to be improved. Recently, deep learning methods11,12 have been well developed for removing highlights, but they encounter the following challenge: existing methods have insufficient characterization capabilities. Taking the generative adversarial network (GAN)13 as an example, it cannot completely eliminate highlights, resulting in the loss of key information of the original image. To address this problem, we integrate an attention mechanism into the GAN-based framework. This enhancement allows the model to focus more effectively on highlighted areas and their surrounding detailed features. The attention mechanism captures spatial contextual information, focuses precisely on highlighted areas and effectively removes them while preserving the original details of the image.


In addition to the intuitive comparison of the effects of highlight removal using five different methods, we also compare the text recognition performance of images processed by these methods. As depicted in Fig. 4, it is clearly observed that images with highlights that are not processed exhibit low text recognition performance, with the parts obscured by highlights failing to be normally recognized, detecting only 14 pieces of text information. Images processed with other methods achieve limited highlight removal, detecting 15 pieces of text data. Meanwhile, FL-AttGAN, utilizing its attention mechanism to focus on the highlight areas, successfully detects 16 pieces of text data completely.

(2) highlight removal network: The highlight removal network \({\text{R}}\) is employed to eliminate the specular highlight and restore the textual content. Therefore, we take the generated mask \(\hat{\textbf{{M}}}{^{n}}\) as input to generate the corresponding highlight-free image \(\hat{\textbf{I}}^{n}\):

Subsequently, the attention weight matrix \(\textbf{A}\) is calculated by the softmax function to obtain the attention distribution over the feature map \(\textbf{z}_l^{\textsf{T}}\).

FL is a new type of artificial intelligence (AI) that makes great contributions to data privacy and security by keeping data localized, reducing attack vectors, and supporting advanced privacy-enhancing technologies9. A key feature of FL is that data remains on local devices, eliminating the need to transmit sensitive information over the network to a central server. This aligns closely with data protection regulations such as the General Data Protection Regulation (GDPR)24, which emphasize data minimization and localization. Unlike centralized data storage that is vulnerable to cyberattacks, distributed data storage under the FL framework makes such attacks less rewarding and more difficult. Furthermore, FL can integrate cutting-edge encryption techniques, including secure multi-party computation (SMPC) and homomorphic encryption, which ensure that model updates transmitted across the network remain secure. These methods prevent unauthorized access and eavesdropping, safeguarding the privacy of the data throughout the process.

The server uses federated averaging algorithm to aggregate the local parameters to produce new global parameters for round \(t\in \left\{ 1,...,T\right\}\);

In the encoder, the feature map \(\textbf{z}_l^{\textsf{T}}\) output by the l-th layer is input into the attention mechanism to mine the correlation in visual features. The attention mechanism mainly utilizes three learnable parameter matrices: \(\textbf{W}_Q\), \(\textbf{W}_K\) and \(\textbf{W}_V\). Initially, these matrices linearly convert the input \(\textbf{z}_l^{\textsf{T}}\) into corresponding attention feature matrices named \(\textbf{QX}\), \(\textbf{KX}\), and \(\textbf{VX}\).

For the global model, we apply the FL framework. Firstly, it defines the global removal network at a central server. Each client then trains the local model and transmits the updated model parameters to the central server for aggregation, rather than the raw data itself. The use of federated learning greatly reduces the risk of data breaches and ensures that sensitive information remains on the client’s device.

Following the standard convolutional neural network, each downsampling layer of the discriminator consists of a convolution module, batch normalization, and activation function. The downsampling layer reduces the spatial dimension of the image through convolution operations, which helps the model capture more abstract and high-level features. Batch normalization is applied to standardize the activation values of each layer’s input, which not only accelerates the training process but also enhances model stability and mitigates the issue of covariate shift. We use Leaky ReLU as the activation function. Different from the standard ReLU activation function, it allows a non-zero gradient when the input is negative, which can promote the activation of neurons.
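The difference between Leaky ReLU and the standard ReLU can be made concrete with a short sketch. The negative slope of 0.2 is an assumed value chosen for illustration; the paper does not report the slope it uses.

```python
def leaky_relu(x, alpha=0.2):
    """Pass positive inputs through unchanged and scale negative inputs by alpha."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.2):
    """Gradient is 1 for positive inputs and alpha (non-zero) otherwise."""
    return 1.0 if x > 0 else alpha

def relu_grad(x):
    """Standard ReLU gradient, which vanishes for non-positive inputs."""
    return 1.0 if x > 0 else 0.0

negative_input = -2.0
activations = (leaky_relu(negative_input), leaky_relu_grad(negative_input), relu_grad(negative_input))
```

For a negative input the ReLU gradient is zero, so the unit receives no update, whereas the Leaky ReLU gradient stays at `alpha` and the unit can still learn.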

Zheng, Y., Gao, Y. Specular highlight removal by federated generative adversarial network with attention mechanism. Sci Rep 14, 23472 (2024). https://doi.org/10.1038/s41598-024-74229-3

The attention mechanism is a powerful concept in machine learning, particularly in the field of natural language processing and computer vision. At its core, the attention mechanism allows a model to focus on specific parts of the input data when making predictions, rather than processing all input data with equal importance. This selective focus enables the model to prioritize more relevant or important information, thereby improving the overall performance of tasks such as translation, image captioning, and text summarization.

Qiao, Z. et al. Seed: Semantics enhanced encoder-decoder framework for scene text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 13528–13537 (2020).

Given this setup, we suppose that N pairs of images are collected, including images with specular highlight \({\mathscr {S}} {\hspace{1.0pt}} = \{\textbf{S}{^{1}}{,} \ldots ,{\textbf{S}}{^{N}}\}\), the corresponding highlight-free images \({\mathscr {I}}{\hspace{1.0pt}} {\hspace{1.0pt}} = \{\textbf{I}{^{1}}, \ldots ,\textbf{I}{^{N}}\}\) and binary mask images indicating the location of highlight \({\mathscr {M}}{\hspace{1.0pt}} {\hspace{1.0pt}} = \{\textbf{M}{^{1}}{,} \ldots ,{\textbf{M}}{^{N}}\}\).

Before presenting FL-AttGAN, we first define relevant notations and preprocess the RGB images and friction coefficients. We consider a set of clients \(c\in \left\{ 1,2,\dots ,C\right\}\) and use \(\mathscr {D}^c\) to denote the local dataset of client c. The complete dataset containing all surface material data is composed of data from each client c, which can be denoted by \(\mathscr {D}\triangleq \bigcup _c{\mathscr {D}^c}\).

Inspired by these efforts, we offer a novel solution to the challenge of removing highlights from images by integrating attention mechanisms into GAN. Our attention-directed GAN not only reduces the presence of highlights, but also ensures that the reconstructed image retains its original texture and detail, enabling a more effective highlight removal process.

This work was supported by the Guangdong Provincial Department of Education under the 2023 Key Research Projects for Ordinary Higher Education Institutions.

Goddard, M. The eu general data protection regulation (gdpr): European regulation that has a global impact. Int. J. Mark. Res. 59, 703–705 (2017).

The RD dataset is a real-world dataset comprising 2025 image pairs: each pair consists of an image with text-aware specular highlights, the corresponding highlight-free image, and a binary mask image indicating the location of the highlights. The image content includes ID cards and driver’s licenses, which contain substantial textual information. In the data collection process, a transparent plastic film was placed on the images, and lighting was turned on. The camera then captured the images to obtain those with highlights. Correspondingly, highlight-free images were obtained by turning off the lighting. By adjusting the position of the plastic film, images with varying levels of illumination were acquired. The dataset is randomly divided into training, validation, and testing sets with a ratio of 8:1:1.
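A random 8:1:1 split of this kind can be sketched as follows. The seed, the function name, and the choice to assign the rounding remainder to the test set are assumptions; the paper does not state how fractional counts were handled.

```python
import random

def split_8_1_1(num_pairs, seed=0):
    """Shuffle pair indices and split them into train, validation and test lists."""
    rng = random.Random(seed)
    indices = list(range(num_pairs))
    rng.shuffle(indices)
    n_train = num_pairs * 8 // 10
    n_val = num_pairs // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # receives any rounding remainder
    return train, val, test

# 2025 image pairs, as reported for the RD dataset.
train_ids, val_ids, test_ids = split_8_1_1(2025)
```

With 2025 pairs this yields 1620 training, 202 validation and 203 test pairs under the rounding rule assumed above.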


Hou, S. et al. Text-aware single image specular highlight removal. In Pattern Recognition and Computer Vision: 4th Chinese Conference, PRCV 2021, Beijing, China, October 29–November 1, 2021, Proceedings, Part IV 4 115–127 (Springer, 2021).

In Fig. 3, we visually compare the highlight removal effectiveness of our FL-AttGAN against four competitors, using several images as examples. We can observe that for images with highlights, these often obscure and impact text legibility, making accurate identification challenging even for the human eye. Although images processed by other methods show some restoration of text clarity and successful highlight removal, each image exhibits partial shadows, which may interfere with subsequent text recognition by the Paddle OCR network. In contrast, images processed by FL-AttGAN not only effectively remove highlights but also avoid introducing new shadows that could impede text recognition. This advantage is due to the attention mechanism within FL-AttGAN, which can effectively identify and focus on areas of high brightness in the image. Training with generative adversarial networks enhances the capability to detect and remove highlighted regions, thus improving the overall image clarity for optical character recognition tasks.

Al-Ars, Z. et al. Almarvi system solution for image and video processing in healthcare, surveillance and mobile applications. J. Signal Process. Syst. 91, 1–7 (2019).

Li, S. et al. Whu-stereo: A challenging benchmark for stereo matching of high-resolution satellite images. IEEE Trans. Geosci. Remote Sens. 61, 1–14 (2023).

Klinker, G. J., Shafer, S. A. & Kanade, T. Using a color reflection model to separate highlights from object color. In Image Understanding Workshop: Proceedings of a Workshop Held at Los Angeles, California, February 23-25, 1987, vol. 2, 614 (Morgan Kaufmann Publishers, 1987).

He, C. et al. Weakly-supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping. Adv. Neural Inf. Process. Syst. 36, 1 (2024).

Meka, A. et al. Lime: Live intrinsic material estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 6315–6324 (2018).

Each element within \(\hat{{\textbf{M}}}^{n}\) has a value between 0 and 1, with higher values indicating a greater likelihood that the corresponding region of image \({{\textbf{S}}}^{n}\) is influenced by specular highlights. Since \({\textbf{{S}}}{^{n}}\) and \(\hat{\textbf{{M}}}{^{n}}\) have identical dimensions, the highlight detection network adopts a fully convolutional structure (FCN), featuring three downsampling layers and three upsampling layers. Each upsampling layer is followed by three convolutional layers, while each downsampling layer is followed by two convolutional layers.

Table 3 presents the performance results of the highlight removal network with federated learning, compared to the highlight removal network without federated learning as shown in Table 2. The performance metrics indicate a slight decrease with federated learning. However, the results in Table 2 assume that all data resides on a single client, whereas, in practice, the data is distributed across multiple clients. By employing federated learning algorithms, we can leverage data from multiple clients to train a global model. The performance in this federated learning scenario is comparable to that in the ideal scenario where data is assumed to be uniformly distributed on a single client.

In this paper, we propose a highlight removal method based on federated learning (FL) and attention generative adversarial network (AttGAN). This method (FL-AttGAN) can provide an improved performance, since: (i) federated learning avoids the transfer of raw data, and constructs a global model solely through the aggregation of client’s model parameters, thereby protecting sensitive image data; (ii) the local AttGAN incorporates both attention mechanisms and adversarial strategies, which ensures that the local model effectively removes highlights as accurately as possible. The main contributions are summarized as follows:

Furthermore, we conduct an evaluation of the proposed local highlight removal network AttGAN by comparing it to three existent methods using the SD1, SD2, and RD datasets. The first of these methods is the UNet method 28, the second is the CycleGAN method 29, and the third is the HQG-Net method 30. Additionally, to provide a comprehensive comparison, we evaluated the SSIM, PSNR, recall, and precision of highlight images (denoted as “Light”) on the same datasets, which serve as benchmarks. The results are presented in Table 2. From the PSNR and SSIM results, it can be observed that, compared to state-of-the-art methods, AttGAN effectively improves the quality of highlight-removed images. This improvement can be attributed to its specially designed network architecture, the attention mechanism for feature capture, and the enhanced quality of generated images achieved by the discriminator. Consequently, this leads to an improvement in the precision and recall of text recognition. However, the precision does not show a significant difference, which is likely due to the inherently high precision of PaddleOCR; once the text information is successfully detected and recalled, high-performance text recognition is achieved.
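For reference, PSNR, precision and recall follow their standard definitions, which the sketch below illustrates. It assumes 8-bit images flattened to lists of pixel values and text-detection counts supplied as true positives, false positives and false negatives; the function names and inputs are hypothetical, and SSIM is omitted for brevity.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall of detected text items."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts, not taken from the paper's tables.
example_precision, example_recall = precision_recall(8, 2, 2)
```

Recall penalizes text that stays hidden under residual highlights, while precision penalizes spurious detections. A recognizer that rarely produces false positives therefore shows high precision almost regardless of the preprocessing applied.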

The network architecture is divided into three principal sub-networks: the highlight detection network (\({\text{H}}\)), the highlight removal network (\({\text{R}}\)), and the discriminator (\({\text{G}}\)). \({\text{H}}\) utilizes a fully convolutional architecture, featuring three layers each of downsampling and upsampling. Each downsampling layer is followed by two convolutional layers, while each upsampling layer is succeeded by three convolutional layers. \({\text{R}}\) is configured as an encoder-decoder network with skip connections, including two downsampling layers, one attention block to focus on salient features, four residual blocks to enhance feature propagation without loss, and two upsampling layers to reconstruct the output. The discriminator \({\text{G}}\) comprises a convolutional layer followed by five downsampling layers with a kernel size of 5 and stride of 2, with spectral normalization applied throughout to stabilize the training process.
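To give a sense of the resolution at which such a discriminator judges patches, the sketch below tracks the spatial size through five stride-2 layers. It assumes "same" padding, a stride of 1 for the initial convolution, and a \(512 \times 512\) input; none of these padding or stride details are confirmed by the text.

```python
import math

def same_padding_output(size, stride):
    """Spatial size after a convolution with 'same' padding."""
    return math.ceil(size / stride)

def discriminator_grid(input_size=512, num_downsampling=5, stride=2):
    """Side length of the patch-score grid after the downsampling layers."""
    size = input_size  # the initial stride-1 convolution leaves the size unchanged
    for _ in range(num_downsampling):
        size = same_padding_output(size, stride)
    return size

grid_side = discriminator_grid()
```

Under these assumptions the output is a 16 × 16 grid, with each score covering a local region of the input rather than the whole image.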

Although these methods can remove highlights, they have several shortcomings: (i) they are only effective under specific conditions, which limits their practicality across different environments and materials; (ii) when the image has complex textures or the highlight area is too large, they encounter significant challenges and cannot remove highlights completely.

We perform a numerical evaluation on the proposed FL-AttGAN using the SD1, SD2 and RD datasets 13. The SD1 dataset comprises 3679 original images containing text from supermarkets and streets, while the SD2 dataset includes 2025 original images of identity cards and driver’s licenses, which contain extensive textual information that necessitates heightened privacy protection. Utilizing the Cycles rendering engine of the 3D computer graphics software Blender, 27,700 sets of images were automatically generated, each set featuring an image with specular highlights, alongside the corresponding non-specular image and specular mask. Highlight shapes such as circular, triangular, elliptical, and annular were included, with lighting intensities randomly set within the [40,70] range to simulate real-world lighting conditions. The CTPN text detection model identified text regions, and specular highlights were applied specifically to these text areas in the images. The SD1 dataset randomly divides samples into training, validation, and testing sets in an 8:1:1 ratio. The SD2 dataset follows the same partitioning strategy.

Kong, W. et al. A practical solution for non-intrusive type ii load monitoring based on deep learning and post-processing. IEEE Trans. Smart Grid 11, 148–160 (2019).

We train the FL-AttGAN network for highlight removal across 10 local devices, configuring the system to operate with local epochs \(E=3\). The learning rates \(\alpha\), \(\beta\), and \(\gamma\) for the highlight detection network, highlight removal network, and discriminator, respectively, are all set to \(0.0001\), and the batch size is \(b=4\). Training extends over 10,000 communication rounds. Using the TensorFlow framework, the network processes all images at a resolution of \(512 \times 512\).

All authors contributed to this study, including the methodology, experiments, and analysis. All authors read and approved the final manuscript.

The clients use local data and initial global parameters to train the local AttGAN for certain steps of gradient updates and then communicate the trained local parameters to the server.

In this paper, we have developed the FL-AttGAN method for highlight removal, providing a novel approach for multi-user text highlight removal under the premise of big data security. Unlike existing methods, we employ an attention mechanism that can more precisely focus on highlight areas and their surrounding detail features, thereby enhancing the model’s representational capability. This approach effectively removes highlights while preserving the details of the image. Furthermore, we employ federated learning to reduce the risk of data breaches and ensure that sensitive information remains on the client’s device. Evaluations conducted on the SD1, SD2 and RD datasets demonstrate that our method achieves the best results among the compared methods in image highlight removal.

The detection loss is used to evaluate the error between the generated mask \(\hat{\textbf{{M}}}{^{n}}\) and true mask \({\textbf{{M}}}{^{n}}\).

Su, P., Zhao, H. & Wang, Y. A novel model based on big data environment for text content security recognition. J. Signal Process. Syst. 1, 1–14 (2024).

In this study, we introduce a dual-stage framework designed to identify and eliminate specular highlights from text images. The complete architecture is depicted in Fig. 2.

In the era of unprecedented data growth, the demand for image data has surged across various sectors, particularly sensitive ones such as digital forensics1, medical diagnostics2, and surveillance3. However, the acquisition of high-quality images is consistently challenged by real-world complexities, such as highlight illumination, which can significantly degrade the quality in highlighted regions. This degradation adversely affects the performance of computer vision tasks that require high-quality inputs, such as stereo matching4, text recognition5,6 and image segmentation7,8. Consequently, developing models that effectively remove highlights is essential. Because image data frequently contains personal information (e.g., identification cards, IDs), safeguarding data privacy is of paramount importance. Limited by laws, regulations, trade secrets, and personal privacy, clients are constrained to use only local data to train their highlight removal models, leading to a “data island” phenomenon. Models trained in this manner not only lack generalization ability but also are less accurate due to the limited amount of training data.

where \(\mu _f\) and \(\mu _k\) are the f-th layer feature map of CTPN and the k-th layer feature map of the DenseNet, respectively.

Funke, I. et al. Generative adversarial networks for specular highlight removal in endoscopic images. In Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576, 8–16 (SPIE, 2018).

Highlight detection methods have been well developed in recent years. Klinker et al. use the color space analysis to divide color pixels into diffuse pixels, highlight pixels and saturated pixels14. To separate these components, they explore convex polygon fitting techniques and establish a connection between the color space and the dichromatic illumination model (DIM), fitting the highlight component into the dichromatic plane. Based on this model, work15 proposes an analytical solution for highlight removal based on the \(\text{L}_2\) chromaticity definition and DIM. This method involves few complex calculations and therefore removes highlights from images quickly and efficiently. But these methods usually only work on a single image. Su et al. propose a method to remove specular highlights from multi-view facial images16. They exploit Lambertian consistency to provide non-negative constraints on light and shadow in all directions, and further facilitate highlight removal by using orthogonal subspace projections.

(4) parameter determination: To optimize the performance of the highlight detection network, highlight removal network and discriminator, we consider a mini-batch of B and establish following loss functions as constraints.

After T communication rounds, the server aggregates a global AttGAN model that is capable of generating highlight-free images.

He, C. et al. Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects. Preprint at http://arxiv.org/abs/2308.03166 (2023).

This iterative process ensures that the model is collaboratively trained across all clients without sharing sensitive local data, reducing the risk of data breaches. The detailed procedure of FL-AttGAN is presented in Algorithm 1.
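The server-side aggregation step can be sketched with the standard federated averaging rule, in which each client's parameters are weighted by its local sample count. Whether FL-AttGAN weights clients in exactly this way is not stated here, so the weighting, the flat parameter lists, and the function name are assumptions.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors and sample counts reach the server in this scheme, and the images themselves never leave the clients. In a full round the same rule would be applied separately to the parameters of the detection network, the removal network and the discriminator.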

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Shafaghi, H. et al. A fast and light fingerprint-matching model based on deep learning approaches. J. Signal Process. Syst. 95, 551–558 (2023).

In recent years, an increasing number of studies have combined FL with machine learning, deep learning, and data mining to protect images10. To achieve private transmission of facial images between different users, Yang et al. proposed a facial-image privacy protection method based on FL and ensemble models25. The clients obtain local face recognition models through local training, and the server integrates these models. Experiments show that this method generates high-quality, transferable, and private face images more efficiently than other existing methods. To address confidentiality and privacy issues in shared medical data, the work26 uses The Cancer Genome Atlas (TCGA), a public repository, to simulate a distributed environment and applies a differentially private FL framework to analyze histopathology images.

(1) highlight detection network: The highlight detection network \({\text{H}}\) processes the image \({{\textbf{S}}}{^{n}}\) with specular highlights, and generates a mask \(\hat{\textbf{{M}}}{^{n}}\) that identifies highlight regions, where \(n\in \{1,2,\ldots,N\}\).

Barrachina, J. A. et al. Comparison between equivalent architectures of complex-valued and real-valued neural networks-application on polarimetric SAR image segmentation. J. Signal Process. Syst. 95, 57–66 (2023).

The server initializes and defines the AttGAN model, which comprises a highlight detection network \({\text{H}}\), a highlight removal network \({\text{R}}\) and a discriminator \({\text{G}}\), and broadcasts the initial global network parameters \(\textbf{w}_{0,h}\), \(\textbf{w}_{0,r}\) and \(\textbf{w}_{0,g}\) to all participating clients.

We evaluate the proposed FL-AttGAN by comparing it with four other methods under the same federated framework on the SD1, SD2, and RD datasets. The first is the federated GAN method 31 (denoted as “FL-GAN”). The second is the federated CycleGAN method 29 (denoted as “FL-CycleGAN”). The third is the federated UNet method 28 (denoted as “FL-UNet”). The fourth combines federated learning with HQG-Net 30 (denoted as “FL-HQG-Net”). Additionally, for comparison, the SSIM, PSNR, Recall, and Precision of the highlight images (denoted as “Light”) are also measured on the same datasets, serving as a baseline.

In this work, we focus on recovering text that is occluded by highlights. To this end, we apply pre-trained text detection and recognition models to supervise text recovery.

In recent years, many researchers have explored the potential of attention mechanisms to solve a variety of difficult problems, further extending their application in natural language processing (NLP), computer vision 20 and other fields. He et al. propose a feature decomposition and edge reconstruction model for camouflaged object detection 21. The model utilizes a frequency attention module and a guide-based feature aggregation module to mine subtle cues that distinguish foreground from background. The work 22 proposes a novel end-to-end network with an attention mechanism for automatic facial expression recognition. Introducing the attention mechanism helps neural networks focus more on useful features. In NLP, the work23 proposes a novel unified architecture that includes bidirectional long short-term memory, an attention mechanism, and a convolutional layer. The architecture captures the local features of phrases as well as global sentence semantics.

Furthermore, we assess text quality using recall and precision, and use PSNR and SSIM to evaluate visual quality.
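As a rough illustration of two of these metrics, the sketch below computes PSNR on flattened pixel lists and precision/recall on sets of recognized text strings. The function names, the 8-bit peak value of 255, and the exact-match rule for counting a recognized string as correct are assumptions made for this example; the matching rule used in the actual evaluation may differ.

```python
import math
from typing import List, Set, Tuple


def psnr(reference: List[float], restored: List[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def precision_recall(predicted: Set[str], ground_truth: Set[str]) -> Tuple[float, float]:
    """Precision and recall of recognized strings (assumed exact-match criterion)."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data (hypothetical): four pixels and a handful of text fields.
reference_pixels = [100.0, 100.0, 100.0, 100.0]
restored_pixels = [110.0, 90.0, 100.0, 100.0]
psnr_db = psnr(reference_pixels, restored_pixels)

recognized = {"NAME", "ID", "DATE"}
annotated = {"NAME", "ID", "ADDRESS", "SEX"}
precision, recall = precision_recall(recognized, annotated)
```

Text that remains hidden under a highlight is never recognized, so it lowers recall, whereas precision depends only on how many of the recognized strings are correct.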

Yang, J. et al. Transferable face image privacy protection based on federated learning and ensemble models. Complex Intell. Syst. 7, 2299–2315 (2021).

All data used in this study were sourced from publicly available databases13, and all information is generated by software, involving no real personal data. Therefore, there are no ethical or privacy concerns associated with this research.

By using these loss functions, the parameters of the networks \(\textbf{w}_h^c\), \(\textbf{w}_r^c\) and \(\textbf{w}_g^c\) are updated by stochastic gradient descent (SGD).

As a distributed machine learning framework, federated learning (FL) enables multiple clients to collaboratively train models while adhering to stringent privacy protection, data security, and regulatory requirements9. This framework is extensively utilized in fields such as healthcare, the internet of things, and intelligent transportation. Specifically, FL builds a global model on a central server and starts the local training process on the client side. After training, each client sends its updated local parameters to the central server for aggregation. Finally, a global model with generalization ability is obtained. In this process, there is no need to upload local data to the server, which protects privacy and avoids the risk of sensitive data being attacked10. Additionally, the efficiency of data utilization is greatly improved by client cooperation.
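The server-side aggregation step can be sketched as follows. This is a minimal illustration that assumes the common FedAvg rule, in which each client's parameters are weighted by its number of local samples; the function name `fed_avg`, the dictionary layout and the toy values are hypothetical, and the text above does not fix a particular weighting scheme.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened parameter vector


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighted by local sample counts (assumed rule)."""
    total = sum(client_sizes)
    global_params: Params = {}
    for name in client_params[0]:
        dim = len(client_params[0][name])
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return global_params


# Two hypothetical clients; only parameters are exchanged, never images.
clients = [
    {"w_h": [1.0, 2.0], "w_r": [0.0]},
    {"w_h": [3.0, 4.0], "w_r": [4.0]},
]
sizes = [1, 3]  # number of local training samples per client
global_model = fed_avg(clients, sizes)
```

The server sees only the parameter vectors and the sample counts, which is what keeps the local images on the client side.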

where \(\alpha\), \(\beta\) and \(\gamma\) are the learning rates of highlight detection network, highlight removal network and discriminator, respectively.
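For illustration, a plain SGD step with a separate learning rate for each network might look like the sketch below. The gradient values and learning rates are placeholder numbers, and the helper `sgd_step` is a hypothetical name; in practice the gradients would come from the mini-batch losses of the respective networks.

```python
from typing import List


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """One SGD update: w <- w - lr * gradient, applied element-wise."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Hypothetical learning rates for the highlight detection network (alpha),
# the highlight removal network (beta) and the discriminator (gamma).
alpha, beta, gamma = 0.1, 0.01, 0.5

w_h = sgd_step([1.0, -1.0], [2.0, 2.0], alpha)  # detection network parameters
w_r = sgd_step([0.5], [10.0], beta)             # removal network parameters
w_g = sgd_step([2.0], [4.0], gamma)             # discriminator parameters
```

Keeping the three rates separate allows the discriminator to be trained at a different pace from the two generator-side networks.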

where act is the activation function. The output \(\textbf{A}_{out}\) is used as the input for feature extraction of the next convolutional layer to enhance the feature extraction effect.

He, C. et al. HQG-Net: Unpaired medical image enhancement with high-quality guidance. IEEE Trans. Neural Netw. Learn. Syst. (2023).

To further quantify generation performance, we employ PSNR, SSIM, Recall, and Precision to evaluate the highlight removal and text detection performance of the different methods on the SD1, SD2 and RD datasets, with the results presented in Table 3. Consistent with the visual analysis, our FL-AttGAN approach demonstrates advantages over the other methods across the four evaluation metrics on the three datasets, particularly in PSNR and Recall, which are the metrics most affected by highlights. In terms of Precision, the advantage is only slight because of the inherently high accuracy of PaddleOCR: once the text is successfully detected and recalled, high-performance text recognition is achieved. Furthermore, FL-UNet demonstrates the lowest performance, which we attribute to its structure having only a single generator without additional mechanisms for performance enhancement.

First, we assess the performance of FL-AttGAN in the task of highlight removal. In the absence of federated learning, we compare the proposed local highlight removal network AttGAN with several variants: AttGenetor, which removes the discriminator network (\({\text{G}}\)); a GAN network without the attention mechanism; and a generator network that excludes both the discriminator and the attention mechanism. The results of the highlight removal performance are shown in Table 1. The results indicate that each module of AttGAN is indispensable. The generator without the attention mechanism and discriminator exhibits the lowest performance metrics in terms of Precision, Recall, PSNR, and SSIM. The inclusion of the discriminator improves these four evaluation metrics, while the addition of the attention mechanism (in the variant AttGenetor) plays an even more significant role. When both the discriminator and the attention mechanism are employed in the AttGAN approach, the best overall evaluation metrics are achieved. This confirms that the attention mechanism effectively captures important features, and the discriminator significantly enhances the quality of the generated images.