📝 Physical Invisible Backdoor Based on Camera Imaging 🔭 "By using the camera imaging process and leveraging the CFA interpolation algorithm and the camera fingerprint feature, the proposed method can implement a physical invisible backdoor attack without changing nature pixels of the images." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.07428v1 #arxiv https://creative.ai/system/media_attachments/files/111/071/845/016/249/555/original/0f581ac9a76ae8e7.jpg https://creative.ai/system/media_attachments/files/111/071/845/100/453/176/original/210e0fd9d96ffd0c.jpg https://creative.ai/system/media_attachments/files/111/071/845/163/616/272/original/78742d691dcbbb84.jpg https://creative.ai/system/media_attachments/files/111/071/845/224/782/382/original/32bdf85e2445e1af.jpg
📝 Masked Diffusion with Task-Awareness for Procedure Planning in Instructional Videos 🔭 "The introduced mask acts akin to a task-oriented attention filter, enabling the diffusion/denoising process to concentrate on a subset of action types that are pertinent to the given task." [gal30b+] 🤖 #CV ⚙️ https://github.com/ffzzy840304/Masked-PDPP 🔗 https://arxiv.org/abs/2309.07409v1 #arxiv https://creative.ai/system/media_attachments/files/111/071/549/817/427/736/original/b33c9fc202f5dea6.jpg https://creative.ai/system/media_attachments/files/111/071/549/870/341/872/original/03f227fadfb3966a.jpg https://creative.ai/system/media_attachments/files/111/071/549/933/645/294/original/fe8eab4106812a57.jpg
📝 Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance 🔭 "By estimating Dirichlet concentration parameters for singletons, comprehensive subjective opinions, including confusion and ignorance, could be achieved via further evidence combinations, which enables flexible visual recognition with uncertainty quantification." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.07403v1 #arxiv https://creative.ai/system/media_attachments/files/111/071/195/895/528/430/original/8521d2b1410adc43.jpg https://creative.ai/system/media_attachments/files/111/071/195/956/616/153/original/d4fc2d1788278a85.jpg https://creative.ai/system/media_attachments/files/111/071/196/081/021/533/original/a2c1673a0b828141.jpg https://creative.ai/system/media_attachments/files/111/071/196/144/923/567/original/17d0f4948fb2617a.jpg
📝 Judging a Video by Its Bitstream Cover 🔭 "Develops a bitstream-based classifier that uses the MPEG-1/2/4 bitstream of a video to classify them into distinct categories such as Sport, Music Video, and Animation." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.07361v1 #arxiv https://creative.ai/system/media_attachments/files/111/070/960/061/006/342/original/74e2a1fe95230b2c.jpg https://creative.ai/system/media_attachments/files/111/070/960/124/793/210/original/8359147ae6889fc1.jpg https://creative.ai/system/media_attachments/files/111/070/960/275/064/408/original/c12c69c3e6f54d59.jpg https://creative.ai/system/media_attachments/files/111/070/960/350/112/899/original/4049b29e1d102738.jpg
📝 Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection 🔭 "MMHL is comprised of supervised and self-supervised loss functions which utilize semantic features from different modalities and reduce the distance between RGB and thermal features, respectively, during saliency map generation." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.07297v1 #arxiv https://creative.ai/system/media_attachments/files/111/070/665/197/004/891/original/8c2d680dc2440f00.jpg https://creative.ai/system/media_attachments/files/111/070/665/265/148/458/original/0bdd98c60c6ffc3a.jpg https://creative.ai/system/media_attachments/files/111/070/665/331/558/427/original/4448f388a5844867.jpg https://creative.ai/system/media_attachments/files/111/070/665/405/402/709/original/ac545cb71fa1f7b2.jpg
📝 Unbiased Face Synthesis with Diffusion Models: Are We There Yet? 🔭🧠 "Consists of several qualitative and quantitative measures, including embedding-based metrics and user studies, to audit the characteristics of generated faces conditioned on a set of social attributes." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.07277v1 #arxiv https://creative.ai/system/media_attachments/files/111/070/370/308/001/348/original/93ad2759578a4be1.jpg https://creative.ai/system/media_attachments/files/111/070/370/386/856/264/original/fba4e5943e2c8126.jpg https://creative.ai/system/media_attachments/files/111/070/370/470/908/439/original/dce1931a5f910fc9.jpg https://creative.ai/system/media_attachments/files/111/070/370/562/017/919/original/ceffeea5a5681383.jpg
📝 LCReg: Long-Tailed Image Classification with Latent Categories Based Recognition 🔭 "Learns a set of class-agnostic latent features shared by both head and tail classes, and then uses semantic data augmentation on the latent features to implicitly increase the diversity of the training samples." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.07186v1 #arxiv https://creative.ai/system/media_attachments/files/111/070/075/445/960/573/original/23c8b3f689fe0ac5.jpg https://creative.ai/system/media_attachments/files/111/070/075/508/686/924/original/61f65f5cd12d7e8f.jpg https://creative.ai/system/media_attachments/files/111/070/075/586/292/168/original/42491090266aff51.jpg
📝 Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch 🧠🔭 "DiffAug first mines sufficient prior semantic knowledge about the neighborhood to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge, while ensuring that the augmented and original data share a smoothed latent space." [gal30b+] 🤖 #LG #CE #CV ⚙️ https://github.com/zangzelin/DiffAug 🔗 https://arxiv.org/abs/2309.07909v1 #arxiv https://creative.ai/system/media_attachments/files/111/069/780/110/041/416/original/87de421ab477c80a.jpg https://creative.ai/system/media_attachments/files/111/069/780/199/868/301/original/c18c6fe7f9457edd.jpg https://creative.ai/system/media_attachments/files/111/069/780/267/517/617/original/74461425fd4fe0a4.jpg https://creative.ai/system/media_attachments/files/111/069/780/328/602/393/original/9ecff75f5c0e944a.jpg
📝 Hardening RGB-D Object Recognition Systems Against Adversarial Patch Attacks 🔭 "Finds that RGB features make the functions learned by the network more complex and, thus, more sensitive to small perturbations, compared to depth features, which have been proved to be more invariant to small transformations." [gal30b+] 🤖 #CV #CR ⚙️ https://github.com/Trusted-AI/adversarial-robustness-toolbox/ 🔗 https://arxiv.org/abs/2309.07106v1 #arxiv https://creative.ai/system/media_attachments/files/111/069/486/804/016/368/original/d8758964763bcdbe.jpg https://creative.ai/system/media_attachments/files/111/069/486/883/529/348/original/397e5cacc99c6ec9.jpg https://creative.ai/system/media_attachments/files/111/069/486/958/608/683/original/4dac0596d0a9a944.jpg https://creative.ai/system/media_attachments/files/111/069/487/031/925/256/original/9a61ba9a1f285e88.jpg
📝 FAIR: Frequency-Aware Image Restoration for Industrial Visual Anomaly Detection 🔭 "FAIR is a novel self-supervised image restoration task that restores images from their high-frequency components, which enables precise reconstruction of normal patterns while mitigating unfavorable generalization to anomalies." [gal30b+] 🤖 #CV ⚙️ https://github.com/liutongkun/FAIR 🔗 https://arxiv.org/abs/2309.07068v1 #arxiv https://creative.ai/system/media_attachments/files/111/069/014/874/646/522/original/ca0c7236cc5f3df7.jpg https://creative.ai/system/media_attachments/files/111/069/014/936/129/353/original/324c1ababcfa20a1.jpg https://creative.ai/system/media_attachments/files/111/069/014/994/662/446/original/b49aded915575c9e.jpg https://creative.ai/system/media_attachments/files/111/069/015/052/093/066/original/a59fca6d01a9f9ed.jpg
📝 Aggregating Long-Term Sharp Features via Hybrid Transformers for Video Deblurring 🔭 "A window-based local Transformer is employed for exploiting features from neighboring frames with cross attention, and a global multi-scale Transformer is utilized to aggregate long-term sharp features." [gal30b+] 🤖 #CV ⚙️ https://github.com/shangwei5/STGTN 🔗 https://arxiv.org/abs/2309.07054v1 #arxiv https://creative.ai/system/media_attachments/files/111/068/778/981/350/924/original/679c0a24cccc5032.jpg https://creative.ai/system/media_attachments/files/111/068/779/035/864/156/original/c045293f1cce34be.jpg https://creative.ai/system/media_attachments/files/111/068/779/125/648/905/original/85a1bb0bf9040d7c.jpg https://creative.ai/system/media_attachments/files/111/068/779/178/957/523/original/3a7dde3e3bf4139e.jpg
📝 Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning 🔭 "A margin-based Prototypical Contrast Learning embedding network reaps the benefits of prototype-data (cluster quality enhancement) and implicit data-data (fine-grained representations) interaction while providing substantial cluster supervision to the embedding network and the generator." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06987v1 #arxiv https://creative.ai/system/media_attachments/files/111/068/425/025/710/612/original/4451f3ebfe689c84.jpg https://creative.ai/system/media_attachments/files/111/068/425/111/954/500/original/dabc710c444664a2.jpg https://creative.ai/system/media_attachments/files/111/068/425/172/370/165/original/09b27994a389da11.jpg https://creative.ai/system/media_attachments/files/111/068/425/248/166/865/original/dfafcd0b4175cb52.jpg
📝 Differentiable JPEG: The Devil Is in the Details 🔭 "A novel differentiable JPEG approach is proposed, overcoming the limitations of existing methods: Differentiable wrt the input image, the JPEG quality, the quantization tables, and the color conversion parameters." [gal30b+] 🤖 #CV #MM ⚙️ https://github.com/necla-ml/Diff-JPEG 🔗 https://arxiv.org/abs/2309.06978v1 #arxiv https://creative.ai/system/media_attachments/files/111/068/189/392/660/380/original/a4309c4f3e6aaf00.jpg https://creative.ai/system/media_attachments/files/111/068/189/452/588/617/original/a630a23cf1e84cc7.jpg https://creative.ai/system/media_attachments/files/111/068/189/510/222/692/original/1ce6342063bff45c.jpg https://creative.ai/system/media_attachments/files/111/068/189/577/173/964/original/bcceb990cd72e8c2.jpg
📝 Neural Network-Based Coronary Dominance Classification of RCA Angiograms 🔭 "Employs convolutional neural network ConvNext and Swin transformer for 2D image (frames) classification along with a majority vote for cardio angiographic view classification." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06958v1 #arxiv https://creative.ai/system/media_attachments/files/111/068/071/217/697/404/original/5a052a82d3c7274f.jpg https://creative.ai/system/media_attachments/files/111/068/071/274/519/477/original/46ced6b924e05c24.jpg https://creative.ai/system/media_attachments/files/111/068/071/330/330/969/original/44c9bc1a4fe2ebe4.jpg https://creative.ai/system/media_attachments/files/111/068/071/391/888/767/original/4646be27f8c4f8bf.jpg
📝 DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models 🔭 "DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality and flexibility to accommodate a range of style references." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06933v1 #arxiv https://creative.ai/system/media_attachments/files/111/067/894/224/260/617/original/cc7de3bf3e362749.jpg https://creative.ai/system/media_attachments/files/111/067/894/349/591/312/original/65a455ffca967b94.jpg https://creative.ai/system/media_attachments/files/111/067/894/412/617/112/original/9e5d21ace9ca7c77.jpg https://creative.ai/system/media_attachments/files/111/067/894/519/752/442/original/3533eed6382a4c13.jpg
📝 Hydra: Multi-Head Low-Rank Adaptation for Parameter Efficient Fine-Tuning 🔭 "Proposes a multi-branch adaption method named Hydra for fine-tuning large language models, which leverages both parallel and sequential adaptation methods simultaneously to combine the benefits of both." [gal30b+] 🤖 #CV ⚙️ https://github.com/extremebird/Hydra 🔗 https://arxiv.org/abs/2309.06922v1 #arxiv https://creative.ai/system/media_attachments/files/111/067/599/465/898/447/original/9a67dc74843797b0.jpg https://creative.ai/system/media_attachments/files/111/067/599/523/105/724/original/71cb40524aba6c0f.jpg https://creative.ai/system/media_attachments/files/111/067/599/578/586/565/original/cb0de4240f71e918.jpg https://creative.ai/system/media_attachments/files/111/067/599/637/465/725/original/627e9f0a0a78ca9e.jpg
📝 Keep It SimPool: Who Said Supervised Transformers Suffer From Attention Deficit? 🔭🧠 "SimPool replaces global average pooling by spatial similarity attention to improve performance and provide attention maps delineating object boundaries without explicit losses or modifying the architecture, whether supervised or self-supervised." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/billpsomas/simpool 🔗 https://arxiv.org/abs/2309.06891v1 #arxiv https://creative.ai/system/media_attachments/files/111/067/186/512/645/317/original/7b0176085e27acac.jpg https://creative.ai/system/media_attachments/files/111/067/186/567/794/569/original/bb1cf09340beaaa9.jpg
📝 Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization 🔭 "We disentangle high-dimensional video features into multiple components, which are explicitly trained to encode non-overlapping semantics with mutual information maximization loss (MIM) to ensure that useful task-relevant information is extracted from the original features." [gal30b+] 🤖 #CV #MM ⚙️ https://github.com/yyyooooo/DMI/ 🔗 https://arxiv.org/abs/2309.06877v1 #arxiv https://creative.ai/system/media_attachments/files/111/066/891/510/617/380/original/077d58e884e0b60c.jpg https://creative.ai/system/media_attachments/files/111/066/891/578/774/204/original/acd5d26cc09d2c5a.jpg https://creative.ai/system/media_attachments/files/111/066/891/631/362/378/original/935fcc1c9e114bb9.jpg https://creative.ai/system/media_attachments/files/111/066/891/687/267/913/original/a13f4c0cc1fa949a.jpg
📝 TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification 🔭 "Works by training a linear classifier on text representations generated by a pre-trained language model (GPT-3), conditioned by task-specific prompts." [gal30b+] 🤖 #CV ⚙️ https://github.com/jmiemirza/TAP 🔗 https://arxiv.org/abs/2309.06809v1 #arxiv https://creative.ai/system/media_attachments/files/111/066/655/677/238/287/original/98b33a67bbc27ec8.jpg https://creative.ai/system/media_attachments/files/111/066/655/745/469/090/original/b29de7ab55571630.jpg
📝 Motion-Bias-Free Feature-Based SLAM 🔭 "Proposes a set of modifications that remedy the motion bias problem in SLAM by improving the feature matching process, the data association, and the outlier rejection of the pose-graph optimizer." [gal30b+] 🤖 #CV ⚙️ https://github.com/alejandrofontan/ORB_SLAM2_Deterministic 🔗 https://arxiv.org/abs/2309.06792v1 #arxiv https://creative.ai/system/media_attachments/files/111/066/419/777/332/944/original/a484a3a70f5db1e0.jpg https://creative.ai/system/media_attachments/files/111/066/419/843/237/478/original/47cb20b788263f3a.jpg
📝 Remote Sensing Object Detection Meets Deep Learning: A Meta-Review of Challenges and Advances 🔭 "This review summarizes the development process of RSOD and identifies five main challenges in RSOD including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06751v1 #arxiv https://creative.ai/system/media_attachments/files/111/066/124/757/425/548/original/b68d6783be8a0c51.jpg https://creative.ai/system/media_attachments/files/111/066/124/845/088/711/original/8188312d3951a22a.jpg https://creative.ai/system/media_attachments/files/111/066/124/952/215/257/original/670446972632f7ce.jpg https://creative.ai/system/media_attachments/files/111/066/125/040/139/524/original/acacc020c6112aaa.jpg
📝 GelFlow: Self-Supervised Learning of Optical Flow for Vision-Based Tactile Sensor Displacement Measurement 🔭 "Constructs a multi-scale feature pyramid from the input image to handle large deformations, which is more suitable for vision-based tactile sensor images." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06735v1 #arxiv https://creative.ai/system/media_attachments/files/111/065/947/981/502/125/original/c6975bc4b329e0b0.jpg https://creative.ai/system/media_attachments/files/111/065/948/044/745/470/original/781747161ce5c2d9.jpg https://creative.ai/system/media_attachments/files/111/065/948/099/794/701/original/562ac096f81883f1.jpg
📝 Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense 🔭🧠 "Works by using a non-parametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash, etc." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.06724v1 #arxiv https://creative.ai/system/media_attachments/files/111/065/771/027/672/064/original/874fb691eadb3937.jpg https://creative.ai/system/media_attachments/files/111/065/771/124/105/104/original/23d7f21ed9554c08.jpg https://creative.ai/system/media_attachments/files/111/065/771/198/993/184/original/771dfd7f333cc95e.jpg https://creative.ai/system/media_attachments/files/111/065/771/273/163/071/original/4930bd44ae095da8.jpg
📝 Deep Attentive Time Warping 🔭 "Predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task." [gal30b+] 🤖 #CV ⚙️ https://github.com/matsuo-shinnosuke/deep-attentive-time 🔗 https://arxiv.org/abs/2309.06720v1 #arxiv https://creative.ai/system/media_attachments/files/111/065/416/928/165/150/original/e01204ed2d9f5ce9.jpg https://creative.ai/system/media_attachments/files/111/065/416/995/929/535/original/a49dab7d01061e37.jpg https://creative.ai/system/media_attachments/files/111/065/417/060/287/494/original/d5d2697cc2ebe364.jpg https://creative.ai/system/media_attachments/files/111/065/417/124/827/533/original/2f459db9b648cc5d.jpg
📝 MPI-Flow: Learning Realistic Optical Flow with Multiplane Images 🔭 "A learning-based MPI-Flow framework is proposed, which generates highly realistic optical flow maps from real-world images and achieves state-of-the-art performance in both unsupervised and supervised learning of optical flow estimation models." [gal30b+] 🤖 #CV ⚙️ https://github.com/Sharpiless/MPI-Flow 🔗 https://arxiv.org/abs/2309.06714v1 #arxiv https://creative.ai/system/media_attachments/files/111/065/181/599/871/767/original/2017486bf1b85a0c.jpg https://creative.ai/system/media_attachments/files/111/065/181/672/267/933/original/e0cf471491767214.jpg https://creative.ai/system/media_attachments/files/111/065/181/735/554/909/original/c40255fe367f784e.jpg https://creative.ai/system/media_attachments/files/111/065/181/826/989/919/original/1e7858009c1a236b.jpg
📝 ShaDocFormer: A Shadow-Attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal 🔭 "The architecture of ShaDocFormer includes Shadow-attentive Threshold Detector (STD) and Cascaded Fusion Refiner (CFR), where CFR takes advantage of STD to generate shadow mask." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06670v1 #arxiv https://creative.ai/system/media_attachments/files/111/064/886/187/111/076/original/a37985a4bceee29b.jpg https://creative.ai/system/media_attachments/files/111/064/886/282/823/561/original/d53a9f137517d392.jpg https://creative.ai/system/media_attachments/files/111/064/886/350/609/436/original/7738813f7479d776.jpg
📝 Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity 🔭🧠 "Introduces a training procedure that enables the exploitation of activation sparsity by inducing semi-structured sparsity through regularization and a training procedure that is aware of the structure of the GEMM operation." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.06626v1 #arxiv https://creative.ai/system/media_attachments/files/111/064/709/382/486/009/original/f9c4dab67cc54dc5.jpg https://creative.ai/system/media_attachments/files/111/064/709/460/165/833/original/1cf96c5cb840c7c5.jpg https://creative.ai/system/media_attachments/files/111/064/709/611/181/238/original/3bb6030b2b453c2a.jpg https://creative.ai/system/media_attachments/files/111/064/709/688/393/406/original/6c6e965e9258086e.jpg
📝 Zero-Shot Visual Classification with Guided Cropping 🔭 "Uses a zero-shot object detector to guide cropping of input images and increase the influence of object-relevant features in zero-shot classification task using CLIP." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06581v1 #arxiv https://creative.ai/system/media_attachments/files/111/064/532/454/231/283/original/fae70e60ebb97330.jpg https://creative.ai/system/media_attachments/files/111/064/532/520/685/337/original/0b24dd6218b51a22.jpg https://creative.ai/system/media_attachments/files/111/064/532/614/122/767/original/41e0b427f79ce757.jpg https://creative.ai/system/media_attachments/files/111/064/532/677/049/507/original/a1adb4c135a9b011.jpg
📝 Strong-Weak Integrated Semi-Supervision for Unsupervised Single and Multi Target Domain Adaptation 🔭 "A strong representative set with high confidence but low diversity target domain samples and a weak representative set with low confidence but high diversity target domain samples are generated and updated dynamically during training." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06528v1 #arxiv https://creative.ai/system/media_attachments/files/111/064/237/458/478/631/original/bb476568a520587e.jpg https://creative.ai/system/media_attachments/files/111/064/237/557/413/987/original/88d5fc2e482667f3.jpg https://creative.ai/system/media_attachments/files/111/064/237/617/850/247/original/acc9445a0475f712.jpg https://creative.ai/system/media_attachments/files/111/064/237/680/619/307/original/164e0a3eb566d682.jpg
📝 DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention 🔭 "Presents a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks that leverages the synergy between the two modalities." [gal30b+] 🤖 #CV #MM 🔗 https://arxiv.org/abs/2309.06511v1 #arxiv https://creative.ai/system/media_attachments/files/111/064/001/500/220/326/original/e4f11b50e0e2f98d.jpg https://creative.ai/system/media_attachments/files/111/064/001/563/011/631/original/d6dc62bb068539cf.jpg https://creative.ai/system/media_attachments/files/111/064/001/631/581/187/original/edfd4da139570ac8.jpg https://creative.ai/system/media_attachments/files/111/064/001/698/879/045/original/00509b08cd9ce46d.jpg
📝 PILOT: A Pre-Trained Model-Based Continual Learning Toolbox 🧠🔭 "PILOT implements several state-of-the-art pre-trained model-based approaches that tackle class-incremental learning, which is a continual learning setting where new classes continually arrive." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2309.07117v1 #arxiv https://creative.ai/system/media_attachments/files/111/063/765/560/800/717/original/5f28f4b7028b7a0c.jpg https://creative.ai/system/media_attachments/files/111/063/765/645/445/091/original/77fa0b05c821d2a4.jpg
📝 Generalizable Neural Fields as Partially Observed Neural Processes 🧠🔭 "Proposes a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverages neural process algorithms to solve this task." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2309.06660v1 #arxiv https://creative.ai/system/media_attachments/files/111/063/470/460/731/300/original/a099670ecec37f3d.jpg https://creative.ai/system/media_attachments/files/111/063/470/529/999/124/original/f4948b09755e6534.jpg https://creative.ai/system/media_attachments/files/111/063/470/596/253/011/original/b2ee028a8f62e2d5.jpg https://creative.ai/system/media_attachments/files/111/063/470/663/270/484/original/b016ee822b9d8d58.jpg
📝 Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-Constrained Devices 🧠🔭 "Harmonic-NAS is a two-tier Neural Architecture Search (NAS) approach for Multimodal Neural Networks (MM-NN) for efficient inference on IoT devices." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2309.06612v1 #arxiv https://creative.ai/system/media_attachments/files/111/063/293/398/958/992/original/ee31ee37edf6a2ef.jpg https://creative.ai/system/media_attachments/files/111/063/293/492/811/749/original/b94cf324f5e28d28.jpg https://creative.ai/system/media_attachments/files/111/063/293/553/378/854/original/8824b255374af50a.jpg https://creative.ai/system/media_attachments/files/111/063/293/624/608/650/original/6feab7afc8a850df.jpg
📝 Exploring Non-Additive Randomness on ViT Against Query-Based Black-Box Attacks 🔭 "Proposes a novel approach of using non-additive stochasticity in Vision Transformers based models to defend against black-box attacks in the query-based scenario, which is underexplored to date." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06438v1 #arxiv https://creative.ai/system/media_attachments/files/111/063/143/466/003/363/original/3dce36c7a83eaa36.jpg https://creative.ai/system/media_attachments/files/111/063/143/531/154/807/original/34233498338d8e7b.jpg https://creative.ai/system/media_attachments/files/111/063/143/594/595/012/original/c306d230102cdd85.jpg
📝 Padding-Free Convolution Based on Preservation of Differential Characteristics of Kernels 🔭 "Makes convolution over an incomplete sliding window "collapse" to a linear differential operator evaluated locally at its central pixel, which no longer requires information from the neighbouring missing pixels." [gal30b+] 🤖 #CV ⚙️ https://github.com/stfc-sciml/DifferentialConv2d 🔗 https://arxiv.org/abs/2309.06370v1 #arxiv https://creative.ai/system/media_attachments/files/111/062/966/404/657/429/original/f649b9b2cc0570ec.jpg https://creative.ai/system/media_attachments/files/111/062/966/516/718/071/original/b294a6329a839021.jpg
📝 Exploring Flat Minima for Domain Generalization with Large Learning Rates 🔭 "Observes that using a large learning rate can not only promote weight diversity but also help identify flat regions in the loss landscape, which can be used to improve the generalization of DNNs." [gal30b+] 🤖 #CV ⚙️ https://github.com/koncle/DG-with-Large-LR 🔗 https://arxiv.org/abs/2309.06337v1 #arxiv https://creative.ai/system/media_attachments/files/111/062/730/407/507/182/original/e361222be4d1bf0a.jpg https://creative.ai/system/media_attachments/files/111/062/730/465/455/892/original/e279a93343b7ffe4.jpg https://creative.ai/system/media_attachments/files/111/062/730/521/755/338/original/b10ee1ce07d8efae.jpg https://creative.ai/system/media_attachments/files/111/062/730/578/189/206/original/856470ffc2ecf7d9.jpg
📝 SAMPLING: Scene-Adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis From a Single Image 🔭 "Introduces SAMPLING, a novel view synthesis method for large-scale outdoor scenes with a single image as input and an adaptive-bins strategy for multiplane images." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06323v1 #arxiv https://creative.ai/system/media_attachments/files/111/062/612/507/877/439/original/6340da365d141ed9.jpg https://creative.ai/system/media_attachments/files/111/062/612/586/346/546/original/262e25f6db810060.jpg https://creative.ai/system/media_attachments/files/111/062/612/668/099/068/original/d1ca75479b5463eb.jpg https://creative.ai/system/media_attachments/files/111/062/612/741/469/655/original/12b731da0c5216e0.jpg
📝 Towards High-Quality Specular Highlight Removal by Leveraging Large-Scale Synthetic Data 🔭 "Proposes a three-stage network to remove specular highlights from a single image by decomposing it into the albedo, shading, and specular residue components, refining the decomposition results, and adjusting tone of the refined result to match that of the input as closely as possible." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06302v1 #arxiv https://creative.ai/system/media_attachments/files/111/062/199/571/084/106/original/42f72a9eb51cd5bf.jpg https://creative.ai/system/media_attachments/files/111/062/199/630/808/773/original/1321b5be09a9d9f2.jpg https://creative.ai/system/media_attachments/files/111/062/199/705/785/361/original/89fb0f385a42f5c4.jpg https://creative.ai/system/media_attachments/files/111/062/199/761/272/326/original/f3f21699fed159ad.jpg
📝 Self-Training and Multi-Task Learning for Limited Data: Evaluation Study on Object Detection 🔭 "Self-training and multi-task learning frameworks, despite being particularly data demanding, have potentials for data exploitation if such assumptions can be relaxed to be less restrictive and data demanding." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06288v1 #arxiv https://creative.ai/system/media_attachments/files/111/061/905/497/191/117/original/c52b7a65169c5832.jpg https://creative.ai/system/media_attachments/files/111/061/905/564/127/846/original/5fb92f9853a39b0b.jpg
📝 Fg-T2m: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model 🔭 "Contains two parts: 1) a linguistics-structure assisted module that constructs accurate and complete language feature to fully utilize text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistics features from shallow and deep graph neural networks." [gal30b+] 🤖 #CV #MM 🔗 https://arxiv.org/abs/2309.06284v1 #arxiv https://creative.ai/system/media_attachments/files/111/061/609/741/461/488/original/b496151402c63cb1.jpg https://creative.ai/system/media_attachments/files/111/061/609/815/143/293/original/c9eef27c1014b257.jpg https://creative.ai/system/media_attachments/files/111/061/609/867/815/574/original/6a5d46d06df839ce.jpg https://creative.ai/system/media_attachments/files/111/061/609/917/646/630/original/e787b034460941c4.jpg
📝 IBAFormer: Intra-Batch Attention Transformer for Domain Generalized Semantic Segmentation 🔭 "Proposes a novel intra-batch attention mechanism, which incorporates information from other independent samples in the same batch, enriching contextual information and diversifying training data for each attention block." [gal30b+] 🤖 #CV ⚙️ https://github.com/open-mmlab/mmsegmentation 🔗 https://arxiv.org/abs/2309.06282v1 #arxiv https://creative.ai/system/media_attachments/files/111/061/373/882/394/988/original/14e30f9f64279b1f.jpg https://creative.ai/system/media_attachments/files/111/061/373/934/557/001/original/923b00df9133f9b8.jpg https://creative.ai/system/media_attachments/files/111/061/373/992/676/282/original/9221f4c1ec33c322.jpg https://creative.ai/system/media_attachments/files/111/061/374/052/501/021/original/530cce66c85ee604.jpg
📝 OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation 🔭 "Proposes an unsupervised framework of object-centric temporal action segmentation (OTAS), which consists of three modules: global feature extraction, self-supervised local feature extraction, and boundary selection." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06276v1 #arxiv https://creative.ai/system/media_attachments/files/111/061/314/817/558/065/original/6429defa500cc42a.jpg https://creative.ai/system/media_attachments/files/111/061/314/878/699/751/original/ef18ef9586d27832.jpg https://creative.ai/system/media_attachments/files/111/061/315/018/991/474/original/df74ce463163b3f1.jpg https://creative.ai/system/media_attachments/files/111/061/315/074/869/521/original/05c5ac2d5875ebb2.jpg
📝 Modality Unifying Network for Visible-Infrared Person Re-Identification 🔭 "A Modality Unifying Network (MUN) is proposed for cross-modality person search by generating an auxiliary modality to explore modality-shared and modality-specific representations." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06262v1 #arxiv https://creative.ai/system/media_attachments/files/111/061/078/961/634/650/original/e2a5bce6407b8da9.jpg https://creative.ai/system/media_attachments/files/111/061/079/079/548/345/original/d91eed2a5c63dd6c.jpg https://creative.ai/system/media_attachments/files/111/061/079/142/149/512/original/535ed902f2a6bfd5.jpg https://creative.ai/system/media_attachments/files/111/061/079/206/414/787/original/ee3949c5ccd298ea.jpg
📝 Use Neural Networks to Recognize Students' Handwritten Letters and Incorrect Symbols 🔭 "Given students' multiple-choice answers as the input, the image classifier predicts their answers, i.e., the classification label with the highest predicted probability is considered the correct answer." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06221v1 #arxiv https://creative.ai/system/media_attachments/files/111/060/843/018/636/776/original/d5fd24c2a7d261da.jpg https://creative.ai/system/media_attachments/files/111/060/843/082/502/775/original/4c8eaf58cea1aae2.jpg https://creative.ai/system/media_attachments/files/111/060/843/140/640/205/original/277fb78ee6890fbe.jpg https://creative.ai/system/media_attachments/files/111/060/843/198/588/794/original/b1fa6671ca466931.jpg
📝 Fast Sparse PCA via Positive Semidefinite Projection for Unsupervised Feature Selection 🔭 "By imposing PSD constraint and a regularization parameter setting strategy, it's proved that the optimal solution to a convex SPCA model is optimized on the PSD cone, which is equivalent to an orthogonal matrix." [gal30b+] 🤖 #CV ⚙️ https://github.com/liuyanfang023/KBS-RNE 🔗 https://arxiv.org/abs/2309.06202v1 #arxiv https://creative.ai/system/media_attachments/files/111/060/665/916/274/450/original/21a79854905a94ee.jpg https://creative.ai/system/media_attachments/files/111/060/666/013/423/410/original/8f3969df4ad47e38.jpg https://creative.ai/system/media_attachments/files/111/060/666/070/019/388/original/b8b311e07f86e842.jpg https://creative.ai/system/media_attachments/files/111/060/666/200/868/524/original/8a046894be5a21f5.jpg
📝 Dual-Path Temporal Map Optimization for Make-Up Temporal Video Grounding 🔭 "DPTMO extracts both query-agnostic and query-guided features to construct two proposal sets and uses specific evaluation methods for the two sets, which represent the cross-modal makeup video-text similarity and multi-modal fusion relationship, complementing each other." [gal30b+] 🤖 #CV #MM ⚙️ https://github.com/AIM3-RUC/Youmakeup 🔗 https://arxiv.org/abs/2309.06176v1 #arxiv https://creative.ai/system/media_attachments/files/111/060/548/246/694/502/original/636fe070de2592bb.jpg https://creative.ai/system/media_attachments/files/111/060/548/314/083/064/original/05c1fd28f48f06ca.jpg https://creative.ai/system/media_attachments/files/111/060/548/393/774/767/original/24d86a7780f0082a.jpg https://creative.ai/system/media_attachments/files/111/060/548/472/698/386/original/c217b32d489a03db.jpg
📝 Towards Reliable Domain Generalization: A New Dataset and Evaluations 🔭🧠 "Proposes a new domain generalization task for handwritten Chinese character recognition (HCCR) to enrich the application scenarios of DG method research, which is not studied in previous work." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.06142v1 #arxiv https://creative.ai/system/media_attachments/files/111/060/312/248/090/194/original/6ad0bd4460087c27.jpg https://creative.ai/system/media_attachments/files/111/060/312/327/038/221/original/397428046dd0aee4.jpg https://creative.ai/system/media_attachments/files/111/060/312/420/944/434/original/77652b1590d35084.jpg https://creative.ai/system/media_attachments/files/111/060/312/547/610/276/original/229088e7ecacee84.jpg
📝 Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning 🔭 "DVPT is a new PETL method that generates a dynamic instance-wise token for each image via a Meta-Net module, which captures the dynamic instance-wise visual features." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06123v1 #arxiv https://creative.ai/system/media_attachments/files/111/060/076/233/591/986/original/6ac16ce5187b9c7b.jpg https://creative.ai/system/media_attachments/files/111/060/076/289/122/482/original/30a6c226ba44a48c.jpg https://creative.ai/system/media_attachments/files/111/060/076/345/970/842/original/a63618d72e925d3c.jpg
📝 C-RITNet: Set Infrared and Visible Image Fusion Free From Complementary Information Mining 🔭 "First skillfully sidesteps aggregating complementary information in IVIF, and then it reasonably transfers complementary information into redundant one to integrate both the shared and complementary features from two modalities." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.06118v1 #arxiv https://creative.ai/system/media_attachments/files/111/059/899/214/452/622/original/b5a8ac2554a7a24e.jpg https://creative.ai/system/media_attachments/files/111/059/899/284/437/100/original/b04580fdff288d96.jpg https://creative.ai/system/media_attachments/files/111/059/899/371/237/735/original/089665e9922afeb1.jpg https://creative.ai/system/media_attachments/files/111/059/899/446/075/934/original/e7b2b20e30eda98e.jpg
📝 Learning From History: Task-Agnostic Model Contrastive Learning for Image Restoration 🔭 "SPNIR introduces the Self-Prior guided Negative loss to enable "learning from history", which can adaptively and automatically generate negative samples to train the target model without introducing any task-specific bias." [gal30b+] 🤖 #CV ⚙️ https://github.com/Aitical/Task-agnostic_Model_Contrastive_Learning_Image_Restoration 🔗 https://arxiv.org/abs/2309.06023v1 #arxiv https://creative.ai/system/media_attachments/files/111/059/663/407/088/844/original/8e00b9843bdd1d15.jpg https://creative.ai/system/media_attachments/files/111/059/663/469/163/793/original/0af5091d2954258d.jpg https://creative.ai/system/media_attachments/files/111/059/663/526/360/616/original/beb5a1a5f97dff8a.jpg https://creative.ai/system/media_attachments/files/111/059/663/588/668/020/original/3a78fc9b899e5637.jpg
📝 ATTA: Anomaly-Aware Test-Time Adaptation for Out-of-Distribution Detection in Segmentation 🔭🧠 "Proposes the dual-level OOD detection approach to handle domain shift and semantic shift jointly, by distinguishing whether domain shift exists in the image by leveraging global low-level features, and identifying pixels with semantic shift by utilizing dense high-level feature maps." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/gaozhitong/ATTA 🔗 https://arxiv.org/abs/2309.05994v1 #arxiv https://creative.ai/system/media_attachments/files/111/059/545/482/913/104/original/ab8985a9d3facb4c.jpg https://creative.ai/system/media_attachments/files/111/059/545/546/073/030/original/2f54d80b7bded899.jpg https://creative.ai/system/media_attachments/files/111/059/545/610/019/377/original/24291aa38924c978.jpg https://creative.ai/system/media_attachments/files/111/059/545/679/793/899/original/3959e11b702ace86.jpg
📝 Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation 🔭 "A straightforward textual template and a foreground-background segmentation algorithm are employed to generate foreground images set against isolated backgrounds, while an image captioning method and a text-to-image synthesis framework are used to produce context images." [gal30b+] 🤖 #CV ⚙️ https://github.com/gyhandy/Text2Image-for-Detection 🔗 https://arxiv.org/abs/2309.05956v1 #arxiv https://creative.ai/system/media_attachments/files/111/059/368/533/648/740/original/066e4979c5360b6e.jpg https://creative.ai/system/media_attachments/files/111/059/368/730/220/144/original/a0e8cfd8105d3284.jpg https://creative.ai/system/media_attachments/files/111/059/368/838/903/965/original/81e4970c6e85f980.jpg https://creative.ai/system/media_attachments/files/111/059/368/917/888/108/original/c0bae1aa1c7ffb15.jpg
📝 Hierarchical Conditional Semi-Paired Image-to-Image Translation for Multi-Task Image Defect Correction on Shopping Websites 🔭🧠 "A novel unified Image-to-Image model to correct multiple defects across different product types, leveraging an attention mechanism to hierarchically incorporate high-level defect groups and specific defect types to guide the network to focus on defect-related image regions." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.05883v1 #arxiv https://creative.ai/system/media_attachments/files/111/059/191/534/600/656/original/c251548b9a92f3b0.jpg https://creative.ai/system/media_attachments/files/111/059/191/595/538/646/original/99e325a115f109fe.jpg https://creative.ai/system/media_attachments/files/111/059/191/652/138/318/original/7ef71f3ccf2a7743.jpg https://creative.ai/system/media_attachments/files/111/059/191/708/153/968/original/87b87967d07bcbed.jpg
📝 Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation 🔭 "The Self-Correlation and Cross-Correlation Learning Network (SCCNe) is a few-shot remote sensing image semantic segmentation model that consists of a support branch and a query branch." [gal30b+] 🤖 #CV ⚙️ https://github.com/linhanwang/SCCNe 🔗 https://arxiv.org/abs/2309.05840v1 #arxiv https://creative.ai/system/media_attachments/files/111/058/955/562/990/825/original/4348a324a02df84e.jpg https://creative.ai/system/media_attachments/files/111/058/955/622/162/368/original/d0f1b8b09ddb3d54.jpg https://creative.ai/system/media_attachments/files/111/058/955/708/904/863/original/7e922d387598f9cb.jpg https://creative.ai/system/media_attachments/files/111/058/955/767/765/442/original/917e482341bb013f.jpg
📝 SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition 🔭 "SCD-Net is based on a novel contrastive learning framework which learns disentangled spatial and temporal clues via a constructed anchor and the proposed masking strategy with structural constraints, and can be applied for skeleton-based action recognition." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.05834v1 #arxiv https://creative.ai/system/media_attachments/files/111/058/837/579/000/657/original/4b9a0c9471bd3d93.jpg https://creative.ai/system/media_attachments/files/111/058/837/660/745/556/original/ee5a1e58f8e15bc6.jpg https://creative.ai/system/media_attachments/files/111/058/837/725/175/054/original/02902b9cbc4a026f.jpg https://creative.ai/system/media_attachments/files/111/058/837/785/383/215/original/a48ebfb62a82b02a.jpg
📝 Blendshapes GHUM: Real-Time Monocular Facial Blendshape Prediction 🔭 "Presents Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars." [gal30b+] 🤖 #CV ⚙️ https://github.com/google/ 🔗 https://arxiv.org/abs/2309.05782v1 #arxiv https://creative.ai/system/media_attachments/files/111/058/188/664/681/385/original/902f9611a00ca41f.jpg https://creative.ai/system/media_attachments/files/111/058/188/725/110/724/original/e40952c622e62e74.jpg https://creative.ai/system/media_attachments/files/111/058/188/776/962/095/original/b8942614b88ccbbf.jpg
📝 TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language 🔭 "TransferDoc is a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives which learns richer semantic concepts by unifying language and visual representations." [gal30b+] 🤖 #CV ⚙️ https://github.com/tesseract-ocr/tesseract 🔗 https://arxiv.org/abs/2309.05756v1 #arxiv https://creative.ai/system/media_attachments/files/111/058/011/898/874/739/original/49be9f11b5424907.jpg https://creative.ai/system/media_attachments/files/111/058/011/995/447/791/original/9dc8b2258cd7a2e3.jpg https://creative.ai/system/media_attachments/files/111/058/012/072/599/845/original/6f8d47bef1d5822a.jpg https://creative.ai/system/media_attachments/files/111/058/012/197/429/735/original/08f82dc2376c88f3.jpg
📝 InstaFlow: One Step Is Enough for High-Quality Diffusion-Based Text-to-Image Generation 🧠🔭 "InstaFlow is trained via a novel text-conditioned pipeline, in which reflow plays a critical role in improving the assignment between noise and images by refining the coupling between noises and images through straightening the trajectories of probability flows." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/gnobitab/InstaFlow 🔗 https://arxiv.org/abs/2309.06380v1 #arxiv https://creative.ai/system/media_attachments/files/111/057/775/795/131/745/original/566ffbcf5173bb7a.jpg https://creative.ai/system/media_attachments/files/111/057/775/898/577/943/original/cc187a6e454f1f07.jpg https://creative.ai/system/media_attachments/files/111/057/775/982/685/741/original/706dce085e4a6019.jpg https://creative.ai/system/media_attachments/files/111/057/776/064/914/533/original/be51cf40eb9e88cf.jpg
📝 Elucidating the Solution Space of Extended Reverse-Time SDE for Diffusion Models 🧠🔭 "Formulates the sampling process as an extended reverse-time SDE (ER SDE) and devises fast and training-free samplers, ER-SDE Solvers, elevating the efficiency of stochastic samplers to unprecedented levels." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2309.06169v1 #arxiv https://creative.ai/system/media_attachments/files/111/057/540/039/211/175/original/86c02c6ee3b07c33.jpg https://creative.ai/system/media_attachments/files/111/057/540/204/258/675/original/dadd4e640d1e376e.jpg https://creative.ai/system/media_attachments/files/111/057/540/300/256/623/original/fe4828a6bbabbc36.jpg https://creative.ai/system/media_attachments/files/111/057/540/409/093/713/original/74a4347936334afa.jpg
📝 Certified Robust Models with Slack Control and Large Lipschitz Constants 🧠🔭 "Proposes a Calibrated Lipschitz-Margin Loss (CLL) that improves certified robustness by explicitly calibrating the loss wrt margin and Lipschitz constant, thereby establishing full control over slack and improving robustness certificates even with larger Lipschitz constants." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/mlosch/CLL 🔗 https://arxiv.org/abs/2309.06166v1 #arxiv https://creative.ai/system/media_attachments/files/111/057/303/921/951/400/original/25eff6f2e0df30a5.jpg https://creative.ai/system/media_attachments/files/111/057/303/986/950/236/original/eefe9f9b33e6401c.jpg https://creative.ai/system/media_attachments/files/111/057/304/045/039/737/original/4d76c295234184d3.jpg https://creative.ai/system/media_attachments/files/111/057/304/112/864/240/original/4e75bcc56655e315.jpg
📝 Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning 🧠🔭 "An expert network is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks, while a previous network is used in an adaptation-retrospection phase to avoid forgetting and initialize a new expert with the knowledge of the old network." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/alviur/pocon_wacv2024 🔗 https://arxiv.org/abs/2309.06086v1 #arxiv https://creative.ai/system/media_attachments/files/111/057/186/119/699/977/original/9069df62bba99425.jpg https://creative.ai/system/media_attachments/files/111/057/186/255/127/281/original/9bb24e46e13a1b23.jpg https://creative.ai/system/media_attachments/files/111/057/186/312/822/265/original/a2e76599cb26d280.jpg https://creative.ai/system/media_attachments/files/111/057/186/376/843/587/original/cdb59842da4fdb5b.jpg
📝 KD-FixMatch: Knowledge Distillation Siamese Neural Networks 🧠🔭 "Presents KD-FixMatch, a novel SSL algorithm that addresses the limitations of FixMatch by incorporating knowledge distillation to enhance performance and reduce performance degradation in the early training stage." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/petewarden/tensorflow_ 🔗 https://arxiv.org/abs/2309.05826v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/891/023/997/696/original/8f7a5be1c3515a6e.jpg
📝 DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices 🔭 "Proposes a collaborative inference framework, DeViT, to facilitate edge deployment by decomposing large ViTs into multiple small models for collaborative inference at the edge devices." [gal30b+] 🤖 #CV #DC #PF 🔗 https://arxiv.org/abs/2309.05015v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/553/518/795/175/original/1a18544287dd8f57.jpg https://creative.ai/system/media_attachments/files/111/056/553/585/769/253/original/58a1d9e7552c6eba.jpg https://creative.ai/system/media_attachments/files/111/056/553/649/520/780/original/7ffcc32cf6cd671c.jpg https://creative.ai/system/media_attachments/files/111/056/553/718/374/058/original/993481ce295fdedb.jpg
📝 Towards Fully Decoupled End-to-End Person Search 🔭 "A task-incremental end-to-end person search network is proposed for the detection and re-id sub-tasks, which decouples the model architecture for the two sub-tasks." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.04967v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/526/931/470/687/original/35c0c7a30a08fa91.jpg https://creative.ai/system/media_attachments/files/111/056/527/020/097/625/original/f7df18939915767d.jpg https://creative.ai/system/media_attachments/files/111/056/527/082/239/075/original/f369d200408651ca.jpg https://creative.ai/system/media_attachments/files/111/056/552/780/312/441/original/8d79cec487fdcb44.jpg
📝 Semi-Supervised Learning for Face Anti-Spoofing Using Apex Frame 🔭 "An apex frame is derived from a video by computing a weighted sum of its frames, where the weights are determined using a Gaussian distribution centered around the video's central frame." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.04958v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/525/825/443/220/original/6128a9dbccb5e6d3.jpg https://creative.ai/system/media_attachments/files/111/056/525/903/134/297/original/529e606c4094598e.jpg https://creative.ai/system/media_attachments/files/111/056/525/972/335/010/original/85649558b8a77181.jpg
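The apex-frame idea quoted above (a Gaussian-weighted sum of a video's frames, centred on the middle frame) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `apex_frame`, the `sigma` parameter and its default, the flat-list frame representation, and the normalisation of the weights are all assumptions made for this example.

```python
import math


def apex_frame(frames, sigma=None):
    """Gaussian-weighted sum of frames, centred on the middle frame.

    frames: list of equally sized frames, each a flat list of pixel values.
    sigma:  spread of the Gaussian in frame units (hypothetical parameter;
            defaults to a quarter of the clip length, which is an assumption).
    Returns (apex, weights).
    """
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    center = (n - 1) / 2.0
    if sigma is None:
        sigma = max(n / 4.0, 1e-6)

    # Unnormalised Gaussian weight for each frame index.
    weights = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]

    # Normalise so the weights sum to 1 (assumption: keeps pixel range stable).
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum, pixel by pixel.
    apex = [0.0] * len(frames[0])
    for w, frame in zip(weights, frames):
        for j, pixel in enumerate(frame):
            apex[j] += w * pixel
    return apex, weights


if __name__ == "__main__":
    clip = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
    apex, weights = apex_frame(clip)
    print(apex, weights)
```

Frames near the centre of the clip contribute most to the result; a smaller `sigma` concentrates the weight on the central frame, while a larger one approaches a plain average.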
📝 Semi-Supervised Instance Segmentation with a Learned Shape Prior 🔭 "A variational autoencoder (VAE) is trained on either (a) real cell shape patches or (b) synthetic cell shape patches + noise to learn shape prior." [gal30b+] 🤖 #CV ⚙️ https://github.com/looooongChen/shape 🔗 https://arxiv.org/abs/2309.04888v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/524/684/586/414/original/3e5738a5a81ff1ad.jpg https://creative.ai/system/media_attachments/files/111/056/524/751/072/696/original/ef07b938e66eab3f.jpg https://creative.ai/system/media_attachments/files/111/056/524/812/926/766/original/c99d8da7d695cb1f.jpg
📝 SortedAP: Rethinking Evaluation Metrics for Instance Segmentation 🔭 "Proposes a new metric called sortedAP, which strictly decreases with both object- and pixel-level imperfections and has an uninterrupted penalization scale over the entire domain." [gal30b+] 🤖 #CV ⚙️ https://www.github.com/looooongChen/sortedAP 🔗 https://arxiv.org/abs/2309.04887v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/524/126/100/519/original/06760aec4e48c3cc.jpg https://creative.ai/system/media_attachments/files/111/056/524/193/134/157/original/c4dd21ffa4d4e991.jpg https://creative.ai/system/media_attachments/files/111/056/524/249/812/194/original/c9fe366ffc593bcb.jpg
📝 ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-Agnostic Counting 🔭🧠 "Composed of three stages: a) a feature extractor, b) an object detector, and c) a blind counter that predicts the number of each type of object in the image without using any example of that type." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.04820v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/523/201/002/684/original/d007eba0a3518d80.jpg https://creative.ai/system/media_attachments/files/111/056/523/274/288/798/original/8618452868c0f0d5.jpg https://creative.ai/system/media_attachments/files/111/056/523/340/093/188/original/1af1364647d15852.jpg https://creative.ai/system/media_attachments/files/111/056/523/469/360/119/original/008026183d551c20.jpg
📝 Speech2Lip: High-Fidelity Speech to Lip Generation by Learning From a Short Video 🔭 "A decomposition-synthesis-composition framework that disentangles speech-sensitive and speech-insensitive motion/appearance to facilitate effective learning from limited training data, resulting in the generation of natural-looking videos." [gal30b+] 🤖 #CV ⚙️ https://github.com/CVMI-Lab/Speech2Lip 🔗 https://arxiv.org/abs/2309.04814v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/522/383/944/711/original/b30a7dbb2d9aa70c.jpg https://creative.ai/system/media_attachments/files/111/056/522/446/180/901/original/a188024ea3a620c5.jpg https://creative.ai/system/media_attachments/files/111/056/522/501/783/897/original/2f4d849a6f31c00d.jpg https://creative.ai/system/media_attachments/files/111/056/522/548/889/895/original/8d51acc39eaca7b0.jpg
📝 Self-Supervised Transformer with Domain Adaptive Reconstruction for General Face Forgery Video Detection 🔭 "Takes full advantage of the difference between real and forgery videos by learning only the common representation of real face videos in a self-supervised manner, and then fine-tunes a linear head on specific face forgery video datasets." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.04795v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/521/365/451/633/original/7593baa986f2cf00.jpg https://creative.ai/system/media_attachments/files/111/056/521/428/052/958/original/1defd17e883a411d.jpg https://creative.ai/system/media_attachments/files/111/056/521/492/171/602/original/4804eae545ef9d0f.jpg https://creative.ai/system/media_attachments/files/111/056/521/561/848/414/original/653c6db5c50fab4a.jpg
📝 Latent Degradation Representation Constraint for Single Image Deraining 🔭 "The DAEncoder is proposed to adaptively extract latent degradation representation by using the deformable convolutions to exploit the direction consistency of rain streaks, then the constraint loss is introduced to explicitly constrain the degradation representation learning during training." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.04780v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/520/486/247/161/original/77b0715ff9def370.jpg https://creative.ai/system/media_attachments/files/111/056/520/581/603/261/original/fd46ea9247b3d0af.jpg
📝 Deep Video Restoration for Under-Display Camera 🔭 "Consists of a spatial branch with local-aware transformers, a temporal branch embedded with temporal transformers, and a spatial-temporal fusion module." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.04752v1 #arxiv https://creative.ai/system/media_attachments/files/111/056/519/388/092/377/original/cc4b9664cc92d7b3.jpg https://creative.ai/system/media_attachments/files/111/056/519/453/402/697/original/dbb5fbf08030418c.jpg https://creative.ai/system/media_attachments/files/111/056/519/514/359/341/original/d2d081bc45fac929.jpg https://creative.ai/system/media_attachments/files/111/056/519/576/947/988/original/8ff393f3cf264365.jpg
📝 Grouping Boundary Proposals for Fast Interactive Image Segmentation 🔭 "The adaptive cut can disconnect the image domain such that the target contours are imposed to pass through this cut only once, and the selected boundary proposals and corresponding minimal paths are used to delineate the target contours." [gal30b+] 🤖 #CV ⚙️ https://github.com/Mirebeau/HamiltonFastMarching 🔗 https://arxiv.org/abs/2309.04169v1 #arxiv https://creative.ai/system/media_attachments/files/111/048/842/871/817/871/original/92ec913d97ae4b95.jpg https://creative.ai/system/media_attachments/files/111/048/842/954/083/020/original/67968dc78225252e.jpg https://creative.ai/system/media_attachments/files/111/048/843/033/001/890/original/089539260256c90b.jpg https://creative.ai/system/media_attachments/files/111/048/843/206/267/062/original/a6778f6ddc17cbdf.jpg
📝 Multimodal Transformer for Material Segmentation 🔭🧠 "Proposes a fusion strategy that can effectively fuse information from different combinations of multiple modalities including RGB, Angle of Linear Polarization (AoLP), Degree of Linear Polarization (DoLP) and Near-Infrared (NIR)." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/csiplab/MMSFormer 🔗 https://arxiv.org/abs/2309.04001v1 #arxiv https://creative.ai/system/media_attachments/files/111/046/483/477/601/558/original/0d2c63cca4e87cc1.jpg https://creative.ai/system/media_attachments/files/111/046/483/569/994/878/original/d8770df15a837a86.jpg https://creative.ai/system/media_attachments/files/111/046/483/649/675/255/original/e91363e8dfc83a04.jpg
📝 Adapting Self-Supervised Representations to Multi-Domain Setups 🔭🧠 "The Domain Disentanglement Module (DDM) is a lightweight component that can be used with various self-supervised learning methods to improve their ability to learn generalizable representations when trained on multiple domains." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.03999v1 #arxiv https://creative.ai/system/media_attachments/files/111/046/306/507/928/638/original/431795262e9add58.jpg https://creative.ai/system/media_attachments/files/111/046/306/573/607/180/original/4544c609f902feed.jpg https://creative.ai/system/media_attachments/files/111/046/306/633/177/984/original/49dda3de4cd5e98a.jpg https://creative.ai/system/media_attachments/files/111/046/306/700/904/662/original/7a821a30d79f2e0e.jpg
📝 CDFSL-V: Cross-Domain Few-Shot Learning for Videos 🔭 "Leverages a masked autoencoder-based self-supervised training objective to learn from both source and target data in a self-supervised manner, which can be used to learn generic features from target video data." [gal30b+] 🤖 #CV ⚙️ https://github.com/Sarinda251/CDFSL-V 🔗 https://arxiv.org/abs/2309.03989v1 #arxiv https://creative.ai/system/media_attachments/files/111/046/129/470/442/936/original/717f01c193643916.jpg https://creative.ai/system/media_attachments/files/111/046/129/535/129/359/original/0991de0b1f6d3999.jpg https://creative.ai/system/media_attachments/files/111/046/129/591/686/740/original/e14f9fc99c4d621e.jpg https://creative.ai/system/media_attachments/files/111/046/129/665/858/589/original/998addde0749a7e2.jpg
📝 UER: A Heuristic Bias Addressing Approach for Online Continual Learning 🧠🔭 "UER learns current samples only by the angle factor and further replays previous samples by both the norm and angle factors to address the bias problem in continual learning, achieving superior performance over various state-of-the-art methods." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/FelixHuiweiLin/UER 🔗 https://arxiv.org/abs/2309.04081v1 #arxiv https://creative.ai/system/media_attachments/files/111/045/952/629/004/689/original/d5179cc04ed82287.jpg https://creative.ai/system/media_attachments/files/111/045/952/720/646/142/original/ad195af1315131d5.jpg https://creative.ai/system/media_attachments/files/111/045/952/795/809/984/original/67e5873d2589a276.jpg https://creative.ai/system/media_attachments/files/111/045/952/859/453/896/original/4534641e83ccbbec.jpg
📝 Improving Resnet-9 Generalization Trained on Small Datasets 🧠🔭 "A combination of various techniques to improve generalization including sharpness aware optimization, label smoothing, gradient centralization, input patch whitening as well as metalearning based training." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2309.03965v1 #arxiv https://creative.ai/system/media_attachments/files/111/045/775/604/551/509/original/e7d903714c1065b9.jpg
📝 Exploring Sparse MoE in GANs for Text-Conditioned Image Synthesis 🔭 "A mixture-of-experts (MoE) based generative text-to-image (T2I) model that employs a collection of experts to process the feature, together with a sparse router to help select the most suitable expert for each feature point." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03904v1 #arxiv https://creative.ai/system/media_attachments/files/111/034/940/928/681/904/original/9dffa549f79e198c.jpg https://creative.ai/system/media_attachments/files/111/034/940/980/698/084/original/7656c69ef7e6947a.jpg https://creative.ai/system/media_attachments/files/111/034/941/037/667/729/original/a0e80249799bf85a.jpg https://creative.ai/system/media_attachments/files/111/034/941/114/177/816/original/f4494cc81cf9f690.jpg
📝 Tracking Anything with Decoupled Video Segmentation 🔭 "Works by first using a segmentation network on every frame, and the network produces a probability for each pixel to belong to the foreground object or background; then it uses bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03903v1 #arxiv https://creative.ai/system/media_attachments/files/111/034/763/947/013/361/original/c858974dbdca66a1.jpg https://creative.ai/system/media_attachments/files/111/034/764/003/651/515/original/77ad67e306bcac9f.jpg https://creative.ai/system/media_attachments/files/111/034/764/052/979/553/original/3c5b39d8c7bbeade.jpg https://creative.ai/system/media_attachments/files/111/034/764/113/758/573/original/bd1b8c6176b13bc8.jpg
📝 The Making and Breaking of Camouflage 🔭 "Proposes three camouflage scores for measuring camouflage in the feature space, which are used to evaluate existing camouflage datasets and generate a large-scale and challenging dataset for camouflaged instance segmentation." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03899v1 #arxiv https://creative.ai/system/media_attachments/files/111/034/586/968/631/971/original/2ecc0a38d9a3f6c1.jpg https://creative.ai/system/media_attachments/files/111/034/587/031/031/926/original/2d66f6d45047885b.jpg https://creative.ai/system/media_attachments/files/111/034/587/111/426/639/original/dfefcd3790176ca7.jpg https://creative.ai/system/media_attachments/files/111/034/587/164/524/885/original/92ed9dcc954e161f.jpg
📝 ProPainter: Improving Propagation and Transformer for Video Inpainting 🔭 "Introduces a novel video inpainting framework called ProPainter, which involves enhanced propagation mechanism and sparse Transformer for efficient video inpainting, outperforming previous state-of-the-art approaches by a large margin." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03897v1 #arxiv https://creative.ai/system/media_attachments/files/111/034/410/128/452/966/original/8bfe4c9baeb53d14.jpg https://creative.ai/system/media_attachments/files/111/034/410/213/140/629/original/c35e407ffa59d9ef.jpg https://creative.ai/system/media_attachments/files/111/034/410/286/613/053/original/926717049ad0d3c3.jpg https://creative.ai/system/media_attachments/files/111/034/410/347/795/264/original/bdd6bb0a70348abc.jpg
📝 InstructDiffusion: A Generalist Modeling Interface for Vision Tasks 🔭 "Formulates human instructions to a pixel prediction task, where an InstructDiffusion model is trained to predict pixels according to user instructions, such as encircling the man's left shoulder in red or applying a blue mask to the left car." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03895v1 #arxiv https://creative.ai/system/media_attachments/files/111/034/174/024/504/315/original/14090293f7bd3e8b.jpg https://creative.ai/system/media_attachments/files/111/034/174/126/303/393/original/2e534adef4024c3c.jpg https://creative.ai/system/media_attachments/files/111/034/174/182/459/235/original/e0b71e57dff1796a.jpg https://creative.ai/system/media_attachments/files/111/034/174/239/131/280/original/8b09709e3300955c.jpg
📝 Box-Based Refinement for Weakly Supervised and Unsupervised Localization Tasks 🔭 "A box-based detector is trained to predict the location of the phrases in the image, and then applied to the output of the network to improve it further and enhance the localization performance of weakly supervised and unsupervised methods." [gal30b+] 🤖 #CV ⚙️ https://github.com/eyalgomel/box-based-refinement 🔗 https://arxiv.org/abs/2309.03874v1 #arxiv https://creative.ai/system/media_attachments/files/111/033/997/037/694/441/original/138682114f99bab1.jpg https://creative.ai/system/media_attachments/files/111/033/997/151/761/534/original/bae2437bf1a77d35.jpg https://creative.ai/system/media_attachments/files/111/033/997/253/642/406/original/014838c9fe047102.jpg https://creative.ai/system/media_attachments/files/111/033/997/346/987/780/original/789bb4a417a9128a.jpg
📝 Text-to-Feature Diffusion for Audio-Visual Few-Shot Learning 🔭 "AV-DIFF is a text-to-feature diffusion framework, which first fuses the temporal and audio-visual features via cross-modal attention and then generates multi-modal features for the novel classes." [gal30b+] 🤖 #CV ⚙️ https://github.com/ExplainableML/AVDIFF-GFSL 🔗 https://arxiv.org/abs/2309.03869v1 #arxiv https://creative.ai/system/media_attachments/files/111/033/643/131/039/310/original/5d24bfc58dab4b24.jpg https://creative.ai/system/media_attachments/files/111/033/643/196/689/261/original/7f7c5da6c6e9ee1d.jpg https://creative.ai/system/media_attachments/files/111/033/643/253/543/043/original/b357da815d99b949.jpg
📝 Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption 🔭 "A novel phasic content fusing few-shot diffusion model with directional distribution consistency loss, which targets different learning objectives at distinct training stages of the diffusion model, is designed." [gal30b+] 🤖 #CV ⚙️ https://github.com/sjtuplayer/few-shot-diffusion 🔗 https://arxiv.org/abs/2309.03729v1 #arxiv https://creative.ai/system/media_attachments/files/111/033/466/263/091/089/original/bc007d2c7993d5b8.jpg https://creative.ai/system/media_attachments/files/111/033/466/331/470/681/original/9f8a2827e79e280f.jpg https://creative.ai/system/media_attachments/files/111/033/466/402/912/892/original/b748d88162c5a2fd.jpg https://creative.ai/system/media_attachments/files/111/033/466/509/720/121/original/9b860756d3171e3c.jpg
📝 Interpretable Visual Question Answering via Reasoning Supervision 🔭 "Based on a transformer-based architecture that leverages reasoning supervision as a supervisory signal to guide the visual attention to important elements of the scene, without requiring explicit grounding annotations." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03726v1 #arxiv https://creative.ai/system/media_attachments/files/111/033/289/534/828/923/original/8e402443917f84ba.jpg https://creative.ai/system/media_attachments/files/111/033/289/596/815/937/original/c6bcfea6e262b609.jpg https://creative.ai/system/media_attachments/files/111/033/289/653/372/205/original/dbe43fba37e1fa0f.jpg
📝 Efficient Adaptive Human-Object Interaction Detection with Concept-Guided Memory 🔭 "ADA-CM has two operating modes: (1) training-free and (2) updating a lightweight set of parameters, which can be incorporated with existing HOI detectors." [gal30b+] 🤖 #CV ⚙️ https://github.com/ltttpku/ADA-CM 🔗 https://arxiv.org/abs/2309.03696v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/994/492/460/885/original/402c8f8c726ac2d0.jpg https://creative.ai/system/media_attachments/files/111/032/994/566/285/697/original/e04e2f9e179c7a08.jpg https://creative.ai/system/media_attachments/files/111/032/994/622/718/411/original/0f6ce5cba2bddf9b.jpg https://creative.ai/system/media_attachments/files/111/032/994/680/971/230/original/5e1e066cc2808d7e.jpg
📝 Prompt-Based Context- And Domain-Aware Pretraining for Vision and Language Navigation 🔭 "PANDA consists of a domain-aware stage and a context-aware stage, which performs prompt-based tuning and contrastive learning, respectively, on a pretrained VLN model." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03661v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/876/375/076/485/original/5de6127b0d273e77.jpg
📝 Enhancing Sample Utilization Through Sample Adaptive Augmentation in Semi-Supervised Learning 🔭 "Sample Adaptive Augmentation (SAA) consists of a sample selection module and a sample augmentation module, which helps to optimize the SSL models by giving more attention to naive samples and augmenting them in a more diverse manner." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03598v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/640/605/821/315/original/f502f7956fc7f4b2.jpg https://creative.ai/system/media_attachments/files/111/032/640/663/124/706/original/1cec6e348c45dfec.jpg https://creative.ai/system/media_attachments/files/111/032/640/723/030/516/original/0c5f19a77f32288c.jpg https://creative.ai/system/media_attachments/files/111/032/640/798/083/675/original/134ab6681c6f42a3.jpg
📝 DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions 🔭 "Learns to classify the actual position for each non-overlapping patch among all possible positions solely based on their visual appearance, by minimizing the negative log-likelihood." [gal30b+] 🤖 #CV ⚙️ https://github.com/Haochen-Wang409/DropPos 🔗 https://arxiv.org/abs/2309.03576v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/522/565/913/890/original/11c6a1cb9ef661b0.jpg https://creative.ai/system/media_attachments/files/111/032/522/624/635/968/original/04b1a8a529d17e50.jpg https://creative.ai/system/media_attachments/files/111/032/522/679/132/611/original/d9c5124764e444f6.jpg
📝 Region Generation and Assessment Network for Occluded Person Re-Identification 🔭 "RGANet utilizes pre-trained CLIP to locate the human body regions using semantic prototypes extracted from text descriptions, and then it measures the importance of each generated region." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03558v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/345/600/902/446/original/5efee51d5c7cd6a3.jpg https://creative.ai/system/media_attachments/files/111/032/345/660/682/863/original/8b73376e95e2a871.jpg https://creative.ai/system/media_attachments/files/111/032/345/728/283/127/original/07795020ee81ca2d.jpg
📝 Trash to Treasure: Low-Light Object Detection via Decomposition-and-Aggregation 🔭 "A newly designed enhancer is introduced as the scene decomposition module, whose removed illumination is exploited as the auxiliary to extract detection-friendly features, and then a semantic aggregation module is established to further integrate multi-scale scene-related semantic information." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03548v1 #arxiv https://creative.ai/system/media_attachments/files/111/032/109/788/615/868/original/47a6560dcee0e820.jpg https://creative.ai/system/media_attachments/files/111/032/109/843/280/228/original/4b29a3c09480fcf0.jpg https://creative.ai/system/media_attachments/files/111/032/109/897/472/344/original/d58b2350c78b2774.jpg https://creative.ai/system/media_attachments/files/111/032/109/957/644/900/original/c81823edb79986f3.jpg
📝 Dynamic Frame Interpolation in Wavelet Domain 🔭 "WaveletVFI uses a lightweight motion perception network to estimate an initial intermediate optical flow, and embeds a threshold classifier in it to learn a dynamic threshold for more computation reduction." [gal30b+] 🤖 #CV ⚙️ https://github.com/ltkong218/WaveletVFI 🔗 https://arxiv.org/abs/2309.03508v1 #arxiv https://creative.ai/system/media_attachments/files/111/031/814/826/310/225/original/8ef41cd46be972d3.jpg https://creative.ai/system/media_attachments/files/111/031/814/884/312/460/original/c0a28736e79b3d92.jpg https://creative.ai/system/media_attachments/files/111/031/814/935/252/356/original/b97d59aad64257a7.jpg https://creative.ai/system/media_attachments/files/111/031/814/982/873/570/original/5362afe9919bbbf3.jpg
📝 DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing Using Determiners 🔭 "DetermiNet provides 250,000 synthetically generated images and captions with ground truth bounding boxes for the objects of interest in images based on 25 determiners." [gal30b+] 🤖 #CV ⚙️ https://github.com/clarence-lee-sheng/ 🔗 https://arxiv.org/abs/2309.03483v1 #arxiv https://creative.ai/system/media_attachments/files/111/031/578/740/791/179/original/9884ff5fae9278e1.jpg https://creative.ai/system/media_attachments/files/111/031/578/799/868/454/original/d990e935a6457fcf.jpg https://creative.ai/system/media_attachments/files/111/031/578/852/717/302/original/c6a777edb7634dd8.jpg https://creative.ai/system/media_attachments/files/111/031/578/913/580/707/original/04a6bcfafaf749fc.jpg
📝 Temporal Collection and Distribution for Referring Video Object Segmentation 🔭 "Given a video sequence, the proposed framework simultaneously maintains a global referent token and a sequence of object queries across the frames, where the former is responsible for capturing video-level referent according to the language expression, while the latter serves to better locate and segment objects with each frame." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03473v1 #arxiv https://creative.ai/system/media_attachments/files/111/031/342/723/287/204/original/11b932ef425b11d5.jpg https://creative.ai/system/media_attachments/files/111/031/342/782/780/419/original/a079f158e003156d.jpg https://creative.ai/system/media_attachments/files/111/031/342/844/632/914/original/5012c407119ef14d.jpg https://creative.ai/system/media_attachments/files/111/031/342/922/210/999/original/6022d922f2835c7a.jpg
📝 Perceptual Quality Assessment of 360° Images Based on Generative Scanpath Representation 🔭 "The proposed generative scanpath representation (GSR), which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition, provides a global overview of gazed-focused contents derived from scanpaths." [gal30b+] 🤖 #CV ⚙️ https://github.com/xiangjieSui/GSR 🔗 https://arxiv.org/abs/2309.03472v1 #arxiv https://creative.ai/system/media_attachments/files/111/031/106/980/382/563/original/7c94c2e82751d8d6.jpg https://creative.ai/system/media_attachments/files/111/031/107/058/787/713/original/c19770d0a3196cc8.jpg https://creative.ai/system/media_attachments/files/111/031/107/118/775/541/original/f6d86c07da9c6bdc.jpg https://creative.ai/system/media_attachments/files/111/031/107/204/547/175/original/fb92cd08a807cb71.jpg
📝 Multi-Modality Guidance Network for Missing Modality Inference 🔭🧠 "Proposes a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models for inference on scenarios with missing modalities." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.03452v1 #arxiv https://creative.ai/system/media_attachments/files/111/030/871/032/095/015/original/5c727f2b4f7b0946.jpg https://creative.ai/system/media_attachments/files/111/030/871/103/854/683/original/94171f704f54bbef.jpg
📝 Distribution-Aware Prompt Tuning for Vision-Language Models 🔭 "Distribution-aware prompt tuning maximizes inter-dispersion as well as minimizing intra-dispersion between embeddings of two modalities in the latent space, which leads to effective feature space alignment between them." [gal30b+] 🤖 #CV ⚙️ https://github.com/mlvlab/DAPT 🔗 https://arxiv.org/abs/2309.03406v1 #arxiv https://creative.ai/system/media_attachments/files/111/030/517/184/916/921/original/8692584a704f15a9.jpg https://creative.ai/system/media_attachments/files/111/030/517/250/776/129/original/dfec5786437db52c.jpg https://creative.ai/system/media_attachments/files/111/030/517/339/234/885/original/915943e96c7c2af5.jpg https://creative.ai/system/media_attachments/files/111/030/517/430/668/653/original/f0fb1367a07c4700.jpg
📝 Reasonable Anomaly Detection in Long Sequences 🔭 "A Stacked State Machine model is proposed to represent the temporal dependencies which are consistent across long-range observations and functions in predicting future states based on past ones; the divergence between the predictions with inherent normal patterns and observed ones determines anomalies." [gal30b+] 🤖 #CV ⚙️ https://github.com/AllenYLJiang/Anomaly-Detection-in-Sequences 🔗 https://arxiv.org/abs/2309.03401v1 #arxiv https://creative.ai/system/media_attachments/files/111/030/399/189/667/281/original/7003e434d8afd75b.jpg
📝 Active Shooter Detection and Robust Tracking Utilizing Supplemental Synthetic Data 🔭 "Uses domain randomization and transfer learning to allow for the effective training of YOLOv8 using synthetic data generated with Unreal Engine, which is then used to detect shooters in video streams." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.03381v1 #arxiv https://creative.ai/system/media_attachments/files/111/030/045/137/576/297/original/ff124ebed3c4c514.jpg https://creative.ai/system/media_attachments/files/111/030/045/207/131/058/original/ee511ad705dd3f86.jpg https://creative.ai/system/media_attachments/files/111/030/045/292/639/986/original/5ec91483ae8ffac4.jpg https://creative.ai/system/media_attachments/files/111/030/045/366/533/145/original/46bd91b70b0a6b8b.jpg
📝 ViewMix: Augmentation for Robust Representation in Self-Supervised Learning 🔭🧠 "Cut and paste patches from one view to another and create different views of the same image to form positive pairs, and the network is trained to maximize the agreement between positive pairs while minimizing the agreement between negative pairs." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.03360v1 #arxiv https://creative.ai/system/media_attachments/files/111/029/868/345/026/922/original/8fb681072c38509f.jpg https://creative.ai/system/media_attachments/files/111/029/868/394/954/621/original/02a525bfdd0dfb09.jpg https://creative.ai/system/media_attachments/files/111/029/868/455/432/545/original/c7a51405a244586b.jpg
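The cut-and-paste step in the ViewMix summary above can be pictured with a small sketch. This is not the authors' code: the function name `paste_patch`, the `patch_frac` parameter, the single rectangular patch, and the choice to copy the patch from the same coordinates in the second view are all assumptions made for illustration. Views are plain 2D lists of pixel values to keep the example dependency-free.

```python
import random


def paste_patch(view_a, view_b, patch_frac=0.5, rng=None):
    """Copy one rectangular patch from view_b into a copy of view_a.

    Hypothetical helper: the patch size and placement rules here are
    illustrative assumptions, not taken from the ViewMix paper.
    Returns the mixed view and the patch box (top, left, height, width).
    """
    rng = rng or random.Random()
    h, w = len(view_a), len(view_a[0])
    if len(view_b) != h or any(len(row) != w for row in view_b):
        raise ValueError("both views must have the same height and width")

    # Patch size as a fraction of the view, clamped to [1, full size].
    ph = min(h, max(1, int(h * patch_frac)))
    pw = min(w, max(1, int(w * patch_frac)))

    # Random top-left corner such that the patch fits (randint is inclusive).
    top = rng.randint(0, h - ph)
    left = rng.randint(0, w - pw)

    # Copy view_a row by row so the input is left unmodified.
    mixed = [list(row) for row in view_a]
    for r in range(top, top + ph):
        mixed[r][left:left + pw] = view_b[r][left:left + pw]
    return mixed, (top, left, ph, pw)
```

In a contrastive setup, the mixed view and another augmented view of the same image would then be treated as a positive pair; how the paper samples patches and builds pairs may differ from this sketch.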