▲ ▼

 📝 Physical Invisible Backdoor Based on Camera Imaging 🔭

"By using the camera imaging process and leveraging the CFA interpolation algorithm and the camera fingerprint feature, the proposed method can implement a physical invisible backdoor attack without changing the natural pixels of the images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07428v1 #arxiv

https://creative.ai/system/media_attachments/files/111/071/845/016/249/555/original/0f581ac9a76ae8e7.jpg

https://creative.ai/system/media_attachments/files/111/071/845/100/453/176/original/210e0fd9d96ffd0c.jpg

https://creative.ai/system/media_attachments/files/111/071/845/163/616/272/original/78742d691dcbbb84.jpg

https://creative.ai/system/media_attachments/files/111/071/845/224/782/382/original/32bdf85e2445e1af.jpg

▲ ▼

 📝 Masked Diffusion with Task-Awareness for Procedure Planning in Instructional Videos 🔭

"The introduced mask acts akin to a task-oriented attention filter, enabling the diffusion/denoising process to concentrate on a subset of action types that are pertinent to the given task." [gal30b+] 🤖 #CV

⚙️ https://github.com/ffzzy840304/Masked-PDPP
🔗 https://arxiv.org/abs/2309.07409v1 #arxiv

https://creative.ai/system/media_attachments/files/111/071/549/817/427/736/original/b33c9fc202f5dea6.jpg

https://creative.ai/system/media_attachments/files/111/071/549/870/341/872/original/03f227fadfb3966a.jpg

https://creative.ai/system/media_attachments/files/111/071/549/933/645/294/original/fe8eab4106812a57.jpg

▲ ▼

 📝 Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance 🔭

"By estimating Dirichlet concentration parameters for singletons, comprehensive subjective opinions, including confusion and ignorance, could be achieved via further evidence combinations, which enables flexible visual recognition with uncertainty quantification." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07403v1 #arxiv

https://creative.ai/system/media_attachments/files/111/071/195/895/528/430/original/8521d2b1410adc43.jpg

https://creative.ai/system/media_attachments/files/111/071/195/956/616/153/original/d4fc2d1788278a85.jpg

https://creative.ai/system/media_attachments/files/111/071/196/081/021/533/original/a2c1673a0b828141.jpg

https://creative.ai/system/media_attachments/files/111/071/196/144/923/567/original/17d0f4948fb2617a.jpg
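
A minimal sketch of how Dirichlet concentration parameters for singletons can be turned into beliefs and an ignorance mass, assuming the standard subjective-logic mapping (alpha = evidence + 1). The function name `subjective_opinion` is hypothetical, and the paper's further evidence combination for confusion between classes is not reproduced here.

```python
def subjective_opinion(evidence):
    """Map non-negative per-class evidence to (beliefs, ignorance).

    Assumes the common subjective-logic convention: alpha_k = e_k + 1,
    belief_k = e_k / S and ignorance = K / S, where S = sum(alpha).
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alphas = [e + 1.0 for e in evidence]       # Dirichlet concentration parameters
    strength = sum(alphas)                      # Dirichlet strength S
    beliefs = [e / strength for e in evidence]  # belief mass per singleton class
    ignorance = num_classes / strength          # mass not committed to any class
    return beliefs, ignorance


if __name__ == "__main__":
    beliefs, ignorance = subjective_opinion([8.0, 1.0, 1.0])
    print(beliefs, ignorance)
```

Beliefs and ignorance sum to one by construction, so zero evidence for every class yields full ignorance.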

▲ ▼

 📝 Judging a Video by Its Bitstream Cover 🔭

"Develops a bitstream-based classifier that uses the MPEG-1/2/4 bitstream of a video to classify it into a category such as Sport, Music Video, or Animation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07361v1 #arxiv

https://creative.ai/system/media_attachments/files/111/070/960/061/006/342/original/74e2a1fe95230b2c.jpg

https://creative.ai/system/media_attachments/files/111/070/960/124/793/210/original/8359147ae6889fc1.jpg

https://creative.ai/system/media_attachments/files/111/070/960/275/064/408/original/c12c69c3e6f54d59.jpg

https://creative.ai/system/media_attachments/files/111/070/960/350/112/899/original/4049b29e1d102738.jpg

▲ ▼

 📝 Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection 🔭

"MMHL is comprised of supervised and self-supervised loss functions which utilize semantic features from different modalities and reduce the distance between RGB and thermal features, respectively, during saliency map generation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07297v1 #arxiv

https://creative.ai/system/media_attachments/files/111/070/665/197/004/891/original/8c2d680dc2440f00.jpg

https://creative.ai/system/media_attachments/files/111/070/665/265/148/458/original/0bdd98c60c6ffc3a.jpg

https://creative.ai/system/media_attachments/files/111/070/665/331/558/427/original/4448f388a5844867.jpg

https://creative.ai/system/media_attachments/files/111/070/665/405/402/709/original/ac545cb71fa1f7b2.jpg

▲ ▼

 📝 Unbiased Face Synthesis with Diffusion Models: Are We There Yet? 🔭🧠

"Consists of several qualitative and quantitative measures, including embedding-based metrics and user studies, to audit the characteristics of generated faces conditioned on a set of social attributes." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.07277v1 #arxiv

https://creative.ai/system/media_attachments/files/111/070/370/308/001/348/original/93ad2759578a4be1.jpg

https://creative.ai/system/media_attachments/files/111/070/370/386/856/264/original/fba4e5943e2c8126.jpg

https://creative.ai/system/media_attachments/files/111/070/370/470/908/439/original/dce1931a5f910fc9.jpg

https://creative.ai/system/media_attachments/files/111/070/370/562/017/919/original/ceffeea5a5681383.jpg

▲ ▼

 📝 LCReg: Long-Tailed Image Classification with Latent Categories Based Recognition 🔭

"Learns a set of class-agnostic latent features shared by both head and tail classes, and then uses semantic data augmentation on the latent features to implicitly increase the diversity of the training samples." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07186v1 #arxiv

https://creative.ai/system/media_attachments/files/111/070/075/445/960/573/original/23c8b3f689fe0ac5.jpg

https://creative.ai/system/media_attachments/files/111/070/075/508/686/924/original/61f65f5cd12d7e8f.jpg

https://creative.ai/system/media_attachments/files/111/070/075/586/292/168/original/42491090266aff51.jpg

▲ ▼

 📝 Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch 🧠🔭

"DiffAug first mines sufficient prior semantic knowledge about the neighborhood to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge, while ensuring that the augmented and original data share a smoothed latent space." [gal30b+] 🤖 #LG #CE #CV

⚙️ https://github.com/zangzelin/DiffAug
🔗 https://arxiv.org/abs/2309.07909v1 #arxiv

https://creative.ai/system/media_attachments/files/111/069/780/110/041/416/original/87de421ab477c80a.jpg

https://creative.ai/system/media_attachments/files/111/069/780/199/868/301/original/c18c6fe7f9457edd.jpg

https://creative.ai/system/media_attachments/files/111/069/780/267/517/617/original/74461425fd4fe0a4.jpg

https://creative.ai/system/media_attachments/files/111/069/780/328/602/393/original/9ecff75f5c0e944a.jpg

▲ ▼

 📝 Hardening RGB-D Object Recognition Systems Against Adversarial Patch Attacks 🔭

"Finds that RGB features make the functions learned by the network more complex and, thus, more sensitive to small perturbations, compared to depth features, which have been proved to be more invariant to small transformations." [gal30b+] 🤖 #CV #CR

⚙️ https://github.com/Trusted-AI/adversarial-robustness-toolbox/
🔗 https://arxiv.org/abs/2309.07106v1 #arxiv

https://creative.ai/system/media_attachments/files/111/069/486/804/016/368/original/d8758964763bcdbe.jpg

https://creative.ai/system/media_attachments/files/111/069/486/883/529/348/original/397e5cacc99c6ec9.jpg

https://creative.ai/system/media_attachments/files/111/069/486/958/608/683/original/4dac0596d0a9a944.jpg

https://creative.ai/system/media_attachments/files/111/069/487/031/925/256/original/9a61ba9a1f285e88.jpg

▲ ▼

 📝 FAIR: Frequency-Aware Image Restoration for Industrial Visual Anomaly Detection 🔭

"FAIR is a novel self-supervised image restoration task that restores images from their high-frequency components, which enables precise reconstruction of normal patterns while mitigating unfavorable generalization to anomalies." [gal30b+] 🤖 #CV

⚙️ https://github.com/liutongkun/FAIR
🔗 https://arxiv.org/abs/2309.07068v1 #arxiv

https://creative.ai/system/media_attachments/files/111/069/014/874/646/522/original/ca0c7236cc5f3df7.jpg

https://creative.ai/system/media_attachments/files/111/069/014/936/129/353/original/324c1ababcfa20a1.jpg

https://creative.ai/system/media_attachments/files/111/069/014/994/662/446/original/b49aded915575c9e.jpg

https://creative.ai/system/media_attachments/files/111/069/015/052/093/066/original/a59fca6d01a9f9ed.jpg

▲ ▼

 📝 Aggregating Long-Term Sharp Features via Hybrid Transformers for Video Deblurring 🔭

"A window-based local Transformer is employed for exploiting features from neighboring frames with cross attention, and a global multi-scale Transformer is utilized to aggregate long-term sharp features." [gal30b+] 🤖 #CV

⚙️ https://github.com/shangwei5/STGTN
🔗 https://arxiv.org/abs/2309.07054v1 #arxiv

https://creative.ai/system/media_attachments/files/111/068/778/981/350/924/original/679c0a24cccc5032.jpg

https://creative.ai/system/media_attachments/files/111/068/779/035/864/156/original/c045293f1cce34be.jpg

https://creative.ai/system/media_attachments/files/111/068/779/125/648/905/original/85a1bb0bf9040d7c.jpg

https://creative.ai/system/media_attachments/files/111/068/779/178/957/523/original/3a7dde3e3bf4139e.jpg

▲ ▼

 📝 Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning 🔭

"A margin-based Prototypical Contrast Learning embedding network reaps the benefits of prototype-data (cluster quality enhancement) and implicit data-data (fine-grained representations) interaction while providing substantial cluster supervision to the embedding network and the generator." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06987v1 #arxiv

https://creative.ai/system/media_attachments/files/111/068/425/025/710/612/original/4451f3ebfe689c84.jpg

https://creative.ai/system/media_attachments/files/111/068/425/111/954/500/original/dabc710c444664a2.jpg

https://creative.ai/system/media_attachments/files/111/068/425/172/370/165/original/09b27994a389da11.jpg

https://creative.ai/system/media_attachments/files/111/068/425/248/166/865/original/dfafcd0b4175cb52.jpg

▲ ▼

 📝 Differentiable JPEG: The Devil Is in the Details 🔭

"A novel differentiable JPEG approach is proposed, overcoming the limitations of existing methods: Differentiable wrt the input image, the JPEG quality, the quantization tables, and the color conversion parameters." [gal30b+] 🤖 #CV #MM

⚙️ https://github.com/necla-ml/Diff-JPEG
🔗 https://arxiv.org/abs/2309.06978v1 #arxiv

https://creative.ai/system/media_attachments/files/111/068/189/392/660/380/original/a4309c4f3e6aaf00.jpg

https://creative.ai/system/media_attachments/files/111/068/189/452/588/617/original/a630a23cf1e84cc7.jpg

https://creative.ai/system/media_attachments/files/111/068/189/510/222/692/original/1ce6342063bff45c.jpg

https://creative.ai/system/media_attachments/files/111/068/189/577/173/964/original/bcceb990cd72e8c2.jpg
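
For context on why the JPEG quality and the quantization tables need special treatment, here is a sketch of the standard, non-differentiable IJG/libjpeg quality scaling. It is not the paper's method. The function name `scale_quant_table` is hypothetical, and only the first row of the standard luminance table is used for brevity.

```python
# First row of the standard JPEG luminance quantization table.
BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]


def scale_quant_table(base, quality):
    """Scale a base quantization table by an integer quality in 1..100.

    Follows the IJG convention: quality 50 keeps the base table, lower
    quality enlarges the steps, higher quality shrinks them.
    """
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer division and clamping are the steps that block gradients,
    # which is why a differentiable variant needs smooth surrogates here.
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]


if __name__ == "__main__":
    for quality in (10, 50, 75, 100):
        print(quality, scale_quant_table(BASE_LUMA_ROW, quality))
```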

▲ ▼

 📝 Neural Network-Based Coronary Dominance Classification of RCA Angiograms 🔭

"Employs the ConvNeXt convolutional neural network and the Swin Transformer for 2D image (frame) classification, along with a majority vote, for cardio angiographic view classification." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06958v1 #arxiv

https://creative.ai/system/media_attachments/files/111/068/071/217/697/404/original/5a052a82d3c7274f.jpg

https://creative.ai/system/media_attachments/files/111/068/071/274/519/477/original/46ced6b924e05c24.jpg

https://creative.ai/system/media_attachments/files/111/068/071/330/330/969/original/44c9bc1a4fe2ebe4.jpg

https://creative.ai/system/media_attachments/files/111/068/071/391/888/767/original/4646be27f8c4f8bf.jpg
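
The majority vote over per-frame predictions mentioned in the summary can be sketched as follows. The function name `majority_vote` and the label strings are hypothetical, and the tie-breaking rule (first label seen wins) is an assumption, not something the summary specifies.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the most frequent per-frame label for one video.

    Ties are resolved in favour of the label that appears first,
    which is how Counter.most_common orders equal counts.
    """
    if not frame_labels:
        raise ValueError("need at least one frame prediction")
    counts = Counter(frame_labels)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(majority_vote(["right", "left", "right", "right", "left"]))
```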

▲ ▼

 📝 DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models 🔭

"DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality and flexibility to accommodate a range of style references." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06933v1 #arxiv

https://creative.ai/system/media_attachments/files/111/067/894/224/260/617/original/cc7de3bf3e362749.jpg

https://creative.ai/system/media_attachments/files/111/067/894/349/591/312/original/65a455ffca967b94.jpg

https://creative.ai/system/media_attachments/files/111/067/894/412/617/112/original/9e5d21ace9ca7c77.jpg

https://creative.ai/system/media_attachments/files/111/067/894/519/752/442/original/3533eed6382a4c13.jpg

▲ ▼

 📝 Hydra: Multi-Head Low-Rank Adaptation for Parameter Efficient Fine-Tuning 🔭

"Proposes a multi-branch adaptation method named Hydra for fine-tuning large language models, which leverages both parallel and sequential adaptation methods simultaneously to combine the benefits of both." [gal30b+] 🤖 #CV

⚙️ https://github.com/extremebird/Hydra
🔗 https://arxiv.org/abs/2309.06922v1 #arxiv

https://creative.ai/system/media_attachments/files/111/067/599/465/898/447/original/9a67dc74843797b0.jpg

https://creative.ai/system/media_attachments/files/111/067/599/523/105/724/original/71cb40524aba6c0f.jpg

https://creative.ai/system/media_attachments/files/111/067/599/578/586/565/original/cb0de4240f71e918.jpg

https://creative.ai/system/media_attachments/files/111/067/599/637/465/725/original/627e9f0a0a78ca9e.jpg

▲ ▼

 📝 Keep It SimPool: Who Said Supervised Transformers Suffer From Attention Deficit? 🔭🧠

"SimPool replaces global average pooling by spatial similarity attention to improve performance and provide attention maps delineating object boundaries without explicit losses or modifying the architecture, whether supervised or self-supervised." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/billpsomas/simpool
🔗 https://arxiv.org/abs/2309.06891v1 #arxiv

https://creative.ai/system/media_attachments/files/111/067/186/512/645/317/original/7b0176085e27acac.jpg

https://creative.ai/system/media_attachments/files/111/067/186/567/794/569/original/bb1cf09340beaaa9.jpg
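
A minimal sketch of replacing global average pooling with similarity-based attention pooling, assuming a simplified form: the query is the mean of the patch features and there are no learned projections. The function name `attention_pool` is hypothetical, and this is not the SimPool reference implementation (the linked repository has that).

```python
import math


def attention_pool(features):
    """Pool a list of equal-length feature vectors with softmax attention.

    Returns (pooled_vector, attention_weights). The weights form the
    spatial attention map over the input positions.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    n, d = len(features), len(features[0])
    # Query: plain average of all positions (what GAP would output).
    query = [sum(f[j] for f in features) / n for j in range(d)]
    # Scaled dot-product similarity between the query and each position.
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d) for f in features]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: attention-weighted sum instead of a uniform average.
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
    return pooled, weights


if __name__ == "__main__":
    pooled, weights = attention_pool([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
    print(pooled, weights)
```

Positions that resemble the average feature get larger weights, so the pooled vector leans toward them instead of treating every position equally.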

▲ ▼

 📝 Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization 🔭

"We disentangle high-dimensional video features into multiple components, which are explicitly trained to encode non-overlapping semantics with mutual information maximization loss (MIM) to ensure that useful task-relevant information is extracted from the original features." [gal30b+] 🤖 #CV #MM

⚙️ https://github.com/yyyooooo/DMI/
🔗 https://arxiv.org/abs/2309.06877v1 #arxiv

https://creative.ai/system/media_attachments/files/111/066/891/510/617/380/original/077d58e884e0b60c.jpg

https://creative.ai/system/media_attachments/files/111/066/891/578/774/204/original/acd5d26cc09d2c5a.jpg

https://creative.ai/system/media_attachments/files/111/066/891/631/362/378/original/935fcc1c9e114bb9.jpg

https://creative.ai/system/media_attachments/files/111/066/891/687/267/913/original/a13f4c0cc1fa949a.jpg

▲ ▼

 📝 TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification 🔭

"Works by training a linear classifier on text representations generated by a pre-trained language model (GPT-3), conditioned by task-specific prompts." [gal30b+] 🤖 #CV

⚙️ https://github.com/jmiemirza/TAP
🔗 https://arxiv.org/abs/2309.06809v1 #arxiv

https://creative.ai/system/media_attachments/files/111/066/655/677/238/287/original/98b33a67bbc27ec8.jpg

https://creative.ai/system/media_attachments/files/111/066/655/745/469/090/original/b29de7ab55571630.jpg

▲ ▼

 📝 Motion-Bias-Free Feature-Based SLAM 🔭

"Proposes a set of modifications that remedy the motion bias problem in SLAM by improving the feature matching process, the data association, and the outlier rejection of the pose-graph optimizer." [gal30b+] 🤖 #CV

⚙️ https://github.com/alejandrofontan/ORB_SLAM2_Deterministic
🔗 https://arxiv.org/abs/2309.06792v1 #arxiv

https://creative.ai/system/media_attachments/files/111/066/419/777/332/944/original/a484a3a70f5db1e0.jpg

https://creative.ai/system/media_attachments/files/111/066/419/843/237/478/original/47cb20b788263f3a.jpg

▲ ▼

 📝 Remote Sensing Object Detection Meets Deep Learning: A Meta-Review of Challenges and Advances 🔭

"This review summarizes the development process of RSOD and identifies five main challenges in RSOD including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06751v1 #arxiv

https://creative.ai/system/media_attachments/files/111/066/124/757/425/548/original/b68d6783be8a0c51.jpg

https://creative.ai/system/media_attachments/files/111/066/124/845/088/711/original/8188312d3951a22a.jpg

https://creative.ai/system/media_attachments/files/111/066/124/952/215/257/original/670446972632f7ce.jpg

https://creative.ai/system/media_attachments/files/111/066/125/040/139/524/original/acacc020c6112aaa.jpg

▲ ▼

 📝 GelFlow: Self-Supervised Learning of Optical Flow for Vision-Based Tactile Sensor Displacement Measurement 🔭

"Handles large deformations by constructing a multi-scale feature pyramid from the input image, which is more suitable for vision-based tactile sensor images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06735v1 #arxiv

https://creative.ai/system/media_attachments/files/111/065/947/981/502/125/original/c6975bc4b329e0b0.jpg

https://creative.ai/system/media_attachments/files/111/065/948/044/745/470/original/781747161ce5c2d9.jpg

https://creative.ai/system/media_attachments/files/111/065/948/099/794/701/original/562ac096f81883f1.jpg

▲ ▼

 📝 Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense 🔭🧠

"Works by using a non-parametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash, etc." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.06724v1 #arxiv

https://creative.ai/system/media_attachments/files/111/065/771/027/672/064/original/874fb691eadb3937.jpg

https://creative.ai/system/media_attachments/files/111/065/771/124/105/104/original/23d7f21ed9554c08.jpg

https://creative.ai/system/media_attachments/files/111/065/771/198/993/184/original/771dfd7f333cc95e.jpg

https://creative.ai/system/media_attachments/files/111/065/771/273/163/071/original/4930bd44ae095da8.jpg

▲ ▼

 📝 Deep Attentive Time Warping 🔭

"Predicts all local correspondences between two time series and is trained with metric learning, which enables it to learn the optimal data-dependent warping for the target task." [gal30b+] 🤖 #CV

⚙️ https://github.com/matsuo-shinnosuke/deep-attentive-time
🔗 https://arxiv.org/abs/2309.06720v1 #arxiv

https://creative.ai/system/media_attachments/files/111/065/416/928/165/150/original/e01204ed2d9f5ce9.jpg

https://creative.ai/system/media_attachments/files/111/065/416/995/929/535/original/a49dab7d01061e37.jpg

https://creative.ai/system/media_attachments/files/111/065/417/060/287/494/original/d5d2697cc2ebe364.jpg

https://creative.ai/system/media_attachments/files/111/065/417/124/827/533/original/2f459db9b648cc5d.jpg

▲ ▼

 📝 MPI-Flow: Learning Realistic Optical Flow with Multiplane Images 🔭

"A learning-based MPI-Flow framework is proposed, which generates highly realistic optical flow maps from real-world images and achieves state-of-the-art performance in both unsupervised and supervised learning of optical flow estimation models." [gal30b+] 🤖 #CV

⚙️ https://github.com/Sharpiless/MPI-Flow
🔗 https://arxiv.org/abs/2309.06714v1 #arxiv

https://creative.ai/system/media_attachments/files/111/065/181/599/871/767/original/2017486bf1b85a0c.jpg

https://creative.ai/system/media_attachments/files/111/065/181/672/267/933/original/e0cf471491767214.jpg

https://creative.ai/system/media_attachments/files/111/065/181/735/554/909/original/c40255fe367f784e.jpg

https://creative.ai/system/media_attachments/files/111/065/181/826/989/919/original/1e7858009c1a236b.jpg

▲ ▼

 📝 ShaDocFormer: A Shadow-Attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal 🔭

"The architecture of ShaDocFormer includes Shadow-attentive Threshold Detector (STD) and Cascaded Fusion Refiner (CFR), where CFR takes advantage of STD to generate shadow mask." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06670v1 #arxiv

https://creative.ai/system/media_attachments/files/111/064/886/187/111/076/original/a37985a4bceee29b.jpg

https://creative.ai/system/media_attachments/files/111/064/886/282/823/561/original/d53a9f137517d392.jpg

https://creative.ai/system/media_attachments/files/111/064/886/350/609/436/original/7738813f7479d776.jpg

▲ ▼

 📝 Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity 🔭🧠

"Introduces a training procedure, aware of the structure of the GEMM operation, that enables the exploitation of activation sparsity by inducing semi-structured sparsity through regularization." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.06626v1 #arxiv

https://creative.ai/system/media_attachments/files/111/064/709/382/486/009/original/f9c4dab67cc54dc5.jpg

https://creative.ai/system/media_attachments/files/111/064/709/460/165/833/original/1cf96c5cb840c7c5.jpg

https://creative.ai/system/media_attachments/files/111/064/709/611/181/238/original/3bb6030b2b453c2a.jpg

https://creative.ai/system/media_attachments/files/111/064/709/688/393/406/original/6c6e965e9258086e.jpg

▲ ▼

 📝 Zero-Shot Visual Classification with Guided Cropping 🔭

"Uses a zero-shot object detector to guide cropping of input images and increase the influence of object-relevant features in zero-shot classification task using CLIP." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06581v1 #arxiv

https://creative.ai/system/media_attachments/files/111/064/532/454/231/283/original/fae70e60ebb97330.jpg

https://creative.ai/system/media_attachments/files/111/064/532/520/685/337/original/0b24dd6218b51a22.jpg

https://creative.ai/system/media_attachments/files/111/064/532/614/122/767/original/41e0b427f79ce757.jpg

https://creative.ai/system/media_attachments/files/111/064/532/677/049/507/original/a1adb4c135a9b011.jpg
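
One way to picture the guided cropping step, assuming the zero-shot detector returns a pixel box (x0, y0, x1, y1): enlarge the box by a margin to keep some context, clamp it to the image, and pass that crop to the classifier. The function name `expand_box` and the margin value are hypothetical and not taken from the paper.

```python
def expand_box(box, image_size, margin=0.1):
    """Enlarge a detection box by a relative margin and clamp to the image.

    box is (x0, y0, x1, y1) in pixels, image_size is (width, height).
    The margin is a fraction of the box width and height added per side.
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box must have positive width and height")
    dx = (x1 - x0) * margin
    dy = (y1 - y0) * margin
    return (
        max(0, x0 - dx),
        max(0, y0 - dy),
        min(width, x1 + dx),
        min(height, y1 + dy),
    )


if __name__ == "__main__":
    print(expand_box((10, 10, 30, 50), (100, 100), margin=0.5))
```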

▲ ▼

 📝 Strong-Weak Integrated Semi-Supervision for Unsupervised Single and Multi Target Domain Adaptation 🔭

"A strong representative set with high confidence but low diversity target domain samples and a weak representative set with low confidence but high diversity target domain samples are generated and updated dynamically during training." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06528v1 #arxiv

https://creative.ai/system/media_attachments/files/111/064/237/458/478/631/original/bb476568a520587e.jpg

https://creative.ai/system/media_attachments/files/111/064/237/557/413/987/original/88d5fc2e482667f3.jpg

https://creative.ai/system/media_attachments/files/111/064/237/617/850/247/original/acc9445a0475f712.jpg

https://creative.ai/system/media_attachments/files/111/064/237/680/619/307/original/164e0a3eb566d682.jpg

▲ ▼

 📝 DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention 🔭

"Presents a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks that leverages the synergy between the two modalities." [gal30b+] 🤖 #CV #MM

🔗 https://arxiv.org/abs/2309.06511v1 #arxiv

https://creative.ai/system/media_attachments/files/111/064/001/500/220/326/original/e4f11b50e0e2f98d.jpg

https://creative.ai/system/media_attachments/files/111/064/001/563/011/631/original/d6dc62bb068539cf.jpg

https://creative.ai/system/media_attachments/files/111/064/001/631/581/187/original/edfd4da139570ac8.jpg

https://creative.ai/system/media_attachments/files/111/064/001/698/879/045/original/00509b08cd9ce46d.jpg

▲ ▼

 📝 PILOT: A Pre-Trained Model-Based Continual Learning Toolbox 🧠🔭

"PILOT implements several state-of-the-art pre-trained model-based approaches that tackle class-incremental learning, which is a continual learning setting where new classes continually arrive." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.07117v1 #arxiv

https://creative.ai/system/media_attachments/files/111/063/765/560/800/717/original/5f28f4b7028b7a0c.jpg

https://creative.ai/system/media_attachments/files/111/063/765/645/445/091/original/77fa0b05c821d2a4.jpg

▲ ▼

 📝 Generalizable Neural Fields as Partially Observed Neural Processes 🧠🔭

"Proposes a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverages neural process algorithms to solve this task." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.06660v1 #arxiv

https://creative.ai/system/media_attachments/files/111/063/470/460/731/300/original/a099670ecec37f3d.jpg

https://creative.ai/system/media_attachments/files/111/063/470/529/999/124/original/f4948b09755e6534.jpg

https://creative.ai/system/media_attachments/files/111/063/470/596/253/011/original/b2ee028a8f62e2d5.jpg

https://creative.ai/system/media_attachments/files/111/063/470/663/270/484/original/b016ee822b9d8d58.jpg

▲ ▼

 📝 Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-Constrained Devices 🧠🔭

"Harmonic-NAS is a two-tier Neural Architecture Search (NAS) approach for Multimodal Neural Networks (MM-NN) for efficient inference on IoT devices." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.06612v1 #arxiv

https://creative.ai/system/media_attachments/files/111/063/293/398/958/992/original/ee31ee37edf6a2ef.jpg

https://creative.ai/system/media_attachments/files/111/063/293/492/811/749/original/b94cf324f5e28d28.jpg

https://creative.ai/system/media_attachments/files/111/063/293/553/378/854/original/8824b255374af50a.jpg

https://creative.ai/system/media_attachments/files/111/063/293/624/608/650/original/6feab7afc8a850df.jpg

▲ ▼

 📝 Exploring Non-Additive Randomness on ViT Against Query-Based Black-Box Attacks 🔭

"Proposes a novel approach of using non-additive stochasticity in Vision Transformer-based models to defend against black-box attacks in the query-based scenario, which is underexplored to date." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06438v1 #arxiv

https://creative.ai/system/media_attachments/files/111/063/143/466/003/363/original/3dce36c7a83eaa36.jpg

https://creative.ai/system/media_attachments/files/111/063/143/531/154/807/original/34233498338d8e7b.jpg

https://creative.ai/system/media_attachments/files/111/063/143/594/595/012/original/c306d230102cdd85.jpg

▲ ▼

 📝 Padding-Free Convolution Based on Preservation of Differential Characteristics of Kernels 🔭

"Makes convolution over an incomplete sliding window "collapse" to a linear differential operator evaluated locally at its central pixel, which no longer requires information from the neighbouring missing pixels." [gal30b+] 🤖 #CV

⚙️ https://github.com/stfc-sciml/DifferentialConv2d
🔗 https://arxiv.org/abs/2309.06370v1 #arxiv

https://creative.ai/system/media_attachments/files/111/062/966/404/657/429/original/f649b9b2cc0570ec.jpg

https://creative.ai/system/media_attachments/files/111/062/966/516/718/071/original/b294a6329a839021.jpg

▲ ▼

 📝 Exploring Flat Minima for Domain Generalization with Large Learning Rates 🔭

"Observes that using a large learning rate can not only promote weight diversity but also help identify flat regions in the loss landscape, which can be used to improve the generalization of DNNs." [gal30b+] 🤖 #CV

⚙️ https://github.com/koncle/DG-with-Large-LR
🔗 https://arxiv.org/abs/2309.06337v1 #arxiv

https://creative.ai/system/media_attachments/files/111/062/730/407/507/182/original/e361222be4d1bf0a.jpg

https://creative.ai/system/media_attachments/files/111/062/730/465/455/892/original/e279a93343b7ffe4.jpg

https://creative.ai/system/media_attachments/files/111/062/730/521/755/338/original/b10ee1ce07d8efae.jpg

https://creative.ai/system/media_attachments/files/111/062/730/578/189/206/original/856470ffc2ecf7d9.jpg

▲ ▼

 📝 SAMPLING: Scene-Adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis From a Single Image 🔭

"Introduces SAMPLING, a novel view synthesis method for large-scale outdoor scenes with a single image as input and an adaptive-bins strategy for multiplane images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06323v1 #arxiv

https://creative.ai/system/media_attachments/files/111/062/612/507/877/439/original/6340da365d141ed9.jpg

https://creative.ai/system/media_attachments/files/111/062/612/586/346/546/original/262e25f6db810060.jpg

https://creative.ai/system/media_attachments/files/111/062/612/668/099/068/original/d1ca75479b5463eb.jpg

https://creative.ai/system/media_attachments/files/111/062/612/741/469/655/original/12b731da0c5216e0.jpg

▲ ▼

 📝 Towards High-Quality Specular Highlight Removal by Leveraging Large-Scale Synthetic Data 🔭

"Proposes a three-stage network to remove specular highlights from a single image by decomposing it into the albedo, shading, and specular residue components, refining the decomposition results, and adjusting tone of the refined result to match that of the input as closely as possible." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06302v1 #arxiv

https://creative.ai/system/media_attachments/files/111/062/199/571/084/106/original/42f72a9eb51cd5bf.jpg

https://creative.ai/system/media_attachments/files/111/062/199/630/808/773/original/1321b5be09a9d9f2.jpg

https://creative.ai/system/media_attachments/files/111/062/199/705/785/361/original/89fb0f385a42f5c4.jpg

https://creative.ai/system/media_attachments/files/111/062/199/761/272/326/original/f3f21699fed159ad.jpg

▲ ▼

 📝 Self-Training and Multi-Task Learning for Limited Data: Evaluation Study on Object Detection 🔭

"Self-training and multi-task learning frameworks, despite being particularly data demanding, have potential for data exploitation if their assumptions can be relaxed to be less restrictive and less data demanding." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06288v1 #arxiv

https://creative.ai/system/media_attachments/files/111/061/905/497/191/117/original/c52b7a65169c5832.jpg

https://creative.ai/system/media_attachments/files/111/061/905/564/127/846/original/5fb92f9853a39b0b.jpg

▲ ▼

 📝 Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model 🔭

"Contains two parts: 1) a linguistics-structure assisted module that constructs accurate and complete language features to fully utilize text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistics features from shallow and deep graph neural networks." [gal30b+] 🤖 #CV #MM

🔗 https://arxiv.org/abs/2309.06284v1 #arxiv

https://creative.ai/system/media_attachments/files/111/061/609/741/461/488/original/b496151402c63cb1.jpg

https://creative.ai/system/media_attachments/files/111/061/609/815/143/293/original/c9eef27c1014b257.jpg

https://creative.ai/system/media_attachments/files/111/061/609/867/815/574/original/6a5d46d06df839ce.jpg

https://creative.ai/system/media_attachments/files/111/061/609/917/646/630/original/e787b034460941c4.jpg

▲ ▼

 📝 IBAFormer: Intra-Batch Attention Transformer for Domain Generalized Semantic Segmentation 🔭

"Proposes a novel intra-batch attention mechanism, which incorporates information from other independent samples in the same batch, enriching contextual information and diversifying training data for each attention block." [gal30b+] 🤖 #CV

⚙️ https://github.com/open-mmlab/mmsegmentation
🔗 https://arxiv.org/abs/2309.06282v1 #arxiv

https://creative.ai/system/media_attachments/files/111/061/373/882/394/988/original/14e30f9f64279b1f.jpg

https://creative.ai/system/media_attachments/files/111/061/373/934/557/001/original/923b00df9133f9b8.jpg

https://creative.ai/system/media_attachments/files/111/061/373/992/676/282/original/9221f4c1ec33c322.jpg

https://creative.ai/system/media_attachments/files/111/061/374/052/501/021/original/530cce66c85ee604.jpg

▲ ▼

 📝 OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation 🔭

"Proposes an unsupervised framework of object-centric temporal action segmentation (OTAS), which consists of three modules: global feature extraction, self-supervised local feature extraction, and boundary selection." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06276v1 #arxiv

https://creative.ai/system/media_attachments/files/111/061/314/817/558/065/original/6429defa500cc42a.jpg

https://creative.ai/system/media_attachments/files/111/061/314/878/699/751/original/ef18ef9586d27832.jpg

https://creative.ai/system/media_attachments/files/111/061/315/018/991/474/original/df74ce463163b3f1.jpg

https://creative.ai/system/media_attachments/files/111/061/315/074/869/521/original/05c5ac2d5875ebb2.jpg

▲ ▼

 📝 Modality Unifying Network for Visible-Infrared Person Re-Identification 🔭

"A Modality Unifying Network (MUN) is proposed for cross-modality person search by generating an auxiliary modality to explore modality-shared and modality-specific representations." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06262v1 #arxiv

https://creative.ai/system/media_attachments/files/111/061/078/961/634/650/original/e2a5bce6407b8da9.jpg

https://creative.ai/system/media_attachments/files/111/061/079/079/548/345/original/d91eed2a5c63dd6c.jpg

https://creative.ai/system/media_attachments/files/111/061/079/142/149/512/original/535ed902f2a6bfd5.jpg

https://creative.ai/system/media_attachments/files/111/061/079/206/414/787/original/ee3949c5ccd298ea.jpg

▲ ▼

 📝 Use Neural Networks to Recognize Students' Handwritten Letters and Incorrect Symbols 🔭

"Given students' multiple-choice answers as the input, the image classifier predicts their answers, i.e., the classification label with the highest predicted probability is considered the correct answer." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06221v1 #arxiv

https://creative.ai/system/media_attachments/files/111/060/843/018/636/776/original/d5fd24c2a7d261da.jpg

https://creative.ai/system/media_attachments/files/111/060/843/082/502/775/original/4c8eaf58cea1aae2.jpg

https://creative.ai/system/media_attachments/files/111/060/843/140/640/205/original/277fb78ee6890fbe.jpg

https://creative.ai/system/media_attachments/files/111/060/843/198/588/794/original/b1fa6671ca466931.jpg
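
The prediction rule quoted in this post (take the classification label with the highest predicted probability) can be sketched as below. This is a minimal illustration, not the paper's classifier: the label set `A`-`D`, the example logits, and the helper names `softmax` and `predict_answer` are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_answer(logits, labels):
    """Return the label with the highest predicted probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical multiple-choice labels and classifier scores for one handwritten answer.
labels = ["A", "B", "C", "D"]
logits = [0.2, 2.5, -1.0, 0.3]
answer, confidence = predict_answer(logits, labels)
```

Returning the probability alongside the label leaves room for a rejection threshold on low-confidence predictions, which is a design choice of this sketch rather than something the post states.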

▲ ▼

 📝 Fast Sparse PCA via Positive Semidefinite Projection for Unsupervised Feature Selection 🔭

"By imposing PSD constraint and a regularization parameter setting strategy, it's proved that the optimal solution to a convex SPCA model is optimized on the PSD cone, which is equivalent to an orthogonal matrix." [gal30b+] 🤖 #CV

⚙️ https://github.com/liuyanfang023/KBS-RNE
🔗 https://arxiv.org/abs/2309.06202v1 #arxiv

https://creative.ai/system/media_attachments/files/111/060/665/916/274/450/original/21a79854905a94ee.jpg

https://creative.ai/system/media_attachments/files/111/060/666/013/423/410/original/8f3969df4ad47e38.jpg

https://creative.ai/system/media_attachments/files/111/060/666/070/019/388/original/b8b311e07f86e842.jpg

https://creative.ai/system/media_attachments/files/111/060/666/200/868/524/original/8a046894be5a21f5.jpg

▲ ▼

 📝 Dual-Path Temporal Map Optimization for Make-Up Temporal Video Grounding 🔭

"DPTMO extracts both query-agnostic and query-guided features to construct two proposal sets and uses specific evaluation methods for the two sets, which represent the cross-modal makeup video-text similarity and multi-modal fusion relationship, complementing each other." [gal30b+] 🤖 #CV #MM

⚙️ https://github.com/AIM3-RUC/Youmakeup
🔗 https://arxiv.org/abs/2309.06176v1 #arxiv

https://creative.ai/system/media_attachments/files/111/060/548/246/694/502/original/636fe070de2592bb.jpg

https://creative.ai/system/media_attachments/files/111/060/548/314/083/064/original/05c1fd28f48f06ca.jpg

https://creative.ai/system/media_attachments/files/111/060/548/393/774/767/original/24d86a7780f0082a.jpg

https://creative.ai/system/media_attachments/files/111/060/548/472/698/386/original/c217b32d489a03db.jpg

▲ ▼

 📝 Towards Reliable Domain Generalization: A New Dataset and Evaluations 🔭🧠

"Proposes a new domain generalization task for handwritten Chinese character recognition (HCCR) to enrich the application scenarios of DG method research, which is not studied in previous work." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.06142v1 #arxiv

https://creative.ai/system/media_attachments/files/111/060/312/248/090/194/original/6ad0bd4460087c27.jpg

https://creative.ai/system/media_attachments/files/111/060/312/327/038/221/original/397428046dd0aee4.jpg

https://creative.ai/system/media_attachments/files/111/060/312/420/944/434/original/77652b1590d35084.jpg

https://creative.ai/system/media_attachments/files/111/060/312/547/610/276/original/229088e7ecacee84.jpg

▲ ▼

 📝 Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning 🔭

"DVPT is a new PETL method that generates a dynamic instance-wise token for each image via a Meta-Net module, which captures the dynamic instance-wise visual features." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06123v1 #arxiv

https://creative.ai/system/media_attachments/files/111/060/076/233/591/986/original/6ac16ce5187b9c7b.jpg

https://creative.ai/system/media_attachments/files/111/060/076/289/122/482/original/30a6c226ba44a48c.jpg

https://creative.ai/system/media_attachments/files/111/060/076/345/970/842/original/a63618d72e925d3c.jpg

▲ ▼

 📝 C-RITNet: Set Infrared and Visible Image Fusion Free From Complementary Information Mining 🔭

"First sidesteps aggregating complementary information in IVIF, then transfers complementary information into redundant information to integrate both the shared and complementary features from the two modalities." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.06118v1 #arxiv

https://creative.ai/system/media_attachments/files/111/059/899/214/452/622/original/b5a8ac2554a7a24e.jpg

https://creative.ai/system/media_attachments/files/111/059/899/284/437/100/original/b04580fdff288d96.jpg

https://creative.ai/system/media_attachments/files/111/059/899/371/237/735/original/089665e9922afeb1.jpg

https://creative.ai/system/media_attachments/files/111/059/899/446/075/934/original/e7b2b20e30eda98e.jpg

▲ ▼

 📝 Learning From History: Task-Agnostic Model Contrastive Learning for Image Restoration 🔭

"SPNIR introduces the Self-Prior guided Negative loss to enable "learning from history", which can adaptively and automatically generate negative samples to train the target model without introducing any task-specific bias." [gal30b+] 🤖 #CV

⚙️ https://github.com/Aitical/Task-agnostic_Model_Contrastive_Learning_Image_Restoration
🔗 https://arxiv.org/abs/2309.06023v1 #arxiv

https://creative.ai/system/media_attachments/files/111/059/663/407/088/844/original/8e00b9843bdd1d15.jpg

https://creative.ai/system/media_attachments/files/111/059/663/469/163/793/original/0af5091d2954258d.jpg

https://creative.ai/system/media_attachments/files/111/059/663/526/360/616/original/beb5a1a5f97dff8a.jpg

https://creative.ai/system/media_attachments/files/111/059/663/588/668/020/original/3a78fc9b899e5637.jpg

▲ ▼

 📝 ATTA: Anomaly-Aware Test-Time Adaptation for Out-of-Distribution Detection in Segmentation 🔭🧠

"Proposes the dual-level OOD detection approach to handle domain shift and semantic shift jointly, by distinguishing whether domain shift exists in the image by leveraging global low-level features, and identifying pixels with semantic shift by utilizing dense high-level feature maps." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/gaozhitong/ATTA
🔗 https://arxiv.org/abs/2309.05994v1 #arxiv

https://creative.ai/system/media_attachments/files/111/059/545/482/913/104/original/ab8985a9d3facb4c.jpg

https://creative.ai/system/media_attachments/files/111/059/545/546/073/030/original/2f54d80b7bded899.jpg

https://creative.ai/system/media_attachments/files/111/059/545/610/019/377/original/24291aa38924c978.jpg

https://creative.ai/system/media_attachments/files/111/059/545/679/793/899/original/3959e11b702ace86.jpg

▲ ▼

 📝 Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation 🔭

"A straightforward textual template and a foreground-background segmentation algorithm are employed to generate foreground images set against isolated backgrounds, while an image captioning method and a text-to-image synthesis framework are used to produce context images." [gal30b+] 🤖 #CV

⚙️ https://github.com/gyhandy/Text2Image-for-Detection
🔗 https://arxiv.org/abs/2309.05956v1 #arxiv

https://creative.ai/system/media_attachments/files/111/059/368/533/648/740/original/066e4979c5360b6e.jpg

https://creative.ai/system/media_attachments/files/111/059/368/730/220/144/original/a0e8cfd8105d3284.jpg

https://creative.ai/system/media_attachments/files/111/059/368/838/903/965/original/81e4970c6e85f980.jpg

https://creative.ai/system/media_attachments/files/111/059/368/917/888/108/original/c0bae1aa1c7ffb15.jpg

▲ ▼

 📝 Hierarchical Conditional Semi-Paired Image-to-Image Translation for Multi-Task Image Defect Correction on Shopping Websites 🔭🧠

"A novel unified Image-to-Image model to correct multiple defects across different product types, leveraging an attention mechanism to hierarchically incorporate high-level defect groups and specific defect types to guide the network to focus on defect-related image regions." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.05883v1 #arxiv

https://creative.ai/system/media_attachments/files/111/059/191/534/600/656/original/c251548b9a92f3b0.jpg

https://creative.ai/system/media_attachments/files/111/059/191/595/538/646/original/99e325a115f109fe.jpg

https://creative.ai/system/media_attachments/files/111/059/191/652/138/318/original/7ef71f3ccf2a7743.jpg

https://creative.ai/system/media_attachments/files/111/059/191/708/153/968/original/87b87967d07bcbed.jpg

▲ ▼

 📝 Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation 🔭

"The Self-Correlation and Cross-Correlation Learning Network (SCCNe) is a few-shot remote sensing image semantic segmentation model that consists of a support branch and a query branch." [gal30b+] 🤖 #CV

⚙️ https://github.com/linhanwang/SCCNe
🔗 https://arxiv.org/abs/2309.05840v1 #arxiv

https://creative.ai/system/media_attachments/files/111/058/955/562/990/825/original/4348a324a02df84e.jpg

https://creative.ai/system/media_attachments/files/111/058/955/622/162/368/original/d0f1b8b09ddb3d54.jpg

https://creative.ai/system/media_attachments/files/111/058/955/708/904/863/original/7e922d387598f9cb.jpg

https://creative.ai/system/media_attachments/files/111/058/955/767/765/442/original/917e482341bb013f.jpg

▲ ▼

 📝 SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition 🔭

"SCD-Net is based on a novel contrastive learning framework which learns disentangled spatial and temporal clues via a constructed anchor and the proposed masking strategy with structural constraints, and can be applied for skeleton-based action recognition." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.05834v1 #arxiv

https://creative.ai/system/media_attachments/files/111/058/837/579/000/657/original/4b9a0c9471bd3d93.jpg

https://creative.ai/system/media_attachments/files/111/058/837/660/745/556/original/ee5a1e58f8e15bc6.jpg

https://creative.ai/system/media_attachments/files/111/058/837/725/175/054/original/02902b9cbc4a026f.jpg

https://creative.ai/system/media_attachments/files/111/058/837/785/383/215/original/a48ebfb62a82b02a.jpg

▲ ▼

 📝 Blendshapes GHUM: Real-Time Monocular Facial Blendshape Prediction 🔭

"Presents Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars." [gal30b+] 🤖 #CV

⚙️ https://github.com/google/
🔗 https://arxiv.org/abs/2309.05782v1 #arxiv

https://creative.ai/system/media_attachments/files/111/058/188/664/681/385/original/902f9611a00ca41f.jpg

https://creative.ai/system/media_attachments/files/111/058/188/725/110/724/original/e40952c622e62e74.jpg

https://creative.ai/system/media_attachments/files/111/058/188/776/962/095/original/b8942614b88ccbbf.jpg

▲ ▼

 📝 TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language 🔭

"TransferDoc is a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives which learns richer semantic concepts by unifying language and visual representations." [gal30b+] 🤖 #CV

⚙️ https://github.com/tesseract-ocr/tesseract
🔗 https://arxiv.org/abs/2309.05756v1 #arxiv

https://creative.ai/system/media_attachments/files/111/058/011/898/874/739/original/49be9f11b5424907.jpg

https://creative.ai/system/media_attachments/files/111/058/011/995/447/791/original/9dc8b2258cd7a2e3.jpg

https://creative.ai/system/media_attachments/files/111/058/012/072/599/845/original/6f8d47bef1d5822a.jpg

https://creative.ai/system/media_attachments/files/111/058/012/197/429/735/original/08f82dc2376c88f3.jpg

▲ ▼

 📝 InstaFlow: One Step Is Enough for High-Quality Diffusion-Based Text-to-Image Generation 🧠🔭

"InstaFlow is trained via a novel text-conditioned pipeline, in which reflow plays a critical role in improving the assignment between noise and images by refining the coupling between noises and images through straightening the trajectories of probability flows." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/gnobitab/InstaFlow
🔗 https://arxiv.org/abs/2309.06380v1 #arxiv

https://creative.ai/system/media_attachments/files/111/057/775/795/131/745/original/566ffbcf5173bb7a.jpg

https://creative.ai/system/media_attachments/files/111/057/775/898/577/943/original/cc187a6e454f1f07.jpg

https://creative.ai/system/media_attachments/files/111/057/775/982/685/741/original/706dce085e4a6019.jpg

https://creative.ai/system/media_attachments/files/111/057/776/064/914/533/original/be51cf40eb9e88cf.jpg
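
The "straightening" of probability-flow trajectories mentioned in this post can be written down under the usual rectified-flow formulation. The block below is a sketch under that assumption; the symbols ($X_0$ for noise, $X_1$ for the image, $\mathcal{T}$ for the text prompt, $v_\theta$ for the velocity network, $k$ for the reflow round) are notation chosen here, not taken from the post.

```latex
% Straight-line interpolation between a noise sample X_0 and an image X_1
X_t = t\,X_1 + (1 - t)\,X_0, \qquad t \in [0, 1]

% Text-conditioned velocity field fitted to the straight-line direction
\min_{\theta}\; \mathbb{E}_{(X_0, X_1, \mathcal{T})}
  \int_0^1 \bigl\| (X_1 - X_0) - v_\theta(X_t, t \mid \mathcal{T}) \bigr\|^2 \, dt

% Reflow: re-pair each noise sample with the image the current flow generates from it
X_1^{(k+1)} = \mathrm{ODE}\bigl[v^{(k)}\bigr](X_0 \mid \mathcal{T})
```

Retraining on the re-paired couples $(X_0, X_1^{(k+1)})$ is what refines the noise-image coupling; straighter trajectories are then easier to approximate with a single step.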

▲ ▼

 📝 Elucidating the Solution Space of Extended Reverse-Time SDE for Diffusion Models 🧠🔭

"Formulates the sampling process as an extended reverse-time SDE (ER SDE) and devises fast and training-free samplers, ER-SDE Solvers, elevating the efficiency of stochastic samplers to unprecedented levels." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.06169v1 #arxiv

https://creative.ai/system/media_attachments/files/111/057/540/039/211/175/original/86c02c6ee3b07c33.jpg

https://creative.ai/system/media_attachments/files/111/057/540/204/258/675/original/dadd4e640d1e376e.jpg

https://creative.ai/system/media_attachments/files/111/057/540/300/256/623/original/fe4828a6bbabbc36.jpg

https://creative.ai/system/media_attachments/files/111/057/540/409/093/713/original/74a4347936334afa.jpg

▲ ▼

 📝 Certified Robust Models with Slack Control and Large Lipschitz Constants 🧠🔭

"Proposes a Calibrated Lipschitz-Margin Loss (CLL) that improves certified robustness by explicitly calibrating the loss wrt margin and Lipschitz constant, thereby establishing full control over slack and improving robustness certificates even with larger Lipschitz constants." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/mlosch/CLL
🔗 https://arxiv.org/abs/2309.06166v1 #arxiv

https://creative.ai/system/media_attachments/files/111/057/303/921/951/400/original/25eff6f2e0df30a5.jpg

https://creative.ai/system/media_attachments/files/111/057/303/986/950/236/original/eefe9f9b33e6401c.jpg

https://creative.ai/system/media_attachments/files/111/057/304/045/039/737/original/4d76c295234184d3.jpg

https://creative.ai/system/media_attachments/files/111/057/304/112/864/240/original/4e75bcc56655e315.jpg

▲ ▼

 📝 Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning 🧠🔭

"An expert network is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks, while a previous network is used in an adaptation-retrospection phase to avoid forgetting and initialize a new expert with the knowledge of the old network." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/alviur/pocon_wacv2024
🔗 https://arxiv.org/abs/2309.06086v1 #arxiv

https://creative.ai/system/media_attachments/files/111/057/186/119/699/977/original/9069df62bba99425.jpg

https://creative.ai/system/media_attachments/files/111/057/186/255/127/281/original/9bb24e46e13a1b23.jpg

https://creative.ai/system/media_attachments/files/111/057/186/312/822/265/original/a2e76599cb26d280.jpg

https://creative.ai/system/media_attachments/files/111/057/186/376/843/587/original/cdb59842da4fdb5b.jpg

▲ ▼

 📝 KD-FixMatch: Knowledge Distillation Siamese Neural Networks 🧠🔭

"Presents KD-FixMatch, a novel SSL algorithm that addresses the limitations of FixMatch by incorporating knowledge distillation to enhance performance and reduce performance degradation in the early training stage." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/petewarden/tensorflow_
🔗 https://arxiv.org/abs/2309.05826v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/891/023/997/696/original/8f7a5be1c3515a6e.jpg

▲ ▼

 📝 DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices 🔭

"Proposes a collaborative inference framework, DeViT, to facilitate edge deployment by decomposing large ViTs into multiple small models for collaborative inference at the edge devices." [gal30b+] 🤖 #CV #DC #PF

🔗 https://arxiv.org/abs/2309.05015v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/553/518/795/175/original/1a18544287dd8f57.jpg

https://creative.ai/system/media_attachments/files/111/056/553/585/769/253/original/58a1d9e7552c6eba.jpg

https://creative.ai/system/media_attachments/files/111/056/553/649/520/780/original/7ffcc32cf6cd671c.jpg

https://creative.ai/system/media_attachments/files/111/056/553/718/374/058/original/993481ce295fdedb.jpg

▲ ▼

 📝 Towards Fully Decoupled End-to-End Person Search 🔭

"A task-incremental end-to-end person search network is proposed for the detection and re-id sub-tasks, which decouples the model architecture for the two sub-tasks." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.04967v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/526/931/470/687/original/35c0c7a30a08fa91.jpg

https://creative.ai/system/media_attachments/files/111/056/527/020/097/625/original/f7df18939915767d.jpg

https://creative.ai/system/media_attachments/files/111/056/527/082/239/075/original/f369d200408651ca.jpg

https://creative.ai/system/media_attachments/files/111/056/552/780/312/441/original/8d79cec487fdcb44.jpg

▲ ▼

 📝 Semi-Supervised Learning for Face Anti-Spoofing Using Apex Frame 🔭

"An apex frame is derived from a video by computing a weighted sum of its frames, where the weights are determined using a Gaussian distribution centered around the video's central frame." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.04958v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/525/825/443/220/original/6128a9dbccb5e6d3.jpg

https://creative.ai/system/media_attachments/files/111/056/525/903/134/297/original/529e606c4094598e.jpg

https://creative.ai/system/media_attachments/files/111/056/525/972/335/010/original/85649558b8a77181.jpg
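
The apex-frame construction quoted in this post (a weighted sum of frames with Gaussian weights centered on the middle frame) can be sketched as below. This is a minimal illustration under stated assumptions: frames are flat lists of pixel values, the weights are normalized to sum to 1, and `sigma` is a hypothetical spread parameter; the post does not say how the paper sets any of these.

```python
import math

def gaussian_weights(num_frames, sigma):
    """Gaussian weights centered on the middle frame index, normalized to sum to 1."""
    center = (num_frames - 1) / 2.0
    raw = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(num_frames)]
    total = sum(raw)
    return [w / total for w in raw]

def apex_frame(frames, sigma=1.0):
    """Weighted sum of equally sized frames, each given as a flat list of pixel values."""
    weights = gaussian_weights(len(frames), sigma)
    num_pixels = len(frames[0])
    return [sum(w * f[p] for w, f in zip(weights, frames)) for p in range(num_pixels)]

# Three tiny 2-pixel "frames"; the central frame receives the largest weight.
frames = [[0.0, 0.0], [10.0, 20.0], [0.0, 0.0]]
weights = gaussian_weights(len(frames), sigma=1.0)
apex = apex_frame(frames, sigma=1.0)
```

A larger `sigma` spreads the weight over more frames, while a small `sigma` makes the apex frame close to the central frame alone.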

▲ ▼

 📝 Semi-Supervised Instance Segmentation with a Learned Shape Prior 🔭

"A variational autoencoder (VAE) is trained on either (a) real cell shape patches or (b) synthetic cell shape patches + noise to learn a shape prior." [gal30b+] 🤖 #CV

⚙️ https://github.com/looooongChen/shape
🔗 https://arxiv.org/abs/2309.04888v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/524/684/586/414/original/3e5738a5a81ff1ad.jpg

https://creative.ai/system/media_attachments/files/111/056/524/751/072/696/original/ef07b938e66eab3f.jpg

https://creative.ai/system/media_attachments/files/111/056/524/812/926/766/original/c99d8da7d695cb1f.jpg

▲ ▼

 📝 SortedAP: Rethinking Evaluation Metrics for Instance Segmentation 🔭

"Proposes a new metric called sortedAP, which strictly decreases with both object- and pixel-level imperfections and has an uninterrupted penalization scale over the entire domain." [gal30b+] 🤖 #CV

⚙️ https://www.github.com/looooongChen/sortedAP
🔗 https://arxiv.org/abs/2309.04887v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/524/126/100/519/original/06760aec4e48c3cc.jpg

https://creative.ai/system/media_attachments/files/111/056/524/193/134/157/original/c4dd21ffa4d4e991.jpg

https://creative.ai/system/media_attachments/files/111/056/524/249/812/194/original/c9fe366ffc593bcb.jpg

▲ ▼

 📝 ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-Agnostic Counting 🔭🧠

"Composed of three stages: a) a feature extractor, b) an object detector, and c) a blind counter that predicts the number of each type of object in the image without using any example of that type." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.04820v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/523/201/002/684/original/d007eba0a3518d80.jpg

https://creative.ai/system/media_attachments/files/111/056/523/274/288/798/original/8618452868c0f0d5.jpg

https://creative.ai/system/media_attachments/files/111/056/523/340/093/188/original/1af1364647d15852.jpg

https://creative.ai/system/media_attachments/files/111/056/523/469/360/119/original/008026183d551c20.jpg

▲ ▼

 📝 Speech2Lip: High-Fidelity Speech to Lip Generation by Learning From a Short Video 🔭

"A decomposition-synthesis-composition framework that disentangles speech-sensitive and speech-insensitive motion/appearance to facilitate effective learning from limited training data, resulting in the generation of natural-looking videos." [gal30b+] 🤖 #CV

⚙️ https://github.com/CVMI-Lab/Speech2Lip
🔗 https://arxiv.org/abs/2309.04814v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/522/383/944/711/original/b30a7dbb2d9aa70c.jpg

https://creative.ai/system/media_attachments/files/111/056/522/446/180/901/original/a188024ea3a620c5.jpg

https://creative.ai/system/media_attachments/files/111/056/522/501/783/897/original/2f4d849a6f31c00d.jpg

https://creative.ai/system/media_attachments/files/111/056/522/548/889/895/original/8d51acc39eaca7b0.jpg

▲ ▼

 📝 Self-Supervised Transformer with Domain Adaptive Reconstruction for General Face Forgery Video Detection 🔭

"Takes full advantage of the difference between real and forged videos by learning only the common representation of real face videos in a self-supervised manner, and then fine-tunes a linear head on specific face forgery video datasets." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.04795v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/521/365/451/633/original/7593baa986f2cf00.jpg

https://creative.ai/system/media_attachments/files/111/056/521/428/052/958/original/1defd17e883a411d.jpg

https://creative.ai/system/media_attachments/files/111/056/521/492/171/602/original/4804eae545ef9d0f.jpg

https://creative.ai/system/media_attachments/files/111/056/521/561/848/414/original/653c6db5c50fab4a.jpg

▲ ▼

 📝 Latent Degradation Representation Constraint for Single Image Deraining 🔭

"The DAEncoder is proposed to adaptively extract latent degradation representation by using the deformable convolutions to exploit the direction consistency of rain streaks, then the constraint loss is introduced to explicitly constrain the degradation representation learning during training." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.04780v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/520/486/247/161/original/77b0715ff9def370.jpg

https://creative.ai/system/media_attachments/files/111/056/520/581/603/261/original/fd46ea9247b3d0af.jpg

▲ ▼

 📝 Deep Video Restoration for Under-Display Camera 🔭

"Consists of a spatial branch with local-aware transformers, a temporal branch with embedded temporal transformers, and a spatial-temporal fusion module." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.04752v1 #arxiv

https://creative.ai/system/media_attachments/files/111/056/519/388/092/377/original/cc4b9664cc92d7b3.jpg

https://creative.ai/system/media_attachments/files/111/056/519/453/402/697/original/dbb5fbf08030418c.jpg

https://creative.ai/system/media_attachments/files/111/056/519/514/359/341/original/d2d081bc45fac929.jpg

https://creative.ai/system/media_attachments/files/111/056/519/576/947/988/original/8ff393f3cf264365.jpg

▲ ▼

 📝 Grouping Boundary Proposals for Fast Interactive Image Segmentation 🔭

"The adaptive cut can disconnect the image domain such that the target contours are imposed to pass through this cut only once, and the selected boundary proposals and corresponding minimal paths are used to delineate the target contours." [gal30b+] 🤖 #CV

⚙️ https://github.com/Mirebeau/HamiltonFastMarching
🔗 https://arxiv.org/abs/2309.04169v1 #arxiv

https://creative.ai/system/media_attachments/files/111/048/842/871/817/871/original/92ec913d97ae4b95.jpg

https://creative.ai/system/media_attachments/files/111/048/842/954/083/020/original/67968dc78225252e.jpg

https://creative.ai/system/media_attachments/files/111/048/843/033/001/890/original/089539260256c90b.jpg

https://creative.ai/system/media_attachments/files/111/048/843/206/267/062/original/a6778f6ddc17cbdf.jpg

▲ ▼

 📝 Multimodal Transformer for Material Segmentation 🔭🧠

"Proposes a fusion strategy that can effectively fuse information from different combinations of multiple modalities including RGB, Angle of Linear Polarization (AoLP), Degree of Linear Polarization (DoLP) and Near-Infrared (NIR)." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/csiplab/MMSFormer
🔗 https://arxiv.org/abs/2309.04001v1 #arxiv

https://creative.ai/system/media_attachments/files/111/046/483/477/601/558/original/0d2c63cca4e87cc1.jpg

https://creative.ai/system/media_attachments/files/111/046/483/569/994/878/original/d8770df15a837a86.jpg

https://creative.ai/system/media_attachments/files/111/046/483/649/675/255/original/e91363e8dfc83a04.jpg

▲ ▼

 📝 Adapting Self-Supervised Representations to Multi-Domain Setups 🔭🧠

"The Domain Disentanglement Module (DDM) is a lightweight component that can be used with various self-supervised learning methods to improve their ability to learn generalizable representations when trained on multiple domains." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.03999v1 #arxiv

https://creative.ai/system/media_attachments/files/111/046/306/507/928/638/original/431795262e9add58.jpg

https://creative.ai/system/media_attachments/files/111/046/306/573/607/180/original/4544c609f902feed.jpg

https://creative.ai/system/media_attachments/files/111/046/306/633/177/984/original/49dda3de4cd5e98a.jpg

https://creative.ai/system/media_attachments/files/111/046/306/700/904/662/original/7a821a30d79f2e0e.jpg

▲ ▼

 📝 CDFSL-V: Cross-Domain Few-Shot Learning for Videos 🔭

"Leverages a masked autoencoder-based self-supervised training objective to learn from both source and target data in a self-supervised manner, which can be used to learn generic features from target video data." [gal30b+] 🤖 #CV

⚙️ https://github.com/Sarinda251/CDFSL-V
🔗 https://arxiv.org/abs/2309.03989v1 #arxiv

https://creative.ai/system/media_attachments/files/111/046/129/470/442/936/original/717f01c193643916.jpg

https://creative.ai/system/media_attachments/files/111/046/129/535/129/359/original/0991de0b1f6d3999.jpg

https://creative.ai/system/media_attachments/files/111/046/129/591/686/740/original/e14f9fc99c4d621e.jpg

https://creative.ai/system/media_attachments/files/111/046/129/665/858/589/original/998addde0749a7e2.jpg

▲ ▼

 📝 UER: A Heuristic Bias Addressing Approach for Online Continual Learning 🧠🔭

"UER learns current samples only by the angle factor and further replays previous samples by both the norm and angle factors to address the bias problem in continual learning, achieving superior performance over various state-of-the-art methods." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/FelixHuiweiLin/UER
🔗 https://arxiv.org/abs/2309.04081v1 #arxiv

https://creative.ai/system/media_attachments/files/111/045/952/629/004/689/original/d5179cc04ed82287.jpg

https://creative.ai/system/media_attachments/files/111/045/952/720/646/142/original/ad195af1315131d5.jpg

https://creative.ai/system/media_attachments/files/111/045/952/795/809/984/original/67e5873d2589a276.jpg

https://creative.ai/system/media_attachments/files/111/045/952/859/453/896/original/4534641e83ccbbec.jpg
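
The "norm" and "angle" factors named in this post correspond to the standard decomposition of a dot-product logit, w·x = ‖w‖ ‖x‖ cos θ. The sketch below only illustrates that decomposition; how UER actually schedules the two factors between current and replayed samples is described in the paper, and the function names here are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def angle_logit(feature, class_weight):
    """Angle factor only: cosine of the angle between feature and class weight."""
    return dot(feature, class_weight) / (norm(feature) * norm(class_weight))

def full_logit(feature, class_weight):
    """Norm factor times angle factor, which is just the plain dot product."""
    return dot(feature, class_weight)

# Hypothetical 2-D feature and class weight.
feature = [3.0, 4.0]
class_weight = [1.0, 0.0]
cos_theta = angle_logit(feature, class_weight)
logit = full_logit(feature, class_weight)
```

Scoring with `angle_logit` removes any influence of vector length, so a class whose weights have grown large cannot dominate through its norm alone.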

▲ ▼

 📝 Improving Resnet-9 Generalization Trained on Small Datasets 🧠🔭

"A combination of various techniques to improve generalization including sharpness aware optimization, label smoothing, gradient centralization, input patch whitening as well as metalearning based training." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.03965v1 #arxiv

https://creative.ai/system/media_attachments/files/111/045/775/604/551/509/original/e7d903714c1065b9.jpg
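
Of the techniques listed in this post, label smoothing is the simplest to show in isolation. The sketch below uses the common formulation (mix the one-hot target with a uniform distribution); the smoothing strength `epsilon=0.1` and the four-class example are assumptions, not settings reported in the post.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """One-hot target mixed with a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) + uniform if k == true_class else uniform
            for k in range(num_classes)]

def cross_entropy(target, probs):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

# Hypothetical 4-class example where class 1 is the true label.
target = smooth_labels(true_class=1, num_classes=4, epsilon=0.1)
probs = [0.1, 0.7, 0.1, 0.1]
loss = cross_entropy(target, probs)
```

Because every class keeps a small share of the target mass, the loss never rewards driving the predicted probability of the true class all the way to 1, which tends to reduce overconfidence on small datasets.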

▲ ▼

 📝 Exploring Sparse MoE in GANs for Text-Conditioned Image Synthesis 🔭

"A mixture-of-experts (MoE) based generative text-to-image (T2I) model that employs a collection of experts to process the feature, together with a sparse router to help select the most suitable expert for each feature point." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03904v1 #arxiv

https://creative.ai/system/media_attachments/files/111/034/940/928/681/904/original/9dffa549f79e198c.jpg

https://creative.ai/system/media_attachments/files/111/034/940/980/698/084/original/7656c69ef7e6947a.jpg

https://creative.ai/system/media_attachments/files/111/034/941/037/667/729/original/a0e80249799bf85a.jpg

https://creative.ai/system/media_attachments/files/111/034/941/114/177/816/original/f4494cc81cf9f690.jpg
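
The sparse routing described in this post (a router picks the most suitable expert for each feature point) can be sketched as below. This is an illustration only: the linear router, top-1 selection, and scaling the expert output by its gate probability are common choices assumed here, and the toy experts and weights are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(feature, router_weights):
    """Score every expert with a linear router and keep only the best one."""
    scores = [sum(w * x for w, x in zip(row, feature)) for row in router_weights]
    gates = softmax(scores)
    best = max(range(len(gates)), key=gates.__getitem__)
    return best, gates[best]

def sparse_moe(feature, router_weights, experts):
    """Run only the selected expert and scale its output by the gate probability."""
    idx, gate = route_top1(feature, router_weights)
    return [gate * v for v in experts[idx](feature)], idx

# Two toy experts and a 2x2 router matrix (one row per expert).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
]
router_weights = [[1.0, 0.0], [0.0, 1.0]]
output, chosen = sparse_moe([0.5, 3.0], router_weights, experts)
```

Only one expert runs per feature point, so compute per point stays roughly constant as more experts are added.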

▲ ▼

 📝 Tracking Anything with Decoupled Video Segmentation 🔭

"Works by first using a segmentation network on every frame, and the network produces a probability for each pixel to belong to the foreground object or background; then it uses bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03903v1 #arxiv

https://creative.ai/system/media_attachments/files/111/034/763/947/013/361/original/c858974dbdca66a1.jpg

https://creative.ai/system/media_attachments/files/111/034/764/003/651/515/original/77ad67e306bcac9f.jpg

https://creative.ai/system/media_attachments/files/111/034/764/052/979/553/original/3c5b39d8c7bbeade.jpg

https://creative.ai/system/media_attachments/files/111/034/764/113/758/573/original/bd1b8c6176b13bc8.jpg

▲ ▼

 📝 The Making and Breaking of Camouflage 🔭

"Proposes three camouflage scores for measuring camouflage in the feature space, which are used to evaluate existing camouflage datasets and generate a large-scale and challenging dataset for camouflaged instance segmentation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03899v1 #arxiv

https://creative.ai/system/media_attachments/files/111/034/586/968/631/971/original/2ecc0a38d9a3f6c1.jpg

https://creative.ai/system/media_attachments/files/111/034/587/031/031/926/original/2d66f6d45047885b.jpg

https://creative.ai/system/media_attachments/files/111/034/587/111/426/639/original/dfefcd3790176ca7.jpg

https://creative.ai/system/media_attachments/files/111/034/587/164/524/885/original/92ed9dcc954e161f.jpg

▲ ▼

 📝 ProPainter: Improving Propagation and Transformer for Video Inpainting 🔭

"Introduces a novel video inpainting framework called ProPainter, which involves an enhanced propagation mechanism and a sparse Transformer for efficient video inpainting, outperforming previous state-of-the-art approaches by a large margin." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03897v1 #arxiv

https://creative.ai/system/media_attachments/files/111/034/410/128/452/966/original/8bfe4c9baeb53d14.jpg

https://creative.ai/system/media_attachments/files/111/034/410/213/140/629/original/c35e407ffa59d9ef.jpg

https://creative.ai/system/media_attachments/files/111/034/410/286/613/053/original/926717049ad0d3c3.jpg

https://creative.ai/system/media_attachments/files/111/034/410/347/795/264/original/bdd6bb0a70348abc.jpg

▲ ▼

 📝 InstructDiffusion: A Generalist Modeling Interface for Vision Tasks 🔭

"Formulates human instructions to a pixel prediction task, where an InstructDiffusion model is trained to predict pixels according to user instructions, such as encircling the man's left shoulder in red or applying a blue mask to the left car." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03895v1 #arxiv

https://creative.ai/system/media_attachments/files/111/034/174/024/504/315/original/14090293f7bd3e8b.jpg

https://creative.ai/system/media_attachments/files/111/034/174/126/303/393/original/2e534adef4024c3c.jpg

https://creative.ai/system/media_attachments/files/111/034/174/182/459/235/original/e0b71e57dff1796a.jpg

https://creative.ai/system/media_attachments/files/111/034/174/239/131/280/original/8b09709e3300955c.jpg

▲ ▼

 📝 Box-Based Refinement for Weakly Supervised and Unsupervised Localization Tasks 🔭

"A box-based detector is trained to predict the location of the phrases in the image, and then applied to the output of the network to improve it further and enhance the localization performance of weakly supervised and unsupervised methods." [gal30b+] 🤖 #CV

⚙️ https://github.com/eyalgomel/box-based-refinement
🔗 https://arxiv.org/abs/2309.03874v1 #arxiv

https://creative.ai/system/media_attachments/files/111/033/997/037/694/441/original/138682114f99bab1.jpg

https://creative.ai/system/media_attachments/files/111/033/997/151/761/534/original/bae2437bf1a77d35.jpg

https://creative.ai/system/media_attachments/files/111/033/997/253/642/406/original/014838c9fe047102.jpg

https://creative.ai/system/media_attachments/files/111/033/997/346/987/780/original/789bb4a417a9128a.jpg

▲ ▼

 📝 Text-to-Feature Diffusion for Audio-Visual Few-Shot Learning 🔭

"AV-DIFF is a text-to-feature diffusion framework, which first fuses the temporal and audio-visual features via cross-modal attention and then generates multi-modal features for the novel classes." [gal30b+] 🤖 #CV

⚙️ https://github.com/ExplainableML/AVDIFF-GFSL
🔗 https://arxiv.org/abs/2309.03869v1 #arxiv

https://creative.ai/system/media_attachments/files/111/033/643/131/039/310/original/5d24bfc58dab4b24.jpg

https://creative.ai/system/media_attachments/files/111/033/643/196/689/261/original/7f7c5da6c6e9ee1d.jpg

https://creative.ai/system/media_attachments/files/111/033/643/253/543/043/original/b357da815d99b949.jpg

▲ ▼

 📝 Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption 🔭

"A novel phasic content fusing few-shot diffusion model with directional distribution consistency loss, which targets different learning objectives at distinct training stages of the diffusion model, is designed." [gal30b+] 🤖 #CV

⚙️ https://github.com/sjtuplayer/few-shot-diffusion
🔗 https://arxiv.org/abs/2309.03729v1 #arxiv

https://creative.ai/system/media_attachments/files/111/033/466/263/091/089/original/bc007d2c7993d5b8.jpg

https://creative.ai/system/media_attachments/files/111/033/466/331/470/681/original/9f8a2827e79e280f.jpg

https://creative.ai/system/media_attachments/files/111/033/466/402/912/892/original/b748d88162c5a2fd.jpg

https://creative.ai/system/media_attachments/files/111/033/466/509/720/121/original/9b860756d3171e3c.jpg

▲ ▼

 📝 Interpretable Visual Question Answering via Reasoning Supervision 🔭

"Uses a transformer-based architecture that leverages reasoning supervision to guide the visual attention to important elements of the scene, without requiring explicit grounding annotations." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03726v1 #arxiv

https://creative.ai/system/media_attachments/files/111/033/289/534/828/923/original/8e402443917f84ba.jpg

https://creative.ai/system/media_attachments/files/111/033/289/596/815/937/original/c6bcfea6e262b609.jpg

https://creative.ai/system/media_attachments/files/111/033/289/653/372/205/original/dbe43fba37e1fa0f.jpg

▲ ▼

 📝 Efficient Adaptive Human-Object Interaction Detection with Concept-Guided Memory 🔭

"ADA-CM has two operating modes: (1) training-free and (2) updating a lightweight set of parameters, which can be incorporated with existing HOI detectors." [gal30b+] 🤖 #CV

⚙️ https://github.com/ltttpku/ADA-CM
🔗 https://arxiv.org/abs/2309.03696v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/994/492/460/885/original/402c8f8c726ac2d0.jpg

https://creative.ai/system/media_attachments/files/111/032/994/566/285/697/original/e04e2f9e179c7a08.jpg

https://creative.ai/system/media_attachments/files/111/032/994/622/718/411/original/0f6ce5cba2bddf9b.jpg

https://creative.ai/system/media_attachments/files/111/032/994/680/971/230/original/5e1e066cc2808d7e.jpg

▲ ▼

 📝 Prompt-Based Context- And Domain-Aware Pretraining for Vision and Language Navigation 🔭

"PANDA consists of a domain-aware stage and a context-aware stage, which performs prompt-based tuning and contrastive learning, respectively, on a pretrained VLN model." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03661v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/876/375/076/485/original/5de6127b0d273e77.jpg

▲ ▼

 📝 Enhancing Sample Utilization Through Sample Adaptive Augmentation in Semi-Supervised Learning 🔭

"Sample Adaptive Augmentation (SAA) consists of a sample selection module and a sample augmentation module, which helps to optimize the SSL models by giving more attention to naive samples and augmenting them in a more diverse manner." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03598v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/640/605/821/315/original/f502f7956fc7f4b2.jpg

https://creative.ai/system/media_attachments/files/111/032/640/663/124/706/original/1cec6e348c45dfec.jpg

https://creative.ai/system/media_attachments/files/111/032/640/723/030/516/original/0c5f19a77f32288c.jpg

https://creative.ai/system/media_attachments/files/111/032/640/798/083/675/original/134ab6681c6f42a3.jpg

▲ ▼

 📝 DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions 🔭

"Learns to classify the actual position for each non-overlapping patch among all possible positions solely based on their visual appearance, by minimizing the negative log-likelihood." [gal30b+] 🤖 #CV

⚙️ https://github.com/Haochen-Wang409/DropPos
🔗 https://arxiv.org/abs/2309.03576v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/522/565/913/890/original/11c6a1cb9ef661b0.jpg

https://creative.ai/system/media_attachments/files/111/032/522/624/635/968/original/04b1a8a529d17e50.jpg

https://creative.ai/system/media_attachments/files/111/032/522/679/132/611/original/d9c5124764e444f6.jpg
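
The position-classification objective quoted above can be illustrated in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: the function name `position_nll` and the toy logits are assumptions, and a real model would produce the logits from patch features.

```python
import math

def position_nll(logits, true_positions):
    """Mean negative log-likelihood of the true position of each patch.

    logits[i][k] is a (hypothetical) model score that patch i sits at position k.
    """
    total = 0.0
    for row, target in zip(logits, true_positions):
        # Numerically stable log-softmax over all candidate positions.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
    return total / len(logits)

# Toy example: 3 patches, 4 candidate positions.
logits = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
confident = position_nll(logits, [0, 1, 3])  # highest score is on the true position
wrong = position_nll(logits, [1, 2, 0])      # highest score is on a wrong position
uniform = position_nll([[0.0] * 4], [2])     # uniform scores give log(4)
```

The loss is small when the highest score lands on the true position and grows when it does not, which is the signal a position classifier of this kind would be trained on.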

▲ ▼

 📝 Region Generation and Assessment Network for Occluded Person Re-Identification 🔭

"RGANet utilizes pre-trained CLIP to locate the human body regions using semantic prototypes extracted from text descriptions, and then it measures the importance of each generated region." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03558v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/345/600/902/446/original/5efee51d5c7cd6a3.jpg

https://creative.ai/system/media_attachments/files/111/032/345/660/682/863/original/8b73376e95e2a871.jpg

https://creative.ai/system/media_attachments/files/111/032/345/728/283/127/original/07795020ee81ca2d.jpg

▲ ▼

 📝 Trash to Treasure: Low-Light Object Detection via Decomposition-and-Aggregation 🔭

"A newly designed enhancer is introduced as the scene decomposition module, whose removed illumination is exploited as the auxiliary to extract detection-friendly features, and then a semantic aggregation module is established to further integrate multi-scale scene-related semantic information." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03548v1 #arxiv

https://creative.ai/system/media_attachments/files/111/032/109/788/615/868/original/47a6560dcee0e820.jpg

https://creative.ai/system/media_attachments/files/111/032/109/843/280/228/original/4b29a3c09480fcf0.jpg

https://creative.ai/system/media_attachments/files/111/032/109/897/472/344/original/d58b2350c78b2774.jpg

https://creative.ai/system/media_attachments/files/111/032/109/957/644/900/original/c81823edb79986f3.jpg

▲ ▼

 📝 Dynamic Frame Interpolation in Wavelet Domain 🔭

"WaveletVFI uses a lightweight motion perception network to estimate an initial intermediate optical flow, and embeds a threshold classifier in it to learn a dynamic threshold for more computation reduction." [gal30b+] 🤖 #CV

⚙️ https://github.com/ltkong218/WaveletVFI
🔗 https://arxiv.org/abs/2309.03508v1 #arxiv

https://creative.ai/system/media_attachments/files/111/031/814/826/310/225/original/8ef41cd46be972d3.jpg

https://creative.ai/system/media_attachments/files/111/031/814/884/312/460/original/c0a28736e79b3d92.jpg

https://creative.ai/system/media_attachments/files/111/031/814/935/252/356/original/b97d59aad64257a7.jpg

https://creative.ai/system/media_attachments/files/111/031/814/982/873/570/original/5362afe9919bbbf3.jpg
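
To see why a threshold on wavelet coefficients can reduce computation, the sketch below applies one level of a 1-D Haar transform and marks which detail coefficients exceed a threshold. It is only an illustration under assumptions: WaveletVFI works on images and learns its threshold, whereas the helper names `haar_1d` and `active_mask` and the fixed threshold values here are hypothetical.

```python
def haar_1d(signal):
    """One level of the (unnormalized) Haar transform: pairwise averages and differences."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def active_mask(detail, threshold):
    """Mark the detail coefficients large enough to be worth further computation."""
    return [abs(d) > threshold for d in detail]

# A mostly flat signal with a single sharp change.
signal = [10, 10, 10, 10, 10, 30, 10, 10]
approx, detail = haar_1d(signal)
mask_low = active_mask(detail, threshold=0.5)    # keeps only the coefficient at the change
mask_high = active_mask(detail, threshold=20.0)  # keeps nothing
```

Only positions where the mask is true would need further processing, so a higher threshold trades accuracy for less work.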

▲ ▼

 📝 DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing Using Determiners 🔭

"DetermiNet provides 250,000 synthetically generated images and captions with ground truth bounding boxes for the objects of interest in images based on 25 determiners." [gal30b+] 🤖 #CV

⚙️ https://github.com/clarence-lee-sheng/
🔗 https://arxiv.org/abs/2309.03483v1 #arxiv

https://creative.ai/system/media_attachments/files/111/031/578/740/791/179/original/9884ff5fae9278e1.jpg

https://creative.ai/system/media_attachments/files/111/031/578/799/868/454/original/d990e935a6457fcf.jpg

https://creative.ai/system/media_attachments/files/111/031/578/852/717/302/original/c6a777edb7634dd8.jpg

https://creative.ai/system/media_attachments/files/111/031/578/913/580/707/original/04a6bcfafaf749fc.jpg

▲ ▼

 📝 Temporal Collection and Distribution for Referring Video Object Segmentation 🔭

"Given a video sequence, the proposed framework simultaneously maintains a global referent token and a sequence of object queries across the frames, where the former is responsible for capturing video-level referent according to the language expression, while the latter serves to better locate and segment objects with each frame." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03473v1 #arxiv

https://creative.ai/system/media_attachments/files/111/031/342/723/287/204/original/11b932ef425b11d5.jpg

https://creative.ai/system/media_attachments/files/111/031/342/782/780/419/original/a079f158e003156d.jpg

https://creative.ai/system/media_attachments/files/111/031/342/844/632/914/original/5012c407119ef14d.jpg

https://creative.ai/system/media_attachments/files/111/031/342/922/210/999/original/6022d922f2835c7a.jpg

▲ ▼

 📝 Perceptual Quality Assessment of 360° Images Based on Generative Scanpath Representation 🔭

"The proposed generative scanpath representation (GSR), which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition, provides a global overview of gazed-focused contents derived from scanpaths." [gal30b+] 🤖 #CV

⚙️ https://github.com/xiangjieSui/GSR
🔗 https://arxiv.org/abs/2309.03472v1 #arxiv

https://creative.ai/system/media_attachments/files/111/031/106/980/382/563/original/7c94c2e82751d8d6.jpg

https://creative.ai/system/media_attachments/files/111/031/107/058/787/713/original/c19770d0a3196cc8.jpg

https://creative.ai/system/media_attachments/files/111/031/107/118/775/541/original/f6d86c07da9c6bdc.jpg

https://creative.ai/system/media_attachments/files/111/031/107/204/547/175/original/fb92cd08a807cb71.jpg

▲ ▼

 📝 Multi-Modality Guidance Network for Missing Modality Inference 🔭🧠

"Proposes a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models for inference on scenarios with missing modalities." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.03452v1 #arxiv

https://creative.ai/system/media_attachments/files/111/030/871/032/095/015/original/5c727f2b4f7b0946.jpg

https://creative.ai/system/media_attachments/files/111/030/871/103/854/683/original/94171f704f54bbef.jpg

▲ ▼

 📝 Distribution-Aware Prompt Tuning for Vision-Language Models 🔭

"Distribution-aware prompt tuning maximizes inter-dispersion and minimizes intra-dispersion between embeddings of two modalities in the latent space, which leads to effective feature space alignment between them." [gal30b+] 🤖 #CV

⚙️ https://github.com/mlvlab/DAPT
🔗 https://arxiv.org/abs/2309.03406v1 #arxiv

https://creative.ai/system/media_attachments/files/111/030/517/184/916/921/original/8692584a704f15a9.jpg

https://creative.ai/system/media_attachments/files/111/030/517/250/776/129/original/dfec5786437db52c.jpg

https://creative.ai/system/media_attachments/files/111/030/517/339/234/885/original/915943e96c7c2af5.jpg

https://creative.ai/system/media_attachments/files/111/030/517/430/668/653/original/f0fb1367a07c4700.jpg
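
The two dispersion terms in the summary above can be made concrete with a small sketch. The definitions used here are assumptions for illustration, not taken from the DAPT code: inter-dispersion is the mean pairwise distance between per-class embeddings, and intra-dispersion is the mean distance from each sample embedding to its class centroid. All function names are hypothetical.

```python
import math
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inter_dispersion(class_embeddings):
    """Mean pairwise distance between per-class embeddings (larger = classes further apart)."""
    pairs = list(combinations(class_embeddings, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def intra_dispersion(samples, labels):
    """Mean distance from each sample to the centroid of its own class (smaller = tighter)."""
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(y, []).append(s)
    centroids = {y: [sum(c) / len(g) for c in zip(*g)] for y, g in groups.items()}
    return sum(dist(s, centroids[y]) for s, y in zip(samples, labels)) / len(samples)

def dispersion_objective(class_embeddings, samples, labels):
    # Minimizing this value pushes classes apart and pulls same-class samples together.
    return intra_dispersion(samples, labels) - inter_dispersion(class_embeddings)

class_embeddings = [[0.0, 0.0], [3.0, 4.0]]
samples = [[0.0, 1.0], [0.0, -1.0], [3.0, 4.0]]
labels = [0, 0, 1]
inter = inter_dispersion(class_embeddings)
intra = intra_dispersion(samples, labels)
objective = dispersion_objective(class_embeddings, samples, labels)
```

In a prompt-tuning setting, the learnable prompts would be updated to lower an objective of this shape while the encoders stay frozen.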

▲ ▼

 📝 Reasonable Anomaly Detection in Long Sequences 🔭

"A Stacked State Machine model is proposed to represent the temporal dependencies which are consistent across long-range observations and functions in predicting future states based on past ones; the divergence between the predictions with inherent normal patterns and observed ones determines anomalies." [gal30b+] 🤖 #CV

⚙️ https://github.com/AllenYLJiang/Anomaly-Detection-in-Sequences
🔗 https://arxiv.org/abs/2309.03401v1 #arxiv

https://creative.ai/system/media_attachments/files/111/030/399/189/667/281/original/7003e434d8afd75b.jpg
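
The scoring idea in the summary above, where anomalies are flagged by how far observations diverge from predictions, can be sketched as follows. The Stacked State Machine itself is not reproduced here: `predict_next` is a stand-in constant-velocity predictor, and the threshold value is a hypothetical choice.

```python
import math

def predict_next(history):
    """Stand-in predictor: constant-velocity extrapolation from the last two states."""
    prev, last = history[-2], history[-1]
    return [2 * b - a for a, b in zip(prev, last)]

def anomaly_scores(sequence):
    """Distance between each predicted state and the state actually observed."""
    scores = []
    for t in range(2, len(sequence)):
        predicted = predict_next(sequence[:t])
        observed = sequence[t]
        scores.append(math.dist(predicted, observed))
    return scores

# A 2-D track moving steadily to the right, with one abrupt jump at index 4.
track = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 5], [5, 5]]
scores = anomaly_scores(track)
threshold = 1.0  # hypothetical value; in practice it would be tuned on normal data
flags = [s > threshold for s in scores]
```

With this simple predictor the step after the jump is also flagged, because the extrapolation is briefly thrown off by the jump itself.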

▲ ▼

 📝 Active Shooter Detection and Robust Tracking Utilizing Supplemental Synthetic Data 🔭

"Uses domain randomization and transfer learning to allow for the effective training of YOLOv8 using synthetic data generated with Unreal Engine, which is then used to detect shooters in video streams." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03381v1 #arxiv

https://creative.ai/system/media_attachments/files/111/030/045/137/576/297/original/ff124ebed3c4c514.jpg

https://creative.ai/system/media_attachments/files/111/030/045/207/131/058/original/ee511ad705dd3f86.jpg

https://creative.ai/system/media_attachments/files/111/030/045/292/639/986/original/5ec91483ae8ffac4.jpg

https://creative.ai/system/media_attachments/files/111/030/045/366/533/145/original/46bd91b70b0a6b8b.jpg

▲ ▼

 📝 ViewMix: Augmentation for Robust Representation in Self-Supervised Learning 🔭🧠

"Cuts and pastes patches from one view to another and creates different views of the same image to form positive pairs; the network is then trained to maximize the agreement between positive pairs while minimizing the agreement between negative pairs." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.03360v1 #arxiv

https://creative.ai/system/media_attachments/files/111/029/868/345/026/922/original/8fb681072c38509f.jpg

https://creative.ai/system/media_attachments/files/111/029/868/394/954/621/original/02a525bfdd0dfb09.jpg

https://creative.ai/system/media_attachments/files/111/029/868/455/432/545/original/c7a51405a244586b.jpg
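
The cut-and-paste step described in the summary above can be sketched on toy 2-D grids. This is an illustration under assumptions rather than the paper's code: the function name `view_mix`, the single rectangular patch, and pasting at the same coordinates in both views are all choices made here for simplicity.

```python
import random

def view_mix(view_a, view_b, patch_h, patch_w, rng=random):
    """Return a copy of view_a with one rectangular patch copied from view_b.

    Views are 2-D lists of equal size; the patch location is sampled uniformly.
    """
    height, width = len(view_a), len(view_a[0])
    top = rng.randrange(height - patch_h + 1)
    left = rng.randrange(width - patch_w + 1)
    mixed = [row[:] for row in view_a]  # copy so the original view is untouched
    for r in range(top, top + patch_h):
        mixed[r][left:left + patch_w] = view_b[r][left:left + patch_w]
    return mixed

rng = random.Random(0)
view_a = [[0] * 6 for _ in range(6)]  # stand-in for one augmented view
view_b = [[1] * 6 for _ in range(6)]  # stand-in for another view of the same image
mixed = view_mix(view_a, view_b, patch_h=2, patch_w=3, rng=rng)
```

The mixed view and an untouched view of the same image would then form a positive pair for a contrastive objective.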

Notes by 9a622e93