Notes by 9a622e93

 📝 PrototypeFormer: Learning to Explore Prototype Relationships for Few-Shot Image Classification 🔭

"PrototypeFormer uses a transformer to build a prototype extraction module and then applies contrastive loss to optimize prototypes for better feature representation in few-shot image classification tasks." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03517v1 #arxiv

https://creative.ai/system/media_attachments/files/111/192/828/640/470/676/original/9da50a018d7558a8.jpg

https://creative.ai/system/media_attachments/files/111/192/828/691/691/088/original/20f2f0508d7e9178.jpg

https://creative.ai/system/media_attachments/files/111/192/828/766/900/454/original/da7c3cc171a413b6.jpg

https://creative.ai/system/media_attachments/files/111/192/828/816/860/611/original/88f67fd914278e3a.jpg 
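
As a point of reference for the PrototypeFormer note above, here is a minimal sketch of the common mean-prototype baseline from prototype-based few-shot classification: each class prototype is the mean of its support embeddings, and a query goes to the nearest prototype. This is not the paper's transformer-based prototype extraction module or its contrastive loss. The helper names (mean_prototype, classify) and the 2-D toy embeddings are assumptions made for illustration.

```python
import math

def mean_prototype(embeddings):
    """Average a list of equal-length feature vectors into one prototype."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def classify(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy 2-way, 2-shot episode with made-up 2-D embeddings (illustrative only).
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {label: mean_prototype(vecs) for label, vecs in support.items()}
prediction = classify([0.9, 0.2], prototypes)
```

A learned prototype module would replace mean_prototype while the nearest-prototype step could stay the same.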
 📝 Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery 🔭

"Uses the Self-Distillation with No Labels (DINO) method to pre-train on unlabeled SAR images and then fine-tunes the pre-trained model on labeled data to predict land cover maps." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03513v1 #arxiv

https://creative.ai/system/media_attachments/files/111/192/356/737/498/497/original/7b64211778b0216c.jpg

https://creative.ai/system/media_attachments/files/111/192/356/831/822/561/original/68fd262679ed9b3a.jpg

https://creative.ai/system/media_attachments/files/111/192/356/891/690/060/original/fb5be8e94a45a541.jpg

https://creative.ai/system/media_attachments/files/111/192/356/955/149/258/original/0041243aa15ac627.jpg 
 📝 Kandinsky: An Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion 🔭

"Kandinsky is an exploration of latent diffusion architecture, with a modified MoVQ implementation serving as the image autoencoder component and an image prior model trained to map text to image embeddings of a pre-trained CLIP model." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03502v1 #arxiv

https://creative.ai/system/media_attachments/files/111/191/825/938/789/398/original/1c3887def543d83a.jpg

https://creative.ai/system/media_attachments/files/111/191/825/992/525/152/original/7d279e0462f5a271.jpg

https://creative.ai/system/media_attachments/files/111/191/826/050/820/826/original/b6e4512875ea76db.jpg

https://creative.ai/system/media_attachments/files/111/191/826/103/506/646/original/7014e413febb3159.jpg 
 📝 Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization 🔭🧠

"MRAV-FF fuses audio-visual features across different temporal resolutions using a hierarchical gated cross-attention mechanism that weighs the importance of audio information at diverse temporal scales." [gal30b+] 🤖 #CV #LG #MM

🔗 https://arxiv.org/abs/2310.03456v1 #arxiv

https://creative.ai/system/media_attachments/files/111/191/177/184/045/779/original/605abc8ce63e34db.jpg

https://creative.ai/system/media_attachments/files/111/191/177/248/872/695/original/18dffdce61fc1611.jpg 
 📝 Denoising Diffusion Step-Aware Models 🔭

"DDSM employs a spectrum of neural networks whose sizes are adapted to the importance of each generative step, determined through evolutionary search (Fig 1), thus effectively circumventing redundant computational efforts, particularly in the less critical steps." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03337v1 #arxiv

https://creative.ai/system/media_attachments/files/111/190/705/125/838/498/original/94c31b372081b370.jpg

https://creative.ai/system/media_attachments/files/111/190/705/190/855/221/original/4a9a0dd9f441c7d5.jpg

https://creative.ai/system/media_attachments/files/111/190/705/247/663/314/original/9075bdaaf88afb60.jpg

https://creative.ai/system/media_attachments/files/111/190/705/307/077/330/original/72025607a3f315d4.jpg 
 📝 Investigating the Limitation of CLIP Models: The Worst-Performing Categories 🔭🧠

"Proposes the Class-wise Matching Margin (CMM) to measure the inference confusion and find the worst-performing categories of CLIP models without any manual prompt engineering, laborious optimization, or access to labeled validation data." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/openai/CLIP
🔗 https://arxiv.org/abs/2310.03324v1 #arxiv

https://creative.ai/system/media_attachments/files/111/190/292/457/863/039/original/5685dbbeaab8ee65.jpg

https://creative.ai/system/media_attachments/files/111/190/292/519/546/649/original/9e41141bf3007451.jpg

https://creative.ai/system/media_attachments/files/111/190/292/579/494/295/original/0f492461e37f9023.jpg

https://creative.ai/system/media_attachments/files/111/190/292/647/778/170/original/643da9649d77e274.jpg 
 📝 Can Pre-Trained Models Assist in Dataset Distillation? 🔭

"Pre-trained Models transfer knowledge to synthetic datasets to guide Dataset Distillation accurately by selecting optimal options, including initialization parameters, model architecture, training epoch and domain knowledge." [gal30b+] 🤖 #CV

⚙️ https://github.com/yaolu-zjut/DDInterpreter
🔗 https://arxiv.org/abs/2310.03295v1 #arxiv

https://creative.ai/system/media_attachments/files/111/189/643/534/842/220/original/76810c93edcc7e6f.jpg

https://creative.ai/system/media_attachments/files/111/189/643/590/016/456/original/8b76124ac901d4af.jpg

https://creative.ai/system/media_attachments/files/111/189/643/646/144/256/original/94e004ae4734a57e.jpg 
 📝 SimVLG: Simple and Efficient Pretraining of Visual Language Generative Models 🔭

"SimVLG is a streamlined framework for the pre-training of computationally intensive vision-language generative models, leveraging frozen pre-trained large language models (LLMs)." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03291v1 #arxiv

https://creative.ai/system/media_attachments/files/111/189/171/719/622/132/original/9baa6b1e7364a804.jpg

https://creative.ai/system/media_attachments/files/111/189/171/786/110/671/original/9351db94426fbce3.jpg

https://creative.ai/system/media_attachments/files/111/189/171/843/593/607/original/5650d619f8e63b15.jpg

https://creative.ai/system/media_attachments/files/111/189/171/916/171/613/original/707ae4d63a18fbc2.jpg 
 📝 Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning 🔭🧠

"Works by maximizing the attention mask of the image region best represented by a single latent vector corresponding to the attention mask, and by minimizing reconstruction loss between the input image and decoded component image." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2310.03273v1 #arxiv

https://creative.ai/system/media_attachments/files/111/188/699/861/727/514/original/843cfd6f45596084.jpg

https://creative.ai/system/media_attachments/files/111/188/699/914/071/258/original/97fe449a2f2c06b3.jpg

https://creative.ai/system/media_attachments/files/111/188/699/964/511/039/original/5f3583398545f665.jpg

https://creative.ai/system/media_attachments/files/111/188/700/017/124/899/original/a7aa3da631cdfd2c.jpg 
 📝 EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models 🔭

"Leverages a quantization-aware variant of the low-rank adapter to transfer the denoising capabilities of full-precision models to their quantized counterparts, eliminating the need for training data." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03270v1 #arxiv

https://creative.ai/system/media_attachments/files/111/188/346/006/898/454/original/3eca0a9d35fac333.jpg

https://creative.ai/system/media_attachments/files/111/188/346/066/093/680/original/054b3e5676f82ae8.jpg

https://creative.ai/system/media_attachments/files/111/188/346/123/754/211/original/463fe94342dc5f88.jpg

https://creative.ai/system/media_attachments/files/111/188/346/190/353/632/original/4e3da9afeb7f955e.jpg 
 📝 Reinforcement Learning-Based Mixture of Vision Transformers for Video Violence Recognition 🔭

"The proposed transformer-based Mixture of Experts (MoE) video violence recognition system consists of two main modules: (1) a backbone network and (2) an intelligent router." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.03108v1 #arxiv

https://creative.ai/system/media_attachments/files/111/187/992/126/163/637/original/916b2727e08f4532.jpg

https://creative.ai/system/media_attachments/files/111/187/992/196/317/774/original/ad0df19c5f24a731.jpg

https://creative.ai/system/media_attachments/files/111/187/992/251/302/816/original/d726792826f2e237.jpg

https://creative.ai/system/media_attachments/files/111/187/992/323/054/661/original/1987f19a11b28f6a.jpg 
 📝 OMG-ATTACK: Self-Supervised on-Manifold Generation of Transferable Evasion Attacks 🧠🔭

"A self-supervised, computationally economical method for generating adversarial examples that are tied more to the data than to the model under attack, making them more transferable." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2310.03707v1 #arxiv

https://creative.ai/system/media_attachments/files/111/187/461/274/214/678/original/469c60333fcd701d.jpg

https://creative.ai/system/media_attachments/files/111/187/461/325/394/900/original/95b84c3b45231949.jpg

https://creative.ai/system/media_attachments/files/111/187/461/378/225/273/original/afb0f768ceef5d6d.jpg

https://creative.ai/system/media_attachments/files/111/187/461/434/666/260/original/9ce78ff2ee9455bc.jpg 
 📝 Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising 🧠🔭

"Proposes a new method to efficiently learn a regularizer parametrized by a deep neural net (DNN) using stochastic gradient descent (SGD) on the compressed version of the training database." [gal30b+] 🤖 #LG #CV #IT

🔗 https://arxiv.org/abs/2310.03085v1 #arxiv

https://creative.ai/system/media_attachments/files/111/186/989/139/851/828/original/c6e7f9975c7f01e4.jpg

https://creative.ai/system/media_attachments/files/111/186/989/216/380/404/original/b8a3ae46827cc94e.jpg

https://creative.ai/system/media_attachments/files/111/186/989/294/629/473/original/5cbcbf5fdcddd4aa.jpg

https://creative.ai/system/media_attachments/files/111/186/989/343/770/673/original/5d63d7cfcb4c0e99.jpg 
 📝 Decoding Human Activities: Analyzing Wearable Accelerometer and Gyroscope Data for Activity Recognition 🔭🧠

"A stratified multi-structural approach based on a Residual network (ResNet) ensembled with Residual MobileNet, termed as FusionActNet, is proposed for classifying different activities based on the unique features of the human body's static and dynamic movements." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2310.02011v1 #arxiv

https://creative.ai/system/media_attachments/files/111/184/807/809/059/792/original/344f61c967c5ff90.jpg

https://creative.ai/system/media_attachments/files/111/184/807/859/395/183/original/f0682d9feccb1188.jpg

https://creative.ai/system/media_attachments/files/111/184/807/910/424/757/original/063e7209fa44e30e.jpg

https://creative.ai/system/media_attachments/files/111/184/807/963/780/403/original/fabbd060d43287f8.jpg 
 📝 Development of Machine Vision Approach for Mechanical Component Identification Based on Its Dimension and Pitch 🔭

"Uses a Raspberry Pi Camera, Raspberry Pi 4, and some open-source computer vision libraries to calculate the required features of the bolts used in the assembly line." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01995v1 #arxiv

https://creative.ai/system/media_attachments/files/111/184/395/047/370/875/original/aedc947bbdba4036.jpg

https://creative.ai/system/media_attachments/files/111/184/395/100/192/184/original/7d5614edd7befd8b.jpg

https://creative.ai/system/media_attachments/files/111/184/395/154/321/277/original/52ceab819d671050.jpg

https://creative.ai/system/media_attachments/files/111/184/395/209/261/141/original/a247d4728aa097fa.jpg 
 📝 Understanding Masked Autoencoders From a Local Contrastive Perspective 🔭

"Explores a new perspective to explain what truly contributes to the 'rich hidden representations inside the MAE' and reformulates the reconstruction-based MAE into a local-contrastive version." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01994v1 #arxiv

https://creative.ai/system/media_attachments/files/111/183/687/327/880/081/original/d83da1624e852b32.jpg

https://creative.ai/system/media_attachments/files/111/183/687/385/427/934/original/05f83c3be3d4b93b.jpg

https://creative.ai/system/media_attachments/files/111/183/687/436/681/313/original/7a9911234b981a23.jpg

https://creative.ai/system/media_attachments/files/111/183/687/490/953/173/original/d794a5262111dd00.jpg 
 📝 A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection 🔭

"Designs a multi-level feature extractor that uses a pre-trained model to extract multi-level features from bi-temporal images and introduces aggregate connections to fuse them." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01876v1 #arxiv

https://creative.ai/system/media_attachments/files/111/181/622/982/401/125/original/905c460d4089b76f.jpg

https://creative.ai/system/media_attachments/files/111/181/623/060/041/577/original/e2891f3d81cc6f51.jpg

https://creative.ai/system/media_attachments/files/111/181/623/109/918/573/original/a1b6883e6ae51fcc.jpg

https://creative.ai/system/media_attachments/files/111/181/623/166/271/893/original/e5138613fd3f33b1.jpg 
 📝 Selective Feature Adapter for Dense Vision Transformers 🔭

"The selective feature adapter (SFA) consists of external and internal adapters that are sequentially operated over a transformer model to achieve SoTA performance under any given budget of trainable parameters." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01843v1 #arxiv

https://creative.ai/system/media_attachments/files/111/180/915/121/545/638/original/133538217b46bd77.jpg

https://creative.ai/system/media_attachments/files/111/180/915/177/115/975/original/37b96deddde5b252.jpg

https://creative.ai/system/media_attachments/files/111/180/915/241/602/752/original/1d0f13f960507c11.jpg

https://creative.ai/system/media_attachments/files/111/180/915/298/537/583/original/7315e68b612b7182.jpg 
 📝 SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-Based Question Answering 🔭

"Extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques to learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01842v1 #arxiv

https://creative.ai/system/media_attachments/files/111/180/620/217/984/672/original/7f0291575de91f63.jpg

https://creative.ai/system/media_attachments/files/111/180/620/267/978/206/original/115b99c9a9b59793.jpg

https://creative.ai/system/media_attachments/files/111/180/620/321/579/143/original/8c0cf68cc1e716da.jpg

https://creative.ai/system/media_attachments/files/111/180/620/373/730/057/original/927c7b1e58ea8b11.jpg 
 📝 Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes 🔭

"Proposes to train a reconstruction network under the supervision of two complementary components, which are estimated using multi-exposure images and focus on HDR color as well as structure, respectively." [gal30b+] 🤖 #CV

⚙️ https://github.com/cszhilu1998/SelfHDR
🔗 https://arxiv.org/abs/2310.01840v1 #arxiv

https://creative.ai/system/media_attachments/files/111/180/207/166/174/282/original/b9caa97ad6205be0.jpg

https://creative.ai/system/media_attachments/files/111/180/207/237/100/816/original/7ce231b6e7cc6de0.jpg

https://creative.ai/system/media_attachments/files/111/180/207/302/806/318/original/ba950c8a45b97132.jpg

https://creative.ai/system/media_attachments/files/111/180/207/350/641/577/original/b92d58aad61358d2.jpg 
 📝 Skin the Sheep Not Only Once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow 🔭

"Proposes to leverage the geometric connection between optical flow estimation and stereo matching (both find pixel correspondences across images) to unify various real-world depth estimation datasets for generating supervised training data for optical flow." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01833v1 #arxiv

https://creative.ai/system/media_attachments/files/111/179/853/293/180/756/original/cf85f1a8b3f62b65.jpg

https://creative.ai/system/media_attachments/files/111/179/853/362/561/242/original/f1fa4644dee3b3d4.jpg

https://creative.ai/system/media_attachments/files/111/179/853/424/422/597/original/ad35be98161dbeda.jpg

https://creative.ai/system/media_attachments/files/111/179/853/475/854/935/original/0de7c55a93b5a060.jpg 
 📝 AI-Generated Images as Data Source: The Dawn of Synthetic Era 🔭

"Explores the innovative concept of leveraging these AI generated images as a new data source, reshaping traditional model paradigms in visual intelligence, from training machine learning models to simulating scenarios for modeling, testing, and validation." [gal30b+] 🤖 #CV

⚙️ https://github.com/mwxely/AIGS
🔗 https://arxiv.org/abs/2310.01830v1 #arxiv

https://creative.ai/system/media_attachments/files/111/179/676/521/141/076/original/167f46aa54004eab.jpg

https://creative.ai/system/media_attachments/files/111/179/676/611/809/300/original/822ac1baaf7823a0.jpg

https://creative.ai/system/media_attachments/files/111/179/676/694/029/675/original/7a9fce2e813a1400.jpg

https://creative.ai/system/media_attachments/files/111/179/676/758/778/824/original/6b2753a7c4fd2bc5.jpg 
 📝 Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation 🔭

"Proposes a swapping mechanism and an acceptable region for sampling high-quality object images from a new image pool generated by randomly exchanging column vectors of two text embeddings." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01819v1 #arxiv

https://creative.ai/system/media_attachments/files/111/179/086/691/077/277/original/bfeca70cb5210a35.jpg

https://creative.ai/system/media_attachments/files/111/179/086/789/309/475/original/e64962e37c41e9b0.jpg

https://creative.ai/system/media_attachments/files/111/179/086/878/891/776/original/0b24cb9f06ef9fa5.jpg

https://creative.ai/system/media_attachments/files/111/179/086/946/958/459/original/6a59b1c5abd6a7af.jpg 
 📝 PPT: Token Pruning and Pooling for Efficient Vision Transformers 🔭

"By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT reduces the model complexity while maintaining its predictive accuracy on vision tasks." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.01812v1 #arxiv

https://creative.ai/system/media_attachments/files/111/178/673/754/572/961/original/bcab5ebce41584ae.jpg

https://creative.ai/system/media_attachments/files/111/178/673/803/345/270/original/58d91cb6ec19e128.jpg

https://creative.ai/system/media_attachments/files/111/178/673/875/923/314/original/2ee27515f00eb146.jpg

https://creative.ai/system/media_attachments/files/111/178/673/947/666/063/original/ab1d69598b881459.jpg 
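
To make the PPT note above concrete, the two operations it names can be sketched on plain lists: pruning drops low-importance tokens, and pooling merges similar tokens into one. This is only an illustration under stated assumptions, not the paper's procedure. The importance scores, the cosine-similarity criterion, the mean merge, and the function names are all hypothetical, and tokens are assumed to be non-zero vectors.

```python
import math

def prune_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(ranked)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def pool_most_similar_pair(tokens):
    """Merge the two most cosine-similar tokens into their element-wise mean."""
    if len(tokens) < 2:
        return list(tokens)
    pairs = [(i, j) for i in range(len(tokens)) for j in range(i + 1, len(tokens))]
    i, j = max(pairs, key=lambda p: cosine(tokens[p[0]], tokens[p[1]]))
    merged = [(x + y) / 2 for x, y in zip(tokens[i], tokens[j])]
    # Remaining tokens keep their order; the merged token is appended last.
    return [t for k, t in enumerate(tokens) if k not in (i, j)] + [merged]

# Toy input: four 2-D tokens with made-up importance scores.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.1, 0.1]]
scores = [0.9, 0.8, 0.7, 0.1]
kept = prune_tokens(tokens, scores, keep=3)
pooled = pool_most_similar_pair(kept)
```

Neither helper has trainable parameters, which matches the spirit of the summary; how PPT actually scores tokens and chooses between the two operations is not specified here.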
 📝 HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption 🔭

"CCEval is a GPT-4 assisted method that can assess the detailed captioning capability of large vision-language models (LVLMs)." [gal30b+] 🤖 #CV

⚙️ https://github.com/haotian-liu/LLaVA
🔗 https://arxiv.org/abs/2310.01779v1 #arxiv

https://creative.ai/system/media_attachments/files/111/178/437/816/986/283/original/21dc7386dc43cc55.jpg

https://creative.ai/system/media_attachments/files/111/178/437/871/851/904/original/b3d8ec44a43d93f1.jpg 
 📝 Direct Inversion: Boosting Diffusion-Based Editing with 3 Lines of Code 🔭

"By disentangling the source and target diffusion branches, direct inversion achieves optimal performance of both branches with just three lines of code, achieving state-of-the-art inversion quality and edit fidelity in real-time." [gal30b+] 🤖 #CV

⚙️ https://github.com/cure-lab/DirectInversion
🔗 https://arxiv.org/abs/2310.01506v1 #arxiv

https://creative.ai/system/media_attachments/files/111/177/553/147/016/422/original/04ef3d2714a0a5c0.jpg

https://creative.ai/system/media_attachments/files/111/177/553/243/127/251/original/d781ee52c2a6334b.jpg

https://creative.ai/system/media_attachments/files/111/177/553/344/266/478/original/56aab49dd777ff44.jpg

https://creative.ai/system/media_attachments/files/111/177/553/445/353/200/original/fc190051156b19f1.jpg 
 📝 Generative Autoencoding of Dropout Patterns 🧠🔭

"A unique dropout pattern is assigned to each data point in the training dataset, then an autoencoder is trained to reconstruct the corresponding data point using this pattern as information to be encoded." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/mseitzer/pytorch-fid
🔗 https://arxiv.org/abs/2310.01712v1 #arxiv

https://creative.ai/system/media_attachments/files/111/176/373/173/131/341/original/20368636c07eafe2.jpg 
 📝 Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis 🔭

"PnP-ADMM works by integrating a data-fidelity term and an image prior in an iterative algorithm for solving imaging inverse problems, such as denoising, deconvolution and image super-resolution." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.00133v1 #arxiv

https://creative.ai/system/media_attachments/files/111/175/726/025/200/525/original/81fe7f64213b28cd.jpg

https://creative.ai/system/media_attachments/files/111/175/726/078/333/290/original/e2cfd286f6b73c3b.jpg

https://creative.ai/system/media_attachments/files/111/175/726/150/491/355/original/78fd439828b3d3a4.jpg

https://creative.ai/system/media_attachments/files/111/175/726/222/370/303/original/7f6a14283ab420c2.jpg 
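
The iteration described in the PnP-ADMM note above can be sketched for the simplest case, 1-D denoising with the data-fidelity term f(x) = 0.5 * ||x - y||^2. Each round alternates a proximal step on f, a denoiser call standing in for the image prior, and a dual update. The 3-tap moving-average denoiser, the parameter defaults, and the function names are assumptions for illustration; the paper's priors and convergence analysis are not reproduced.

```python
def box_denoiser(v):
    """3-tap moving average with edge replication; stands in for a learned prior."""
    n = len(v)
    return [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

def pnp_admm_denoise(y, rho=1.0, iters=20, denoiser=box_denoiser):
    """PnP-ADMM for the data term f(x) = 0.5 * ||x - y||^2."""
    x = list(y)
    z = list(y)
    u = [0.0] * len(y)
    for _ in range(iters):
        # x-update: proximal step on the data-fidelity term (closed form here).
        x = [(yi + rho * (zi - ui)) / (1.0 + rho) for yi, zi, ui in zip(y, z, u)]
        # z-update: the denoiser replaces the proximal operator of the prior.
        z = denoiser([xi + ui for xi, ui in zip(x, u)])
        # u-update: scaled dual variable accumulates the residual x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z

def total_variation(v):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(a - b) for a, b in zip(v, v[1:]))

noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
restored = pnp_admm_denoise(noisy)
```

For deconvolution or super-resolution only the x-update changes, since the proximal step must then account for the forward operator.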
 📝 Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation 🔭

"Uses unlabeled images with multi-view augmentations to generate reliable target pseudo-heatmaps using a denoising scheme and a threshold-and-refine process, and selects reliable targets from this pool using cross-student uncertainty." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.00099v1 #arxiv

https://creative.ai/system/media_attachments/files/111/175/312/955/002/964/original/06ce1b9887585040.jpg

https://creative.ai/system/media_attachments/files/111/175/313/011/815/725/original/d6959500807fc2fb.jpg

https://creative.ai/system/media_attachments/files/111/175/313/070/198/209/original/a9568609db1b3a4d.jpg

https://creative.ai/system/media_attachments/files/111/175/313/122/696/868/original/d598dc7806aa754c.jpg 
 📝 Towards Few-Call Model Stealing via Active Self-Paced Knowledge Distillation and Diffusion-Based Image Generation 🔭🧠

"Proposes a framework that creates a synthetic data set (called the proxy data set) by leveraging the ability of diffusion models to generate realistic and diverse images." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2310.00096v1 #arxiv

https://creative.ai/system/media_attachments/files/111/175/018/033/153/247/original/c778aee133e51129.jpg

https://creative.ai/system/media_attachments/files/111/175/018/128/612/344/original/3131f39a93f3e595.jpg

https://creative.ai/system/media_attachments/files/111/175/018/186/390/083/original/f1f4eb231a037dd5.jpg

https://creative.ai/system/media_attachments/files/111/175/018/239/466/513/original/43260547b93ad73e.jpg 
 📝 DataDAM: Efficient Dataset Distillation with Attention Matching 🔭🧠

"Trains a model that can generate images that match spatial attention maps of images from other datasets across various model architectures and layers in the network, achieving state-of-the-art performance." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2310.00093v1 #arxiv

https://creative.ai/system/media_attachments/files/111/174/605/187/487/498/original/33358adb3b015b8d.jpg

https://creative.ai/system/media_attachments/files/111/174/605/328/678/566/original/b0e5fcba4da00697.jpg

https://creative.ai/system/media_attachments/files/111/174/605/440/487/506/original/303f3255cbe6d736.jpg

https://creative.ai/system/media_attachments/files/111/174/605/540/644/757/original/a581616616604b8f.jpg 
 📝 Prompt-Enhanced Self-Supervised Representation Learning for Remote Sensing Image Understanding 🔭

"A reconstructive prompt that uses original image patches as a template and a prompt-enhanced generative branch that provides contextual information through semantic consistency constraints are used for self-supervised representation learning on remote sensing images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.00022v1 #arxiv

https://creative.ai/system/media_attachments/files/111/173/897/251/548/948/original/c03c33a06d0060ba.jpg

https://creative.ai/system/media_attachments/files/111/173/897/313/358/109/original/7addc8f8f5d72f3f.jpg

https://creative.ai/system/media_attachments/files/111/173/897/386/758/601/original/dc265a177978a8d8.jpg

https://creative.ai/system/media_attachments/files/111/173/897/432/715/538/original/939fc5c57e6f622a.jpg 
 📝 Joint Self-Supervised Depth and Optical Flow Estimation Towards Dynamic Objects 🔭

"A joint depth and optical flow estimation framework, which predicts depths in various motions by minimizing pixel warp errors in bilateral photometric re-projections and optical flow, is proposed." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.00011v1 #arxiv

https://creative.ai/system/media_attachments/files/111/173/543/459/139/348/original/350013500ce7a95c.jpg

https://creative.ai/system/media_attachments/files/111/173/543/516/516/037/original/2f94d7ed5c9807ad.jpg

https://creative.ai/system/media_attachments/files/111/173/543/579/086/831/original/422ff90114d8771e.jpg

https://creative.ai/system/media_attachments/files/111/173/543/636/241/676/original/03c7d685c5c712ca.jpg 
 📝 Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition 🔭

"Decomposes the multi-source audio semantics into single-source semantics, allowing for more effective interaction with visual content, and proposes a semantic decomposition method based on product quantization, where the multi-source semantics are decomposed and represented by several quantized single-source semantics." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2310.00132v1 #arxiv

https://creative.ai/system/media_attachments/files/111/173/243/452/040/051/original/3fbffe68d7029205.jpg

https://creative.ai/system/media_attachments/files/111/173/243/521/841/528/original/8253de4c189c2f14.jpg

https://creative.ai/system/media_attachments/files/111/173/243/574/515/152/original/02ae92db34b93233.jpg

https://creative.ai/system/media_attachments/files/111/173/243/646/882/862/original/8052ee811b5d50ab.jpg 
 📝 Towards Free Data Selection with General-Purpose Models 🔭🧠

"Defines semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image, and enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/yichen928/FreeSel
🔗 https://arxiv.org/abs/2309.17342v1 #arxiv

https://creative.ai/system/media_attachments/files/111/172/758/940/976/946/original/c0216fdfd0182a29.jpg

https://creative.ai/system/media_attachments/files/111/172/758/997/650/123/original/0697fd2be43f4686.jpg

https://creative.ai/system/media_attachments/files/111/172/759/051/113/449/original/9cb432042688e25a.jpg

https://creative.ai/system/media_attachments/files/111/172/759/103/174/600/original/79be92be988eedf7.jpg 
 📝 When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo 🔭

"An Epipolar Transformer performs non-local feature augmentation within a pair of lines: each point only attends the corresponding pair of epipolar lines, reducing the 2D search space into the epipolar line." [gal30b+] 🤖 #CV

⚙️ https://github.com/TQTQliu/ET-MVSNet
🔗 https://arxiv.org/abs/2309.17218v1 #arxiv

https://creative.ai/system/media_attachments/files/111/171/520/252/671/750/original/18c48d3792448204.jpg

https://creative.ai/system/media_attachments/files/111/171/520/315/336/387/original/d1190c4a5d77bc63.jpg

https://creative.ai/system/media_attachments/files/111/171/520/368/967/379/original/e4a47df90f4a9d97.jpg

https://creative.ai/system/media_attachments/files/111/171/520/421/130/775/original/146934126d88d96b.jpg 
 📝 Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images with Improved Loss Function Combination 🔭

"A novel loss function and carefully selected existing loss functions are tailored to address the challenges specific to histology images, such as tissue structure and cell morphology, to enhance adaptation performance in the histology domain." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17172v1 #arxiv

https://creative.ai/system/media_attachments/files/111/170/635/542/425/599/original/2a14cd666c205aa9.jpg

https://creative.ai/system/media_attachments/files/111/170/635/598/898/516/original/2dbe8193c55e9ef0.jpg

https://creative.ai/system/media_attachments/files/111/170/635/656/342/256/original/0901f092e38d6781.jpg

https://creative.ai/system/media_attachments/files/111/170/635/714/610/208/original/e40ee45828f2a4a1.jpg 
 📝 Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling 🔭

"An Action General-Specific Graph is developed to learn and decouple the action-general and action-specific knowledge so that the task-consistent score-discriminative features can be better extracted across various tasks." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17105v1 #arxiv

https://creative.ai/system/media_attachments/files/111/170/340/670/493/076/original/6281d8f9264c9f44.jpg

https://creative.ai/system/media_attachments/files/111/170/340/739/030/777/original/b9c78e5a840dcd0d.jpg

https://creative.ai/system/media_attachments/files/111/170/340/796/631/124/original/eff65da4b86ac450.jpg

https://creative.ai/system/media_attachments/files/111/170/340/850/607/585/original/856b964a345c8051.jpg 
 📝 Guiding Instruction-Based Image Editing via Multimodal Large Language Models 🔭

"Derives expressive instructions and provides explicit guidance to guide editing models to capture the visual imagination of natural instructions and manipulate images accordingly through end-to-end training." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17102v1 #arxiv

https://creative.ai/system/media_attachments/files/111/170/104/606/630/314/original/63ee3dff42c1e5b1.jpg

https://creative.ai/system/media_attachments/files/111/170/104/695/562/706/original/f7ebb56cd937ca98.jpg

https://creative.ai/system/media_attachments/files/111/170/104/748/431/672/original/c761273927676c1e.jpg

https://creative.ai/system/media_attachments/files/111/170/104/794/734/595/original/51058f54c19e2d72.jpg 
 📝 Prototype-Based Aleatoric Uncertainty Quantification for Cross-Modal Retrieval 🔭

"Proposes a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity." [gal30b+] 🤖 #CV

⚙️ https://github.com/leolee99/PAU
🔗 https://arxiv.org/abs/2309.17093v1 #arxiv

https://creative.ai/system/media_attachments/files/111/169/750/782/934/366/original/7610d236613e48b2.jpg

https://creative.ai/system/media_attachments/files/111/169/750/844/598/621/original/7502da3207060b58.jpg

https://creative.ai/system/media_attachments/files/111/169/750/898/280/807/original/b4868bee2721db40.jpg

https://creative.ai/system/media_attachments/files/111/169/750/954/374/091/original/00827dbf5f7c6c2a.jpg 
 📝 SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning 🔭

"SegRCDB is based on insights about what is important in pre-training for semantic segmentation and allows efficient pre-training, achieving higher mIoU than pre-training with COCO-Stuff when fine-tuned on ADE-20k and Cityscapes with the same number of training images." [gal30b+] 🤖 #CV

⚙️ https://github.com/dahlian00/SegRCDB
🔗 https://arxiv.org/abs/2309.17083v1 #arxiv

https://creative.ai/system/media_attachments/files/111/169/514/874/264/581/original/f30004ea7d1cd269.jpg

https://creative.ai/system/media_attachments/files/111/169/514/933/714/417/original/6225ac3ff0a0453a.jpg

https://creative.ai/system/media_attachments/files/111/169/515/006/864/470/original/7a2a40311e43dab1.jpg

https://creative.ai/system/media_attachments/files/111/169/515/062/131/451/original/aab5f2a098c6e83f.jpg 
 📝 DeeDiff: Dynamic Uncertainty-Aware Early Exiting for Accelerating Diffusion Model Generation 🔭

"Proposes DeeDiff, an early exiting framework for diffusion models by introducing an uncertainty estimation module (UEM), which is attached to each intermediate layer to estimate the prediction uncertainty of each layer." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17074v1 #arxiv

https://creative.ai/system/media_attachments/files/111/169/278/848/744/715/original/f71cb6fd9ea2de97.jpg

https://creative.ai/system/media_attachments/files/111/169/278/903/277/156/original/fe24a343f8f4f2a2.jpg

https://creative.ai/system/media_attachments/files/111/169/278/960/200/737/original/9ebcb984e1c97383.jpg

https://creative.ai/system/media_attachments/files/111/169/279/024/441/081/original/4f84c6114490d63f.jpg 
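The early-exit idea in the DeeDiff note above can be sketched generically. This is a toy illustration only, not the paper's architecture: `early_exit_forward`, the per-layer `uncertainty_heads`, and the fixed `threshold` rule are all assumptions made for the example.

```python
def early_exit_forward(x, layers, uncertainty_heads, threshold):
    """Run layers in order and stop once the estimated uncertainty is low.

    layers            -- list of callables, each mapping a hidden state to a new one
    uncertainty_heads -- list of callables, one per layer, returning a scalar
                         uncertainty estimate for that layer's output (hypothetical)
    threshold         -- exit as soon as the estimate drops below this value
    Returns (output, number_of_layers_actually_executed).
    """
    h = x
    for depth, (layer, head) in enumerate(zip(layers, uncertainty_heads), start=1):
        h = layer(h)
        if head(h) < threshold:
            return h, depth  # confident enough: skip the remaining layers
    return h, len(layers)  # never confident: full depth was used


# Toy stand-ins: each "layer" halves its input, each "head" reports |h|.
toy_layers = [lambda h: h / 2] * 4
toy_heads = [abs] * 4

output, used = early_exit_forward(1.0, toy_layers, toy_heads, threshold=0.3)
print(output, used)
```

With these toy stand-ins the loop stops after the second layer; setting `threshold` to 0 disables early exit and runs all four.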
 📝 Imagery Dataset for Condition Monitoring of Synthetic Fibre Ropes 🔭

"A comprehensive dataset has been generated, comprising a total of 6,942 raw images representing both normal and defective synthetic fibre ropes (SFRs)." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17058v1 #arxiv

https://creative.ai/system/media_attachments/files/111/169/160/964/617/938/original/9730eb93ef2ba35e.jpg

https://creative.ai/system/media_attachments/files/111/169/161/016/271/314/original/688a74fc4241d8d3.jpg

https://creative.ai/system/media_attachments/files/111/169/161/093/428/754/original/ea604495384d072a.jpg 
 📝 A 5-Point Minimal Solver for Event Camera Relative Motion Estimation 🔭

"Proposes eventails, which represent spatio-temporal structures generated from line features in a sequence of events, in order to derive a new minimal solver for event-based linear velocity estimation from a known rotation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17054v1 #arxiv

https://creative.ai/system/media_attachments/files/111/168/807/024/241/913/original/b0fd66056c49ec1a.jpg

https://creative.ai/system/media_attachments/files/111/168/807/084/837/762/original/3b84a3139da052d2.jpg

https://creative.ai/system/media_attachments/files/111/168/807/131/819/900/original/2ec0af6f4fa77fdf.jpg

https://creative.ai/system/media_attachments/files/111/168/807/183/781/363/original/e7f522696941d15c.jpg 
 📝 On Uniform Scalar Quantization for Learned Image Compression 🔭

"Proposes a method based on stochastic uniform annealing for learned image compression, which has an adjustable temperature coefficient to control a tradeoff between the train-test mismatch and gradient estimation risk." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17051v1 #arxiv

https://creative.ai/system/media_attachments/files/111/168/571/140/353/271/original/df391fe473436458.jpg

https://creative.ai/system/media_attachments/files/111/168/571/217/234/274/original/987246cf2f27253f.jpg

https://creative.ai/system/media_attachments/files/111/168/571/292/327/805/original/49349f89009165b8.jpg

https://creative.ai/system/media_attachments/files/111/168/571/370/041/885/original/5c8d3551596d8156.jpg 
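The train-test mismatch mentioned in the quantization note above comes from a common setup in learned compression: hard rounding at test time, with a differentiable surrogate during training. The sketch below shows only that generic setup, with additive uniform noise as the surrogate. It does not reproduce the paper's stochastic uniform annealing or its temperature coefficient, and the function names are hypothetical.

```python
import math
import random


def hard_quantize(y):
    """Test-time uniform scalar quantization: round to the nearest integer."""
    return math.floor(y + 0.5)


def noisy_quantize(y, rng):
    """Training-time surrogate: add noise drawn uniformly from [-0.5, 0.5].

    The result is not an integer, which is the source of the train-test
    mismatch, but its expectation equals y, so gradients can pass through.
    """
    return y + rng.uniform(-0.5, 0.5)


rng = random.Random(0)  # fixed seed so the run is repeatable
y = 1.3
samples = [noisy_quantize(y, rng) for _ in range(10_000)]
mean_noisy = sum(samples) / len(samples)

print("hard:", hard_quantize(y))        # integer symbol used at test time
print("noisy mean:", round(mean_noisy, 2))  # close to y on average
```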
 📝 HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World 🔭

"HoloAssist is a large-scale egocentric human interaction dataset, where two people complete collaborative physical manipulation tasks by wearing a mixed-reality headset that captures seven synchronized data streams." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.17024v1 #arxiv

https://creative.ai/system/media_attachments/files/111/168/335/214/914/185/original/8f1d8dffa592d011.jpg

https://creative.ai/system/media_attachments/files/111/168/335/275/795/828/original/2b41b20888d18aa2.jpg

https://creative.ai/system/media_attachments/files/111/168/335/340/719/708/original/139c72125738322f.jpg

https://creative.ai/system/media_attachments/files/111/168/335/405/444/107/original/7938a20a9f44020d.jpg 
 📝 Segment Anything Model Is a Good Teacher for Local Feature Learning 🔭🧠

"SAMFeat consists of three modules: Pixel Semantic Relational Distillation (PSRD), Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), and Edge Attention Guidance (EAG)." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/vignywang/SAMFeat
🔗 https://arxiv.org/abs/2309.16992v1 #arxiv

https://creative.ai/system/media_attachments/files/111/167/981/286/351/887/original/e03d549f1e63f8e2.jpg

https://creative.ai/system/media_attachments/files/111/167/981/347/788/754/original/a18174142f900ac6.jpg

https://creative.ai/system/media_attachments/files/111/167/981/408/938/012/original/477c5e428bbee31d.jpg

https://creative.ai/system/media_attachments/files/111/167/981/459/473/839/original/4a5ff57b8be51b2d.jpg 
 📝 SpikeMOT: Event-Based Multi-Object Tracking with Sparse Motion Features 🔭

"SpikeMOT leverages spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects and to track object movement at high frequency, while a simultaneous object detector provides updated spatial information about these objects at an equivalent frame rate." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16987v1 #arxiv

https://creative.ai/system/media_attachments/files/111/167/568/438/459/184/original/c52a164aa2734ff4.jpg

https://creative.ai/system/media_attachments/files/111/167/568/507/905/005/original/3c95557172ddba47.jpg

https://creative.ai/system/media_attachments/files/111/167/568/567/703/387/original/b910010b041b6b65.jpg

https://creative.ai/system/media_attachments/files/111/167/568/630/472/249/original/60ed644ff2f75daf.jpg 
 📝 CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving 🔭

"A novel unified multi-scale blur-event fusion neural network (CZ-Net) is proposed to jointly recover sharp latent sequences in the exposure period of a blurry input and the corresponding High-Resolution (HR) events." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16949v1 #arxiv

https://creative.ai/system/media_attachments/files/111/166/742/805/670/154/original/dc2c371caa3c2169.jpg

https://creative.ai/system/media_attachments/files/111/166/742/888/309/276/original/c45ed630c5d21dd0.jpg

https://creative.ai/system/media_attachments/files/111/166/742/956/595/481/original/4a402a20b2b26c74.jpg

https://creative.ai/system/media_attachments/files/111/166/743/027/155/353/original/c8ab94020c3a957c.jpg 
 📝 Incremental Rotation Averaging Revisited and More: A New Rotation Averaging Benchmark 🔭

"Introduces a novel Incremental Rotation Averaging method (denoted as IRAv4), in which a task-specific connected dominating set is extracted to serve as a more reliable and accurate reference for rotation global alignment." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16924v1 #arxiv

https://creative.ai/system/media_attachments/files/111/166/329/904/793/952/original/7f7ea01b6ac014d1.jpg

https://creative.ai/system/media_attachments/files/111/166/329/955/174/369/original/9cac35367bebf5b9.jpg

https://creative.ai/system/media_attachments/files/111/166/330/040/802/289/original/809606d714418bcd.jpg 
 📝 YOLOR-Based Multi-Task Learning 🔭

"Learns a single model that can perform object detection, instance segmentation, semantic segmentation, and image captioning tasks with competitive performance on all tasks while maintaining a low parameter count and without any pre-training." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16921v1 #arxiv

https://creative.ai/system/media_attachments/files/111/166/093/809/899/869/original/5a22170b0cbf2b0d.jpg

https://creative.ai/system/media_attachments/files/111/166/093/872/257/159/original/522b32619f79bd28.jpg

https://creative.ai/system/media_attachments/files/111/166/093/938/301/053/original/b134b3f1f9e88f52.jpg

https://creative.ai/system/media_attachments/files/111/166/093/996/125/014/original/2d22e273655ee31b.jpg 
 📝 Space-Time Attention with Shifted Non-Local Search 🔭🧠

"The method, named Shifted Non-Local Search, executes a small grid search surrounding the predicted offsets to correct small spatial errors and achieves state-of-the-art results on video denoising." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16849v1 #arxiv

https://creative.ai/system/media_attachments/files/111/165/032/191/545/077/original/830843159d9a2052.jpg

https://creative.ai/system/media_attachments/files/111/165/032/244/403/400/original/0109f3c6451f9e23.jpg

https://creative.ai/system/media_attachments/files/111/165/032/298/280/140/original/5ddc9c142b715e03.jpg

https://creative.ai/system/media_attachments/files/111/165/032/359/074/490/original/4e2e74918f66cc4a.jpg 
 📝 Automatic Cadastral Boundary Detection of Very High Resolution Images Using Mask R-CNN 🔭🧠

"Solves this problem with instance segmentation, using Mask R-CNN with a ResNet-50 backbone pre-trained on the ImageNet dataset, followed by geometric post-processing of its output to improve the detection and segmentation of buildings." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16708v1 #arxiv

https://creative.ai/system/media_attachments/files/111/164/442/485/280/383/original/a35b17ae13d88aad.jpg

https://creative.ai/system/media_attachments/files/111/164/442/572/290/606/original/e91d1fac808fed58.jpg

https://creative.ai/system/media_attachments/files/111/164/442/651/268/417/original/202aff0cfc2b0b4f.jpg

https://creative.ai/system/media_attachments/files/111/164/442/739/692/061/original/566e9bcc1666ea06.jpg 
 📝 Learning to Transform for Generalizable Instance-Wise Invariance 🔭🧠

"The normalizing flow predicts a distribution over transformations that maximizes the likelihood of the transformed image under the distribution of training instances, conditioned on the class label of the original image." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/sutkarsh/flow_inv/
🔗 https://arxiv.org/abs/2309.16672v1 #arxiv

https://creative.ai/system/media_attachments/files/111/161/492/670/382/254/original/8a479f558780392c.jpg

https://creative.ai/system/media_attachments/files/111/161/492/836/636/371/original/e1defcae4b76e24c.jpg

https://creative.ai/system/media_attachments/files/111/161/492/944/138/219/original/f7857c4497e07298.jpg

https://creative.ai/system/media_attachments/files/111/161/493/153/807/211/original/e0b4a285528cf05c.jpg 
 📝 Training a Large Video Model on a Single Machine in a Day 🔭

"Uses a combination of state-of-the-art techniques including data pre-fetching, data pipelining, mixed precision, gradient checkpointing, and tensor-core offloading to optimize each component of the pipeline (IO, CPU, and GPU computation)." [gal30b+] 🤖 #CV

⚙️ https://github.com/zhaoyue-zephyrus/AVION
🔗 https://arxiv.org/abs/2309.16669v1 #arxiv

https://creative.ai/system/media_attachments/files/111/161/315/589/280/671/original/b975c4b69b2bf082.jpg

https://creative.ai/system/media_attachments/files/111/161/315/690/930/304/original/1d4185fdec238feb.jpg

https://creative.ai/system/media_attachments/files/111/161/315/751/932/320/original/e50d1064608eda54.jpg

https://creative.ai/system/media_attachments/files/111/161/315/858/378/981/original/0532b988b41cd58d.jpg 
 📝 Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors 🔭

"Proposes an equivariant regularization technique, consisting of a spatial averaging procedure and a self-consistency loss, to explicitly promote cropping-and-resizing equivariance in depth and normal networks." [gal30b+] 🤖 #CV

⚙️ https://github.com/mikuhatsune/equivariance
🔗 https://arxiv.org/abs/2309.16646v1 #arxiv

https://creative.ai/system/media_attachments/files/111/160/549/155/972/818/original/8f21dbb0759f9e21.jpg

https://creative.ai/system/media_attachments/files/111/160/549/223/797/067/original/58bf4d225a0c0fb3.jpg

https://creative.ai/system/media_attachments/files/111/160/549/290/799/136/original/8a17bd06ef4cd6aa.jpg

https://creative.ai/system/media_attachments/files/111/160/549/360/559/201/original/b9d4bdde1dfb8716.jpg 
 📝 Deep Geometrized Cartoon Line Inbetweening 🔭

"Geometrizes anime line drawings into graphs and reframes the inbetweening task as a graph fusion problem with vertex repositioning, which is achieved via a vertex geometric embedding module, a vertex correspondence Transformer, an effective mechanism for vertex repositioning and a visibility predictor." [gal30b+] 🤖 #CV

⚙️ https://github.com/lisiyao21/AnimeInbet
🔗 https://arxiv.org/abs/2309.16643v1 #arxiv

https://creative.ai/system/media_attachments/files/111/160/312/939/998/801/original/c343b9da0c5cd0d7.jpg

https://creative.ai/system/media_attachments/files/111/160/313/010/429/884/original/5fb16f5f1065043a.jpg

https://creative.ai/system/media_attachments/files/111/160/313/137/110/465/original/0db74ba3d007a1e5.jpg

https://creative.ai/system/media_attachments/files/111/160/313/210/381/737/original/df9445a61c68235b.jpg 
 📝 End-to-End (Instance)-Image Goal Navigation Through Correspondence as an Emergent Phenomenon 🔭

"The first stage pretext task is cross-view completion, where the model is trained to take an RGB-D observation from view A (source view) and complete the depth from a different view B (target view)." [gal30b+] 🤖 #CV

⚙️ https://github.com/naver/croco
🔗 https://arxiv.org/abs/2309.16634v1 #arxiv

https://creative.ai/system/media_attachments/files/111/160/135/958/555/129/original/99e5ff621492e9e4.jpg

https://creative.ai/system/media_attachments/files/111/160/136/013/789/567/original/94e1d8e61b84df21.jpg

https://creative.ai/system/media_attachments/files/111/160/136/075/645/168/original/0ca32a39ee60155b.jpg

https://creative.ai/system/media_attachments/files/111/160/136/124/352/752/original/052307eba1d1122e.jpg 
 📝 KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing 🔭

"The key to KV Inversion is the design of the Key-Value (KV) structure, which makes the editing result follow the action semantics while retaining the original object texture." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16608v1 #arxiv

https://creative.ai/system/media_attachments/files/111/159/900/030/464/045/original/ad3491434e9596be.jpg

https://creative.ai/system/media_attachments/files/111/159/900/088/128/945/original/eedc56e346b87cbd.jpg

https://creative.ai/system/media_attachments/files/111/159/900/150/912/017/original/648ababa6c8737b9.jpg

https://creative.ai/system/media_attachments/files/111/159/900/207/834/911/original/8e8b13e93f5a98d5.jpg 
 📝 Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection 🔭🧠

"TensorFact decomposes each layer of a deep network (ResNet) into 3 factor matrices, one each for the spatial, filter and channel dimensions, thus reducing the total number of parameters." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16592v1 #arxiv

https://creative.ai/system/media_attachments/files/111/159/605/165/629/581/original/b4202e3ba35a2594.jpg

https://creative.ai/system/media_attachments/files/111/159/605/217/409/963/original/66780ea40819109b.jpg

https://creative.ai/system/media_attachments/files/111/159/605/270/865/456/original/98bc9047f3e24faf.jpg

https://creative.ai/system/media_attachments/files/111/159/605/322/441/680/original/d13308cf57d92c11.jpg 
 📝 Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping 🔭

"Neural noise can be used to separate objects from each other in the presence of background clutter, without any additional supervision, even in the presence of illusory contours, occlusion, and continuity." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16515v1 #arxiv

https://creative.ai/system/media_attachments/files/111/159/251/256/165/954/original/c0e0919220a669c4.jpg

https://creative.ai/system/media_attachments/files/111/159/251/319/576/505/original/57256cea6fb19969.jpg

https://creative.ai/system/media_attachments/files/111/159/251/386/004/005/original/3275adf610a5cdbe.jpg

https://creative.ai/system/media_attachments/files/111/159/251/464/338/518/original/1c09d7c7b8c1aa4e.jpg 
 📝 CCEdit: Creative and Controllable Video Editing via Diffusion Models 🔭

"A ControlNet-based architecture that decouples structural and appearance aspects of a video, while maintaining consistency between them, to accommodate a wide spectrum of user editing requirements." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16496v1 #arxiv

https://creative.ai/system/media_attachments/files/111/158/897/239/141/087/original/b3cac78eace6c7b0.jpg

https://creative.ai/system/media_attachments/files/111/158/897/321/175/207/original/c633bd74ae26b9cb.jpg

https://creative.ai/system/media_attachments/files/111/158/897/376/640/620/original/d3daddf977dc0db5.jpg

https://creative.ai/system/media_attachments/files/111/158/897/467/713/539/original/061a3f716402b0b3.jpg 
 📝 Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems 🔭🧠

"Uses a deep learning model trained on publicly available data, which is fine-tuned on a small set of labeled target parking lot images (few-shot learning)." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16495v1 #arxiv

https://creative.ai/system/media_attachments/files/111/158/661/461/013/402/original/620b71484154b67c.jpg

https://creative.ai/system/media_attachments/files/111/158/661/507/591/140/original/2988f259f8abb6aa.jpg

https://creative.ai/system/media_attachments/files/111/158/661/552/808/098/original/8e817b4f59703658.jpg

https://creative.ai/system/media_attachments/files/111/158/661/601/571/649/original/30b810f835b913a1.jpg 
 📝 Accurate and Lightweight Dehazing via Multi-Receptive-Field Non-Local Network and Novel Contrastive Regularization 🔭

"Multi-receptive-field non-local network (MRFNLN) is presented, which contains the multi-stream feature attention block (MSFAB) and cross non-local block (CNLB)." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16494v1 #arxiv

https://creative.ai/system/media_attachments/files/111/158/484/420/281/759/original/cb0393522c76cbe9.jpg

https://creative.ai/system/media_attachments/files/111/158/484/491/069/512/original/610cbf2f7c9b0e7a.jpg

https://creative.ai/system/media_attachments/files/111/158/484/549/238/784/original/4bb92e2994cede5c.jpg

https://creative.ai/system/media_attachments/files/111/158/484/603/635/220/original/5bc7ddd64382316e.jpg 
 📝 Rethinking Domain Generalization: Discriminability and Generalizability 🔭

"A novel framework called Discriminative Microscopic Distribution Alignment is presented which concurrently imbues features with formidable discriminability and robust generalizability, consisting of two core components: Selective Channel Pruning and Micro-level Distribution Alignment." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16483v1 #arxiv

https://creative.ai/system/media_attachments/files/111/158/248/517/409/268/original/9fc798ea45ea6276.jpg

https://creative.ai/system/media_attachments/files/111/158/248/582/049/122/original/ad12099a690df993.jpg

https://creative.ai/system/media_attachments/files/111/158/248/646/805/213/original/03c39eae4e48c41e.jpg

https://creative.ai/system/media_attachments/files/111/158/248/698/535/820/original/1eedcdb80b9e53bb.jpg 
 📝 Diverse Target and Contribution Scheduling for Domain Generalization 🔭

"Consists of Diverse Target Supervision (DTS) and Diverse Contribution Balance (DCB), with the aim of addressing the limitations associated with the common utilization of one-hot labels and equal contributions for source domains in Domain Generalization." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16460v1 #arxiv

https://creative.ai/system/media_attachments/files/111/158/071/571/406/496/original/ccbd8f20086db09f.jpg

https://creative.ai/system/media_attachments/files/111/158/071/643/723/537/original/ae19ac05a5442a2d.jpg 
 📝 Towards Novel Class Discovery: A Study in Novel Skin Lesions Clustering 🔭

"Proposes a new framework for automatic discovery of new semantic classes from a skin lesion dataset based on the knowledge of known classes, which leverages contrastive learning, multi-view cross pseudo-supervision, and neighborhood consensus." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16451v1 #arxiv

https://creative.ai/system/media_attachments/files/111/157/717/711/212/766/original/37939d22ac8cf9f0.jpg 
 📝 Distilling ODE Solvers of Diffusion Models Into Smaller Steps 🔭

"Proposes a straightforward distillation approach that optimizes the ODE solver rather than training the denoising network, resulting in a new ODE solver with improved speed-quality tradeoffs and preserving the sampling trajectory from the ODE solver." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16421v1 #arxiv

https://creative.ai/system/media_attachments/files/111/157/422/823/289/120/original/be857b7ef3816b27.jpg

https://creative.ai/system/media_attachments/files/111/157/422/914/430/625/original/9917f6ff420e3538.jpg

https://creative.ai/system/media_attachments/files/111/157/422/978/490/917/original/b0b89b48967f2c21.jpg

https://creative.ai/system/media_attachments/files/111/157/423/079/465/247/original/f84166a05b90aba0.jpg 
 📝 HIC-YOLOv5: Improved YOLOv5 for Small Object Detection 🔭

"Works by adding an additional prediction head specific to small object detection to the original YOLOv5, adopting an involution block between the backbone and neck to increase channel information of the feature map, and applying an attention mechanism named CBAM at the end of the backbone." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16393v1 #arxiv

https://creative.ai/system/media_attachments/files/111/157/127/859/413/736/original/d2fd625f286025e1.jpg

https://creative.ai/system/media_attachments/files/111/157/127/910/096/735/original/fbf9d58115e0af29.jpg

https://creative.ai/system/media_attachments/files/111/157/127/962/300/587/original/c3eeb62306dd4da2.jpg

https://creative.ai/system/media_attachments/files/111/157/128/013/863/562/original/91ba5f6738cf5ebd.jpg 
 📝 Aperture Diffraction for Compact Snapshot Spectral Imaging 🔭

"Designs Aperture Diffraction Imaging Spectrometer (ADIS) which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, which realizes sub-super-pixel spatial resolution and high spectral resolution imaging by elaborating the imaging optical theory and reconstruction algorithm." [gal30b+] 🤖 #CV

⚙️ https://github.com/Krito-ex/CSST
🔗 https://arxiv.org/abs/2309.16372v1 #arxiv

https://creative.ai/system/media_attachments/files/111/156/832/982/712/560/original/c58f6b682e953f24.jpg

https://creative.ai/system/media_attachments/files/111/156/833/046/623/138/original/e84a6fd62366f06d.jpg

https://creative.ai/system/media_attachments/files/111/156/833/105/013/561/original/21cdde2331715688.jpg

https://creative.ai/system/media_attachments/files/111/156/833/162/295/006/original/9a73766db9a38b39.jpg 
 📝 Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning 🔭

"Trains a GAN-based synthetic-image generator that translates available daytime image examples into night images, and uses it in metric learning as a form of augmentation that supplies training data for the scarce domain." [gal30b+] 🤖 #CV

⚙️ https://github.com/mohwald/gandtr
🔗 https://arxiv.org/abs/2309.16351v1 #arxiv

https://creative.ai/system/media_attachments/files/111/156/597/004/754/981/original/a4d360fc7f27a7ab.jpg

https://creative.ai/system/media_attachments/files/111/156/597/056/547/925/original/db1bddfbf824cb41.jpg

https://creative.ai/system/media_attachments/files/111/156/597/109/489/416/original/a2af030d40e4346b.jpg

https://creative.ai/system/media_attachments/files/111/156/597/168/236/470/original/7c09b815ee815422.jpg 
 📝 GAFlow: Incorporating Gaussian Attention Into Optical Flow 🔭

"A Gaussian-Constrained Layer (GCL) is proposed to highlight the local neighborhood during feature extraction while the proposed Gaussian-Guided Attention Module (G-GAM) is able to enforce the motion affinity during matching." [gal30b+] 🤖 #CV

⚙️ https://github.com/LA30/GAFlow
🔗 https://arxiv.org/abs/2309.16217v1 #arxiv

https://creative.ai/system/media_attachments/files/111/155/889/149/508/226/original/1800867cb22dc937.jpg

https://creative.ai/system/media_attachments/files/111/155/889/213/116/632/original/440857e316539ad3.jpg

https://creative.ai/system/media_attachments/files/111/155/889/263/698/893/original/0a814baf33e2398f.jpg

https://creative.ai/system/media_attachments/files/111/155/889/319/535/420/original/5d31b8e41b7a1885.jpg 
 📝 Nonconvex Third-Order Tensor Recovery Based on Logarithmic Minimax Function 🔭

"Can protect large singular values while imposing stronger penalization on small singular values, thus leading to an improved low-rank tensor recovery performance compared with other state-of-the-art methods." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16208v1 #arxiv

https://creative.ai/system/media_attachments/files/111/155/653/130/438/723/original/552dad1ccf8c67d0.jpg

https://creative.ai/system/media_attachments/files/111/155/653/215/082/269/original/2dd0e8cf0afe0191.jpg

https://creative.ai/system/media_attachments/files/111/155/653/297/178/457/original/05b4025559d394b0.jpg

https://creative.ai/system/media_attachments/files/111/155/653/353/929/424/original/a7844fdff956e40f.jpg 
 📝 Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks 🔭

"PSAT utilizes hypernetworks to train specialized models against a single perturbation and aggregates these specialized models to defend against multiple perturbations, achieving multi-perturbation robustness and parameter efficiency." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16207v1 #arxiv

https://creative.ai/system/media_attachments/files/111/155/476/230/035/664/original/329ff341e9187d3f.jpg

https://creative.ai/system/media_attachments/files/111/155/476/302/252/130/original/ffe0b29b3d1a467e.jpg 
 📝 Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling 🔭🧠

"Uncertainty-based sampling and diversity-based sampling are used to select the most informative images for labeling in a post-hoc setup where the segmentation model has already been trained." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16139v1 #arxiv

https://creative.ai/system/media_attachments/files/111/155/240/478/424/709/original/67e91a9c9e7d351b.jpg

https://creative.ai/system/media_attachments/files/111/155/240/541/462/960/original/b961e10f9315abd2.jpg

https://creative.ai/system/media_attachments/files/111/155/240/600/361/566/original/e7871896e1efc1a0.jpg 
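A two-step selection like the one in the active-learning note above can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's procedure: uncertainty is taken to be the entropy of a per-image class distribution, diversity is approximated by greedy farthest-point sampling on feature vectors, and `two_step_select` and its parameters are hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def two_step_select(probs, feats, n_candidates, n_select):
    """Pick images to label: uncertainty filter first, diversity filter second.

    probs -- per-image class probabilities from an already trained model
    feats -- per-image feature vectors (same order as probs)
    """
    # Step 1 (uncertainty): keep the n_candidates highest-entropy images.
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    pool = ranked[:n_candidates]
    if not pool:
        return []

    # Step 2 (diversity): greedily add the pool item farthest from those chosen.
    chosen = [pool[0]]
    while len(chosen) < min(n_select, len(pool)):
        remaining = [i for i in pool if i not in chosen]
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(feats[i], feats[j]) for j in chosen),
        )
        chosen.append(farthest)
    return chosen


probs = [[0.5, 0.5], [0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.99, 0.01]]
feats = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (10.0, 10.0), (3.0, 3.0)]
picked = two_step_select(probs, feats, n_candidates=3, n_select=2)
print(picked)
```

In this toy input, images 0 and 2 are both uncertain but nearly identical in feature space, so the diversity step keeps only one of them.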
 📝 Context-I2w: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval 🔭

"An Intent View Selector first dynamically learns a rotation rule to map the identical image to a task-specific manipulation view, and a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks." [gal30b+] 🤖 #CV

⚙️ https://github.com/Pter61/context_i2w
🔗 https://arxiv.org/abs/2309.16137v1 #arxiv

https://creative.ai/system/media_attachments/files/111/154/945/345/167/916/original/03df4b97a0bbe400.jpg

https://creative.ai/system/media_attachments/files/111/154/945/395/137/752/original/da7fbf6a887cb408.jpg

https://creative.ai/system/media_attachments/files/111/154/945/448/085/893/original/da8bc1ef1da00343.jpg

https://creative.ai/system/media_attachments/files/111/154/945/511/610/530/original/bee4c8b80cddc2cc.jpg