Notes by 9a622e93

 📝 UVL: A Unified Framework for Video Tampering Localization 🔭

"UVL detects forged regions in videos by extracting common features of fake videos, including boundary artifacts of synthetic edges, unnatural distribution of generated pixels, and noncorrelation between the forgery region and the original." [gal30b+] 🤖 #CV #CR

🔗 https://arxiv.org/abs/2309.16126v1 #arxiv

https://creative.ai/system/media_attachments/files/111/154/473/639/116/103/original/ff2f287ffe3722a6.jpg

https://creative.ai/system/media_attachments/files/111/154/473/693/870/348/original/e6159769db6607d5.jpg

https://creative.ai/system/media_attachments/files/111/154/473/755/369/055/original/9a9d5fac8b6b205d.jpg

https://creative.ai/system/media_attachments/files/111/154/473/812/557/020/original/16dcb6e197c11ac5.jpg 
 📝 Handbook on Leveraging Lines for Two-View Relative Pose Estimation 🔭

"Jointly exploits points, lines, and their coincidences in a hybrid manner for estimating the relative pose between calibrated image pairs, by investigating all possible configurations where these data modalities can be used together and reviewing the minimal solvers that are available in the literature." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.16040v1 #arxiv

https://creative.ai/system/media_attachments/files/111/154/237/773/983/567/original/3af0bb7ad8ce1aa9.jpg

https://creative.ai/system/media_attachments/files/111/154/237/832/395/940/original/36ca5fe77748f89b.jpg

https://creative.ai/system/media_attachments/files/111/154/237/887/412/860/original/171e841af8eeff0d.jpg

https://creative.ai/system/media_attachments/files/111/154/237/944/982/365/original/506d471c9aa41ade.jpg 
 📝 GeoCLIP: Clip-Inspired Alignment Between Locations and Images for Effective Worldwide Geo-Localization 🔭🧠

"Proposes GeoCLIP, a CLIP-inspired Image-to-GPS retrieval approach, that enforces alignment between image and its corresponding GPS locations using GPS encoding through random Fourier features." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.16020v1 #arxiv

https://creative.ai/system/media_attachments/files/111/154/001/718/193/642/original/46b042a41e9f8c7a.jpg

https://creative.ai/system/media_attachments/files/111/154/001/800/372/724/original/2c6cd188e55e9dd0.jpg

https://creative.ai/system/media_attachments/files/111/154/001/862/778/324/original/b3efc33e6166c5c4.jpg

https://creative.ai/system/media_attachments/files/111/154/001/936/846/948/original/8883ce602424edb4.jpg 
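
The GPS encoding mentioned in the GeoCLIP summary can be illustrated with a minimal random Fourier feature sketch. This is not the paper's encoder: the function name `make_rff_encoder`, the feature count, the `sigma` scale, and the coordinate normalisation are all assumptions made for illustration.

```python
import math
import random


def make_rff_encoder(num_features=8, sigma=1.0, seed=0):
    """Return a function mapping (lat, lon) in degrees to a 2*num_features vector.

    Hypothetical helper: the frequencies are drawn once from N(0, sigma^2)
    and then kept fixed, which is the usual random Fourier feature recipe.
    """
    rng = random.Random(seed)
    freqs = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(num_features)]

    def encode(lat, lon):
        # Scale coordinates to roughly [-1, 1] (an assumption, not from the paper).
        x, y = lat / 90.0, lon / 180.0
        phases = [2.0 * math.pi * (fx * x + fy * y) for fx, fy in freqs]
        # Concatenate cosine and sine responses of every random frequency.
        return [math.cos(p) for p in phases] + [math.sin(p) for p in phases]

    return encode
```

Calling `make_rff_encoder(num_features=4)(48.8566, 2.3522)` yields an 8-dimensional vector with entries in [-1, 1]; a larger `sigma` makes the encoding more sensitive to small coordinate changes.
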
 📝 The Devil Is in the Details: A Deep Dive Into the Rabbit Hole of Data Filtering 🔭🧠

"Proposes a multi-stage data filtering approach, which includes single-modality filtering, cross-modality filtering and data distribution alignment stages, to find high-quality images from the original dataset." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/mlfoundations/datacomp
🔗 https://arxiv.org/abs/2309.15954v1 #arxiv

https://creative.ai/system/media_attachments/files/111/153/352/983/456/270/original/5d76eb430af531c1.jpg

https://creative.ai/system/media_attachments/files/111/153/353/062/787/320/original/5d1ca7913096f134.jpg

https://creative.ai/system/media_attachments/files/111/153/353/171/436/634/original/743c820de694cfb9.jpg

https://creative.ai/system/media_attachments/files/111/153/353/230/765/694/original/a593163822a07589.jpg 
 📝 Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts 🔭

"Introduces a new parameter-efficient approach for vision-language tasks called VITIS that combines multimodal prompt learning and a transformer-based mapping network, while keeping the pretrained models frozen." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15915v1 #arxiv

https://creative.ai/system/media_attachments/files/111/152/940/113/001/536/original/98e0f07f1add44a8.jpg

https://creative.ai/system/media_attachments/files/111/152/940/168/578/302/original/915bf1afcc42be59.jpg 
 📝 Highly Efficient SNNs for High-Speed Object Detection 🔭

"Proposes an efficient Spiking Neural Network (SNN) for object detection based on the quantization training method and continuous inference scheme by using a Feed-Forward Integrate-and-Fire (FewdIF) neuron." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15883v1 #arxiv

https://creative.ai/system/media_attachments/files/111/152/763/100/294/762/original/ed181763a648586c.jpg 
 📝 Reflection Invariance Learning for Few-Shot Semantic Segmentation 🔭

"Proposes a fresh few-shot segmentation framework to mine the reflection invariance in a multi-view matching manner, where a stronger category representation is obtained for matching the query features." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15850v1 #arxiv

https://creative.ai/system/media_attachments/files/111/152/527/274/304/994/original/f9b2aff4810f7ed4.jpg

https://creative.ai/system/media_attachments/files/111/152/527/335/569/177/original/6fc28e0e75260a8b.jpg

https://creative.ai/system/media_attachments/files/111/152/527/414/619/800/original/8aab1a003d64af53.jpg

https://creative.ai/system/media_attachments/files/111/152/527/465/252/711/original/6c09afc5afe33546.jpg 
 📝 Exploiting the Signal-Leak Bias in Diffusion Models 🔭🧠

"Models the signal leak distribution in the spatial frequency and pixel domains to allow us to control the generated images in the corresponding image domains, such as brightness and style, without any additional training." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.15842v1 #arxiv

https://creative.ai/system/media_attachments/files/111/152/076/746/477/905/original/65444c6c6e3aa10e.jpg

https://creative.ai/system/media_attachments/files/111/152/076/815/734/314/original/96daef4ede93edc0.jpg

https://creative.ai/system/media_attachments/files/111/152/076/875/402/078/original/3a5df53de60ff51d.jpg

https://creative.ai/system/media_attachments/files/111/152/076/945/190/062/original/a5b65303404d76d2.jpg 
 📝 One for All: Video Conversation Is Feasible Without Video Instruction Tuning 🔭

"Introduces a novel method, Branching Temporal Adapter (BT-Adapter), for extending image-language pretrained models into the video domain, which serves as a plug-and-use temporal modeling branch alongside the CLIP backbone." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15785v1 #arxiv

https://creative.ai/system/media_attachments/files/111/151/133/152/651/466/original/4f057a3255966bdf.jpg

https://creative.ai/system/media_attachments/files/111/151/133/211/907/634/original/4189b0f8c6a5a4e1.jpg

https://creative.ai/system/media_attachments/files/111/151/133/271/062/709/original/09b7c05ed6a10796.jpg

https://creative.ai/system/media_attachments/files/111/151/133/327/289/654/original/747b7bd075c79261.jpg 
 📝 AaP-ReID: Improved Attention-Aware Person Re-Identification 🔭

"AaP-ReID is a part-based and attention-based method for person ReID based on AlignedReID++, incorporating Channel-Wise Attention Bottleneck (CWAbottleneck) blocks to improve the ability to extract discriminating features from a pedestrian image." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15780v1 #arxiv

https://creative.ai/system/media_attachments/files/111/150/838/354/910/851/original/77ddd6b3860f7a0b.jpg

https://creative.ai/system/media_attachments/files/111/150/838/409/159/876/original/939d619dc8bbdf46.jpg

https://creative.ai/system/media_attachments/files/111/150/838/482/377/071/original/a68d15ed0f033cee.jpg

https://creative.ai/system/media_attachments/files/111/150/838/539/805/458/original/a118ede36bbca630.jpg 
 📝 CAIT: Triple-Win Compression Towards High Accuracy, Fast Inference, and Favorable Transferability for ViTs 🔭

"Proposes a joint compression framework (CAIT) for Vision Transformers (ViTs) that offers both high accuracy and fast inference speed, while also maintaining favorable transferability to downstream tasks." [gal30b+] 🤖 #CV

⚙️ https://github.com/facebookresearch/fvcore
🔗 https://arxiv.org/abs/2309.15755v1 #arxiv

https://creative.ai/system/media_attachments/files/111/150/484/257/328/782/original/abde1beefdcdc602.jpg

https://creative.ai/system/media_attachments/files/111/150/484/320/990/109/original/a428b84dbf96d279.jpg

https://creative.ai/system/media_attachments/files/111/150/484/374/330/763/original/dbdcac3f966045c8.jpg

https://creative.ai/system/media_attachments/files/111/150/484/424/814/786/original/dc8397f43cc752a0.jpg 
 📝 Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation 🔭🧠

"A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel using a different network head for each region, and combine the results." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.15726v1 #arxiv

https://creative.ai/system/media_attachments/files/111/149/953/570/699/727/original/b80be63ae7306726.jpg

https://creative.ai/system/media_attachments/files/111/149/953/638/575/523/original/a6249648a39c6087.jpg

https://creative.ai/system/media_attachments/files/111/149/953/700/512/384/original/e9bec6d9bc0da5e7.jpg

https://creative.ai/system/media_attachments/files/111/149/953/765/565/129/original/5fa50f4411d6df55.jpg 
 📝 Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing 🔭

"Dynamic Prompt Learning (DPL) forces cross-attention maps to focus on correct noun words in the text prompt by updating the dynamic tokens for nouns in the textual input with the proposed leakage repairment losses." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15664v1 #arxiv

https://creative.ai/system/media_attachments/files/111/149/422/608/358/000/original/9f886faf5d0fb103.jpg

https://creative.ai/system/media_attachments/files/111/149/422/705/376/953/original/142ba16ef9e3291e.jpg

https://creative.ai/system/media_attachments/files/111/149/422/766/281/937/original/b223da1c4fb7416c.jpg

https://creative.ai/system/media_attachments/files/111/149/422/819/502/238/original/7c4ec080a2c99576.jpg 
 📝 Human Kinematics-Inspired Skeleton-Based Video Anomaly Detection 🔭

"A human kinematics-inspired anomaly detection method is introduced that explicitly uses human kinematic features, including walking stride and skeleton displacement at feet level and neck level, to detect anomalies." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15662v1 #arxiv

https://creative.ai/system/media_attachments/files/111/149/186/609/666/209/original/c922a712ca780349.jpg

https://creative.ai/system/media_attachments/files/111/149/186/669/005/135/original/8f5a37a64ce04bee.jpg

https://creative.ai/system/media_attachments/files/111/149/186/725/749/633/original/43f17fd9443d2aa5.jpg

https://creative.ai/system/media_attachments/files/111/149/186/779/101/558/original/392127a293638ff0.jpg 
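
The kinematic features named in the summary above (walking stride, displacement at feet level and at neck level) could be computed from 2D skeleton keypoints roughly as follows. The keypoint names and the exact feature definitions are assumptions for illustration; the paper may define them differently.

```python
import math


def _dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def kinematic_features(prev_pose, curr_pose):
    """Compute simple per-frame kinematic features from two consecutive poses.

    Each pose is a dict with the hypothetical keys 'neck', 'left_ankle'
    and 'right_ankle', each mapping to an (x, y) pixel coordinate.
    """
    # Stride: distance between the two ankles in the current frame.
    stride = _dist(curr_pose["left_ankle"], curr_pose["right_ankle"])
    # Feet-level displacement: how far the ankle midpoint moved between frames.
    feet_prev = _midpoint(prev_pose["left_ankle"], prev_pose["right_ankle"])
    feet_curr = _midpoint(curr_pose["left_ankle"], curr_pose["right_ankle"])
    feet_disp = _dist(feet_prev, feet_curr)
    # Neck-level displacement: how far the neck keypoint moved between frames.
    neck_disp = _dist(prev_pose["neck"], curr_pose["neck"])
    return {"stride": stride, "feet_disp": feet_disp, "neck_disp": neck_disp}
```

An anomaly score could then be derived by comparing these values against statistics gathered from normal footage, a step not shown here.
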
 📝 Domain Generalization Across Tumor Types, Laboratories, and Species -- Insights From the 2022 Edition of the Mitosis Domain Generalization Challenge 🔭

"The challenge was organized as part of the IEEE International Symposium on Biomedical Imaging in conjunction with MICCAI and MICCAI-SIIMS Virtual Joint Conference (IEEE-MICCAI-SIIMS-VJC)." [gal30b+] 🤖 #CV

⚙️ https://github.com/DeepMicroscopy/
🔗 https://arxiv.org/abs/2309.15589v1 #arxiv

https://creative.ai/system/media_attachments/files/111/148/655/920/032/484/original/488386ca65a75bf0.jpg

https://creative.ai/system/media_attachments/files/111/148/655/971/827/479/original/fa503ada56e739cc.jpg

https://creative.ai/system/media_attachments/files/111/148/656/027/094/508/original/77b1d60d00dc7128.jpg

https://creative.ai/system/media_attachments/files/111/148/656/080/593/474/original/4d24db61012d1be1.jpg 
 📝 Confidence-Based Visual Dispersal for Few-Shot Unsupervised Domain Adaptation 🔭

"C-VisDiT consists of a cross-domain visual dispersal strategy that transfers only high-confidence source knowledge for model adaptation and an intra-domain visual dispersal strategy that guides the learning of hard target samples with easy ones." [gal30b+] 🤖 #CV

⚙️ https://github.com/Bostoncake/C-VisDiT
🔗 https://arxiv.org/abs/2309.15575v1 #arxiv

https://creative.ai/system/media_attachments/files/111/148/125/084/139/633/original/e95708ff9e20f056.jpg

https://creative.ai/system/media_attachments/files/111/148/125/138/930/442/original/3b82604423eebc56.jpg

https://creative.ai/system/media_attachments/files/111/148/125/190/908/162/original/c50c65ef6a319bfc.jpg

https://creative.ai/system/media_attachments/files/111/148/125/254/721/121/original/5bb827077de44b57.jpg 
 📝 Guided Frequency Loss for Image Restoration 🔭

"The Guided Frequency Loss (GFL) aggregates three major components that work in parallel to enhance learning efficiency; a Charbonnier component, a Laplacian Pyramid component, and a Gradual Frequency component." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15563v1 #arxiv

https://creative.ai/system/media_attachments/files/111/147/830/247/203/547/original/e5901ce955bc5169.jpg

https://creative.ai/system/media_attachments/files/111/147/830/297/730/010/original/d74ccb2850c6624b.jpg

https://creative.ai/system/media_attachments/files/111/147/830/355/621/946/original/3e7984675b5ea539.jpg 
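
Of the three GFL components listed above, the Charbonnier term is the simplest to show in isolation. The sketch below implements the standard Charbonnier penalty on flat lists of pixel values; the `eps` default is an assumption, and the Laplacian pyramid and gradual frequency components are not covered.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over paired values.

    Behaves like a smooth L1 loss: close to |p - t| for large errors and
    differentiable around zero thanks to eps.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

For identical inputs the loss equals `eps` rather than zero, which is the expected floor of this penalty.
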
 📝 Learning From SAM: Harnessing a Segmentation Foundation Model for Sim2Real Domain Adaptation Through Regularization 🔭

"Proposes a novel self-supervised domain adaptation approach for semantic segmentation that is based on segment invariance-variance training, leveraging the Segment Anything Model to extract segments from the unlabelled target data." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15562v1 #arxiv

https://creative.ai/system/media_attachments/files/111/147/476/286/690/390/original/9ea7e76287f56dd2.jpg

https://creative.ai/system/media_attachments/files/111/147/476/346/721/293/original/6c58debfd4c88fca.jpg

https://creative.ai/system/media_attachments/files/111/147/476/402/766/322/original/cd57cd8dcc99ffe6.jpg

https://creative.ai/system/media_attachments/files/111/147/476/457/991/340/original/f1bde52ae8ac26a0.jpg 
 📝 Defending Against Physical Adversarial Patch Attacks on Infrared Human Detection 🔭

"A straightforward defense strategy, Patch-based Occlusion-aware Detection (POD), is devised for human detection in IR images, which efficiently augments training samples with random patches and subsequently detects them." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15519v1 #arxiv

https://creative.ai/system/media_attachments/files/111/146/650/541/166/186/original/310d22039ab9aecd.jpg

https://creative.ai/system/media_attachments/files/111/146/650/593/422/316/original/d6d490b26d445afa.jpg 
 📝 Transferability of Representations Learned Using Supervised Contrastive Learning Trained on a Multi-Domain Dataset 🔭

"Supervised Contrastive Learning models are trained to maximize agreement between differently augmented views of the same image while minimizing agreement between augmented views of different images from the same batch of images." [gal30b+] 🤖 #CV

⚙️ https://github.com/rois-codh/kaokore
🔗 https://arxiv.org/abs/2309.15486v1 #arxiv

https://creative.ai/system/media_attachments/files/111/145/058/096/424/512/original/32731df35732f292.jpg

https://creative.ai/system/media_attachments/files/111/145/058/154/492/392/original/4f7531446b4d6e8e.jpg

https://creative.ai/system/media_attachments/files/111/145/058/214/567/094/original/9e99cea3ffd47d79.jpg 
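
The objective described in the summary above (pull two augmented views of the same image together, push views of different images in the batch apart) matches the instance-level NT-Xent loss, sketched below in plain Python. The function names and the temperature are assumptions; the supervised variant additionally treats samples that share a class label as positives, which this sketch does not do.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """views_a[i] and views_b[i] are embeddings of two augmentations of image i."""
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i, anchor in enumerate(embeddings):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled similarities to every other embedding in the batch.
        logits = {j: _cosine(anchor, other) / temperature
                  for j, other in enumerate(embeddings) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        # Cross-entropy with the positive pair as the correct "class".
        total += log_denominator - logits[positive]
    return total / (2 * n)
```

The loss is lower when each pair of views is aligned than when views are paired with the wrong image, which is the behaviour the training signal relies on.
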
 📝 InternLM-XComposer: A Vision-Language Large Model for Advanced Text-Image Comprehension and Composition 🔭

"InternLM-XComposer can effortlessly generate coherent and contextual articles that seamlessly integrate images, providing a more engaging and immersive reading experience with rich knowledge and comprehension." [gal30b+] 🤖 #CV

⚙️ https://github.com/InternLM/InternLM-XComposer
🔗 https://arxiv.org/abs/2309.15112v1 #arxiv

https://creative.ai/system/media_attachments/files/111/143/965/013/905/919/original/a6ecb82c1875ccf3.jpg

https://creative.ai/system/media_attachments/files/111/143/965/126/813/101/original/d417d6ba5175b799.jpg

https://creative.ai/system/media_attachments/files/111/143/965/212/314/618/original/ce09f229b1b18785.jpg

https://creative.ai/system/media_attachments/files/111/143/965/311/895/757/original/a67f7ec6e9019be7.jpg 
 📝 LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models 🔭

"Proposes LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model." [gal30b+] 🤖 #CV

⚙️ https://github.com/Breakthrough/PySceneDetect
🔗 https://arxiv.org/abs/2309.15103v1 #arxiv

https://creative.ai/system/media_attachments/files/111/143/729/176/774/925/original/5c178d4e01e1f6dc.jpg

https://creative.ai/system/media_attachments/files/111/143/729/268/178/294/original/c28c0a5afd74eb6d.jpg

https://creative.ai/system/media_attachments/files/111/143/729/363/808/381/original/aea0977c25439479.jpg

https://creative.ai/system/media_attachments/files/111/143/729/446/261/083/original/700d6851daaacaa5.jpg 
 📝 An Ensemble Model for Distorted Images in Real Scenarios 🔭

"A combination of data enhancement, detection box ensemble, denoiser ensemble, super-resolution models, and transfer learning is used to make the model achieve excellent performance on the CDCOCO dataset." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14998v1 #arxiv

https://creative.ai/system/media_attachments/files/111/142/726/415/793/199/original/6c342c38435f8d2e.jpg

https://creative.ai/system/media_attachments/files/111/142/726/477/195/516/original/7009052f5587a46f.jpg

https://creative.ai/system/media_attachments/files/111/142/726/533/289/096/original/b7306d763644bcef.jpg

https://creative.ai/system/media_attachments/files/111/142/726/592/605/837/original/db2bb36428ea1534.jpg 
 📝 IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network 🔭

"The proposed IAIFNet consists of an illumination enhancement network and an image fusion network, with the adaptive differential fusion module (ADFM) and salient target aware module (STAM) for the effective integration of features and targets." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14997v1 #arxiv

https://creative.ai/system/media_attachments/files/111/142/372/433/396/500/original/5d3b9d323ba24ac3.jpg

https://creative.ai/system/media_attachments/files/111/142/372/489/176/245/original/735b16adf005f84e.jpg

https://creative.ai/system/media_attachments/files/111/142/372/553/118/534/original/d05b3e829ef0fc76.jpg

https://creative.ai/system/media_attachments/files/111/142/372/606/562/029/original/d5029d072a077595.jpg 
 📝 FEC: Three Finetuning-Free Methods to Enhance Consistency for Real Image Editing 🔭

"Proposes a new image sampling method called FEC, which consists of three sampling methods, each designed for different editing types and settings, and achieves two important goals in image editing tasks: ensuring successful reconstruction and improving the performance of many editing methods." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14934v1 #arxiv

https://creative.ai/system/media_attachments/files/111/141/428/902/103/798/original/159d23e051f36750.jpg

https://creative.ai/system/media_attachments/files/111/141/428/987/909/415/original/6470689fa1f7a8c9.jpg

https://creative.ai/system/media_attachments/files/111/141/429/068/881/303/original/5a527a0ceccf2b5a.jpg

https://creative.ai/system/media_attachments/files/111/141/429/147/771/474/original/f5efe83bf3eb6873.jpg 
 📝 Noise-Tolerant Unsupervised Adapter for Vision-Language Models 🔭🧠

"NtUA works as a key-value cache that formulates visual features and predicted pseudo-labels of the few-shot unlabelled target samples as key-value pairs." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.14928v1 #arxiv

https://creative.ai/system/media_attachments/files/111/141/192/906/263/361/original/7c8e63f65f1eb5a2.jpg

https://creative.ai/system/media_attachments/files/111/141/192/992/868/323/original/d98df5d515f0160a.jpg

https://creative.ai/system/media_attachments/files/111/141/193/051/875/282/original/32e847d83dddaae0.jpg 
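
A key-value cache of the kind the NtUA summary describes can be sketched as below: keys are visual features of the few-shot samples, values are their one-hot pseudo-labels, and a query is classified by similarity-weighted lookup. The exponential affinity and the `beta` sharpness parameter are borrowed from common cache-model adapters and are assumptions here; the noise-tolerant weighting that gives NtUA its name is not shown.

```python
import math


def cache_logits(query, keys, values, beta=5.0):
    """Aggregate pseudo-label votes from cached samples, weighted by similarity.

    query:  feature vector (assumed L2-normalised)
    keys:   list of cached feature vectors (assumed L2-normalised)
    values: list of one-hot pseudo-label vectors, one per key
    """
    num_classes = len(values[0])
    logits = [0.0] * num_classes
    for key, value in zip(keys, values):
        similarity = sum(q * k for q, k in zip(query, key))
        # Affinity is 1 for identical features and decays as similarity drops.
        affinity = math.exp(-beta * (1.0 - similarity))
        for c in range(num_classes):
            logits[c] += affinity * value[c]
    return logits
```

In cache-based adapters these logits are typically added to the zero-shot logits of the frozen vision-language model rather than used alone.
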
 📝 Face Cartoonisation for Various Poses Using StyleGAN 🔭

"Trains an encoder to capture both pose and identity information from images and generate a corresponding embedding within the StyleGAN latent space for cartoonisation purpose by passing the embedding through a pre-trained generator." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14908v1 #arxiv

https://creative.ai/system/media_attachments/files/111/140/956/965/468/342/original/e0c7a5ebd09fb980.jpg

https://creative.ai/system/media_attachments/files/111/140/957/047/043/970/original/dea43633b009caf1.jpg

https://creative.ai/system/media_attachments/files/111/140/957/122/955/982/original/a5cc070c73d6dc9e.jpg

https://creative.ai/system/media_attachments/files/111/140/957/194/949/708/original/a2d8bbcb0fefbdd4.jpg 
 📝 Pre-Training-Free Image Manipulation Localization Through Non-Mutually Exclusive Contrastive Learning 🔭

"Proposes a pivot structure to constantly change the role of contour patches between positives and negatives while training and thus avoids spatial corruption caused by the role-changing process with a pivot-consistent loss." [gal30b+] 🤖 #CV

⚙️ https://github.com/Knightzjz/NCL-IML
🔗 https://arxiv.org/abs/2309.14900v1 #arxiv

https://creative.ai/system/media_attachments/files/111/140/780/017/406/691/original/c1011027eba82892.jpg

https://creative.ai/system/media_attachments/files/111/140/780/092/603/071/original/795c7542d5ea3a96.jpg

https://creative.ai/system/media_attachments/files/111/140/780/144/740/629/original/4682ae48721bb05b.jpg

https://creative.ai/system/media_attachments/files/111/140/780/207/911/020/original/47787554d343472f.jpg 
 📝 Cross-Dataset-Robust Method for Blind Real-World Image Quality Assessment 🔭

"First, many individual BIQA models based on SwinT are trained on different real-world BIQA datasets and jointly used to generate pseudo-labels, which adopt the probability of the relative quality of two random images instead of a fixed quality score." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14868v1 #arxiv

https://creative.ai/system/media_attachments/files/111/140/426/442/077/255/original/9fd0c396cd0e1d36.jpg

https://creative.ai/system/media_attachments/files/111/140/426/518/706/748/original/bd2d869f8175698f.jpg

https://creative.ai/system/media_attachments/files/111/140/426/595/368/721/original/990f2a9cf6ddddd4.jpg

https://creative.ai/system/media_attachments/files/111/140/426/657/598/230/original/3f48915706377832.jpg 
 📝 ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios 🔭

"A dataset and benchmarks on four tasks related to human-object interactions in the industrial domain: 1) untrimmed action detection, 2) egocentric human-object interaction detection, 3) short-term object interaction anticipation and 4) natural language understanding of intents and entities." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14809v1 #arxiv

https://creative.ai/system/media_attachments/files/111/140/131/232/183/623/original/0f2b52b56d494134.jpg

https://creative.ai/system/media_attachments/files/111/140/131/310/111/294/original/2d40b275bc695de7.jpg

https://creative.ai/system/media_attachments/files/111/140/131/370/681/356/original/a3d5e8f147044c12.jpg

https://creative.ai/system/media_attachments/files/111/140/131/430/064/495/original/9ab27c815c2d5eb0.jpg 
 📝 Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation 🔭

"Designs a novel motion-as-option network that can take RGB images and optical flow maps as an input at the same time, and produce two different predictions." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14786v1 #arxiv

https://creative.ai/system/media_attachments/files/111/139/895/171/649/291/original/142b4261311e784a.jpg

https://creative.ai/system/media_attachments/files/111/139/895/235/910/017/original/a568536db1f38a14.jpg 
 📝 Multi-Label Feature Selection Using Adaptive and Transformed Relevance 🔭

"An information-theoretical filter-based approach, which is scalable to extensive feature and label spaces, addressing multi-label classification scenarios where instances are associated with multiple class labels simultaneously." [gal30b+] 🤖 #CV

⚙️ https://github.com/Sadegh28/ATR
🔗 https://arxiv.org/abs/2309.14768v1 #arxiv

https://creative.ai/system/media_attachments/files/111/139/718/472/133/139/original/5976d009862658a7.jpg

https://creative.ai/system/media_attachments/files/111/139/718/565/265/155/original/2634606db5afa013.jpg

https://creative.ai/system/media_attachments/files/111/139/718/664/050/987/original/b4fdb3e7cf55774a.jpg

https://creative.ai/system/media_attachments/files/111/139/718/759/845/773/original/c200e992879547c8.jpg 
 📝 Image Denoising via Style Disentanglement 🔭

"The style of the noisy image is encoded with a style module and then transferred to the clean image, which induces low-response activations for noise features and high-response activations for content features in the feature space." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14755v1 #arxiv

https://creative.ai/system/media_attachments/files/111/139/246/309/592/924/original/80a44eb19381fb0c.jpg

https://creative.ai/system/media_attachments/files/111/139/246/365/493/490/original/86892d13dca786bc.jpg

https://creative.ai/system/media_attachments/files/111/139/246/422/398/388/original/ac8542cf6613507e.jpg

https://creative.ai/system/media_attachments/files/111/139/246/475/223/288/original/5df72d262c7c87d5.jpg 
 📝 Advanced Volleyball Stats for All Levels: Automatic Setting Tactic Detection and Classification with a Single Camera 🔭

"Uses a novel set trajectory classifier to identify setters’ movements and the ball’s trajectory from a single camera view, and to detect the opposing team's right-side hitter’s current row (front or back) during gameplay." [gal30b+] 🤖 #CV

⚙️ https://github.com/volleyIEEE/VolleyStats
🔗 https://arxiv.org/abs/2309.14753v1 #arxiv

https://creative.ai/system/media_attachments/files/111/139/010/528/682/065/original/bccf1bccfcec8b75.jpg

https://creative.ai/system/media_attachments/files/111/139/010/581/127/114/original/f5a1d2d8fc62e27d.jpg

https://creative.ai/system/media_attachments/files/111/139/010/628/560/007/original/b1af7a91b72a32f7.jpg 
 📝 SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion 🔭

"Can generate high-quality fusion images from pairs of infrared and visible images, which can boost the performance of downstream computer-vision tasks, such as object detection and recognition." [gal30b+] 🤖 #CV

⚙️ https://github.com/QiaoYang-CV/SSPFUSION
🔗 https://arxiv.org/abs/2309.14745v1 #arxiv

https://creative.ai/system/media_attachments/files/111/138/715/657/849/974/original/a73dd5263fe38247.jpg

https://creative.ai/system/media_attachments/files/111/138/715/718/909/193/original/83af5339c9b3375e.jpg

https://creative.ai/system/media_attachments/files/111/138/715/774/512/196/original/978609b7eba5309d.jpg 
 📝 Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement 🔭

"BDCE is a bootstrap diffusion model that learns the distribution of the curve parameters instead of the normal-light image itself, and a denoise module is applied in each iteration of curve adjustment to denoise the intermediate enhanced result." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14709v1 #arxiv

https://creative.ai/system/media_attachments/files/111/138/538/635/713/232/original/9d297d4eb526ef4f.jpg

https://creative.ai/system/media_attachments/files/111/138/538/691/505/460/original/012ba6b8ddd78103.jpg

https://creative.ai/system/media_attachments/files/111/138/538/741/659/832/original/95f4757bff52aa60.jpg

https://creative.ai/system/media_attachments/files/111/138/538/790/758/288/original/60a66f636a2241cc.jpg 
 📝 Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator 🔭

"Free-Bloom harnesses large language models as the director to generate a semantically coherent prompt sequence, and pre-trained latent diffusion models as the animator to generate high-fidelity frames." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.14494v1 #arxiv

https://creative.ai/system/media_attachments/files/111/137/123/122/750/405/original/a58d4b94f4a92e48.jpg

https://creative.ai/system/media_attachments/files/111/137/123/205/291/171/original/f4681491f971c64e.jpg

https://creative.ai/system/media_attachments/files/111/137/123/283/252/780/original/9e02a0ee3238baef.jpg

https://creative.ai/system/media_attachments/files/111/137/123/383/662/259/original/c27a64d49d1a9ed5.jpg 
 📝 HPCR: Holistic Proxy-Based Contrastive Replay for Online Continual Learning 🧠🔭

"The proxy-based contrastive replay (PCR) alleviates the catastrophic forgetting issue via contrastive learning with anchor-to-proxy pairs, while the holistic proxy-based contrastive replay (HPCR) learns more fine-grained semantic information with anchor-to-sample pairs." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.15038v1 #arxiv

https://creative.ai/system/media_attachments/files/111/136/887/171/062/649/original/849dfea0a098739d.jpg

https://creative.ai/system/media_attachments/files/111/136/887/233/846/621/original/f1ec8150bb4dd8db.jpg

https://creative.ai/system/media_attachments/files/111/136/887/318/893/565/original/5298666f674eb35a.jpg

https://creative.ai/system/media_attachments/files/111/136/887/370/007/594/original/b68c0b357b611656.jpg 
 📝 Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization 🧠🔭

"The proposed TRIBE framework is built upon a tri-net architecture with balanced batchnorm to adapt the source pre-trained model towards a class imbalanced testing data stream with continual domain shift." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/Gorilla-Lab-SCUT/TRIBE
🔗 https://arxiv.org/abs/2309.14949v1 #arxiv

https://creative.ai/system/media_attachments/files/111/136/651/119/031/274/original/80bfbbc43451b437.jpg

https://creative.ai/system/media_attachments/files/111/136/651/169/799/991/original/64bd37e2da57c6d1.jpg

https://creative.ai/system/media_attachments/files/111/136/651/225/658/517/original/6c6f1dd860e3cee0.jpg

https://creative.ai/system/media_attachments/files/111/136/651/281/687/084/original/ce7354550579baaf.jpg 
 📝 Domain-Guided Conditional Diffusion Model for Unsupervised Domain Adaptation 🧠🔭

"Domain-guided Conditional Diffusion Model generates high-fidelity and diverse samples for the target domain, and the generated samples help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.14360v1 #arxiv

https://creative.ai/system/media_attachments/files/111/136/474/226/544/070/original/f36b2f3b1df18910.jpg

https://creative.ai/system/media_attachments/files/111/136/474/277/905/830/original/b8998cbee4ae8e2c.jpg

https://creative.ai/system/media_attachments/files/111/136/474/330/244/014/original/d6ca601821e15e1e.jpg

https://creative.ai/system/media_attachments/files/111/136/474/389/800/868/original/4f7dd76724d43aad.jpg 
 📝 GLOBER: Coherent Non-Autoregressive Video Generation via GLOBal Guided Video DecodER 🔭

"Proposes a novel non-autoregressive method GLOBER, which first generates global features to obtain comprehensive global guidance and then synthesizes video frames based on the global features to generate coherent videos." [gal30b+] 🤖 #CV

⚙️ https://github.com/iva-mzsun/GLOBER
🔗 https://arxiv.org/abs/2309.13274v1 #arxiv

https://creative.ai/system/media_attachments/files/111/135/914/953/613/012/original/0afe49260c99b101.jpg

https://creative.ai/system/media_attachments/files/111/135/915/024/812/634/original/2c05c853ef365ba9.jpg

https://creative.ai/system/media_attachments/files/111/135/915/106/152/338/original/fd9fb067cc0d5d64.jpg

https://creative.ai/system/media_attachments/files/111/135/915/189/437/593/original/6d7b0300f81c9615.jpg 
 📝 Order-Preserving Consistency Regularization for Domain Adaptation and Generalization 🔭🧠

"OCR constrains the prediction of two augmented views to be the order-preserving in probability, making the model robust to task-irrelevant transformations such as domain-specific attributes." [gal30b+] 🤖 #CV #LG

⚙️ https://github.com/advboxes/AdvBox
🔗 https://arxiv.org/abs/2309.13258v1 #arxiv

https://creative.ai/system/media_attachments/files/111/135/620/299/026/192/original/4fa8bbf6fede865e.jpg

https://creative.ai/system/media_attachments/files/111/135/620/359/645/377/original/1f1a5b733cd866f5.jpg

https://creative.ai/system/media_attachments/files/111/135/620/412/914/316/original/1a13f418d4ff846a.jpg

https://creative.ai/system/media_attachments/files/111/135/620/477/475/255/original/30de8a284cb7ce4a.jpg 
 📝 RTrack: Accelerating Convergence for Visual Object Tracking via Pseudo-Boxes Exploration 🔭

"The RTrack tracker is based on a new object representation method that uses a set of sample points to generate a pseudo-bounding box to capture the spatial extent information and emphasize local areas." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.13257v1 #arxiv

https://creative.ai/system/media_attachments/files/111/135/384/269/796/078/original/02e0585a8642a175.jpg

https://creative.ai/system/media_attachments/files/111/135/384/326/464/071/original/e4bcc95793745eba.jpg

https://creative.ai/system/media_attachments/files/111/135/384/388/642/316/original/ccce92ff5f50c1a2.jpg

https://creative.ai/system/media_attachments/files/111/135/384/463/790/579/original/ebfac17363fd8310.jpg 
 📝 RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias 🔭

"We enhance the adversarial robustness by increasing the proportion of high-frequency structural robust biases, thus mitigating the susceptibility to robustness issues, and introduce a novel Robust Bias Transformer-based Structure (RBFormer) that shows robust superiority compared to several existing baseline structures." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.13245v1 #arxiv

https://creative.ai/system/media_attachments/files/111/134/794/424/338/136/original/4c534a782ef2c8cf.jpg 
 📝 Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation 🔭

"Proposes a spatial-temporal knowledge-embedded transformer (STKET) that incorporates prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations for video scene graph generation (VidSGG)." [gal30b+] 🤖 #CV

⚙️ https://github.com/HCPLab-SYSU/STKET
🔗 https://arxiv.org/abs/2309.13237v1 #arxiv

https://creative.ai/system/media_attachments/files/111/134/440/445/725/010/original/179c0f206035b7d2.jpg

https://creative.ai/system/media_attachments/files/111/134/440/531/529/514/original/4a26c21ea37e4208.jpg

https://creative.ai/system/media_attachments/files/111/134/440/608/648/723/original/2f4f44061ec34a19.jpg

https://creative.ai/system/media_attachments/files/111/134/440/663/199/031/original/99037a793c69633a.jpg 
 📝 ClusterFormer: Clustering as a Universal Visual Learner 🔭

"Introduces a new vision model, named ClusteRFormer, based on the CLUSTERing paradigm with TransFORMer, a new Transformer architecture that introduces recurrent cross-attention, a mechanism for updating cluster centers that facilitates strong representation learning." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.13196v1 #arxiv

https://creative.ai/system/media_attachments/files/111/134/204/493/959/095/original/18ffc8f7011a9775.jpg

https://creative.ai/system/media_attachments/files/111/134/204/550/389/724/original/a158123c6c23d1fd.jpg 
 📝 Trading-Off Mutual Information on Feature Aggregation for Face Recognition 🔭🧠

"Aggregates the face embeddings from two state-of-the-art (SOTA) face recognition models, ArcFace and AdaFace, using the transformer self-attention mechanism." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.13137v1 #arxiv

https://creative.ai/system/media_attachments/files/111/133/909/597/577/155/original/9075c27ffe60acf0.jpg

https://creative.ai/system/media_attachments/files/111/133/909/653/920/161/original/021daee9e2da03f7.jpg

https://creative.ai/system/media_attachments/files/111/133/909/711/493/631/original/fbe84b81e64c8d50.jpg 
 📝 Zero-Shot Object Counting with Language-Vision Models 🔭

"Uses large language-vision models to generate class prototypes from the input class name and use them to select the patches containing the target objects for counting exemplars, as well as a ranking model to estimate the counting error of each patch." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.13097v1 #arxiv

https://creative.ai/system/media_attachments/files/111/133/437/575/475/461/original/a73c4c9e39a5ad2c.jpg

https://creative.ai/system/media_attachments/files/111/133/437/656/408/921/original/0446b36e40109b9c.jpg

https://creative.ai/system/media_attachments/files/111/133/437/743/481/364/original/4c3c2e4376b6a51b.jpg

https://creative.ai/system/media_attachments/files/111/133/437/804/884/196/original/7d4b694b01ab5559.jpg 
 📝 C$^2$VAE: Gaussian Copula-Based VAE Differing Disentangled From Coupled Representations with Contrastive Posterior 🧠🔭

"Presents a self-supervised variational autoencoder to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.13303v1 #arxiv

https://creative.ai/system/media_attachments/files/111/132/906/859/244/990/original/e980ae06b98ea06d.jpg

https://creative.ai/system/media_attachments/files/111/132/906/923/125/996/original/1af9d7b97045e721.jpg 
 📝 Spatial-Frequency Channels, Shape Bias, and Adversarial Robustness 🧠🔭

"Critical band masking is an established tool that can reveal the frequency-selective filters used for object recognition in humans and neural networks by measuring the sensitivity of categorization performance to noise added at each spatial frequency." [gal30b+] 🤖 #LG #CV

🔗 https://arxiv.org/abs/2309.13190v1 #arxiv

https://creative.ai/system/media_attachments/files/111/132/671/041/979/089/original/7b7a721fa20d5ef9.jpg

https://creative.ai/system/media_attachments/files/111/132/671/112/219/372/original/9e20566b826ee24b.jpg

https://creative.ai/system/media_attachments/files/111/132/671/171/982/444/original/37a0878a02575642.jpg

https://creative.ai/system/media_attachments/files/111/132/671/244/013/200/original/203f1f2a7c6f9c2a.jpg 
 📝 Flow Factorized Representation Learning 🧠🔭

"A novel setup bringing new understandings to both disentanglement and equivariance, achieving higher likelihoods on standard representation learning benchmarks while also demonstrating a degree of robustness and generalizability approaching the ultimate goal of usefully factorized representation learning." [gal30b+] 🤖 #LG #CV

⚙️ https://github.com/KingJamesSong/latent-flow
🔗 https://arxiv.org/abs/2309.13167v1 #arxiv

https://creative.ai/system/media_attachments/files/111/132/494/034/473/207/original/77f186060c61b348.jpg

https://creative.ai/system/media_attachments/files/111/132/494/099/585/179/original/a1a9658acf9003ac.jpg

https://creative.ai/system/media_attachments/files/111/132/494/167/445/031/original/bc3127d8c5c9893a.jpg

https://creative.ai/system/media_attachments/files/111/132/494/219/761/898/original/e4a07270e6ee7625.jpg 
 📝 Detect Every Thing with Few Examples 🔭

"DE-ViT utilizes vision-only backbones and improves general detection ability by transforming multi-classification tasks into binary classification tasks while bypassing per-class inference, and a novel region propagation technique for localization." [gal30b+] 🤖 #CV

⚙️ https://github.com/mlzxy/devit
🔗 https://arxiv.org/abs/2309.12969v1 #arxiv

https://creative.ai/system/media_attachments/files/111/132/119/978/012/272/original/d4c3517e058a88b9.jpg

https://creative.ai/system/media_attachments/files/111/132/120/047/319/266/original/0de24361b7299ae8.jpg

https://creative.ai/system/media_attachments/files/111/132/120/098/697/525/original/f3652df3006b3bfe.jpg

https://creative.ai/system/media_attachments/files/111/132/120/159/687/517/original/7436d4b3e32d2182.jpg 
 📝 Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation 🔭

"BAS learns the whole region of the object by using foreground region guidance and area constraint, which can achieve a more effective localization performance by using activation value to learn more object regions." [gal30b+] 🤖 #CV

⚙️ https://github.com/wpy1999/BAS-Extension
🔗 https://arxiv.org/abs/2309.12943v1 #arxiv

https://creative.ai/system/media_attachments/files/111/131/530/197/390/678/original/89e65c60bf4f3713.jpg

https://creative.ai/system/media_attachments/files/111/131/530/277/921/166/original/eacda4b61de0bcad.jpg

https://creative.ai/system/media_attachments/files/111/131/530/363/750/382/original/3174af4626b6575f.jpg

https://creative.ai/system/media_attachments/files/111/131/530/416/177/334/original/e049f3ec87ad36be.jpg 
 📝 Domain Adaptive Few-Shot Open-Set Learning 🔭

"A novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and an architecture named DAFOSNET, which learns a shared and discriminative embedding space while creating a pseudo open-space decision boundary." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12814v1 #arxiv

https://creative.ai/system/media_attachments/files/111/131/176/414/369/838/original/9481fbe59323828d.jpg

https://creative.ai/system/media_attachments/files/111/131/176/465/058/584/original/39377633e35d3af2.jpg

https://creative.ai/system/media_attachments/files/111/131/176/520/772/137/original/b9d7aec05e955da5.jpg 
 📝 LMC: Large Model Collaboration with Cross-Assessment for Training-Free Open-Set Object Recognition 🔭

"Learns from multiple large models pre-trained through different paradigms to tackle open-set recognition in a training-free manner, with novel designs to effectively extract knowledge from large models." [gal30b+] 🤖 #CV

⚙️ https://github.com/Harryqu123/LMC
🔗 https://arxiv.org/abs/2309.12780v1 #arxiv

https://creative.ai/system/media_attachments/files/111/130/468/541/097/954/original/8911ec79321584a1.jpg

https://creative.ai/system/media_attachments/files/111/130/468/591/340/863/original/7a9d7a3eba470406.jpg

https://creative.ai/system/media_attachments/files/111/130/468/642/220/753/original/2ee198e268477a1e.jpg

https://creative.ai/system/media_attachments/files/111/130/468/698/289/294/original/bcffd03a8090c6a7.jpg 
 📝 Transformer-Based Image Compression with Variable Image Quality Objectives 🔭

"Uses a single Transformer-based auto-encoder to compress an image with a variable quality objective by adaptively generating prompt tokens for the encoder and/or decoder using a prompt generation network." [gal30b+] 🤖 #CV #MM

🔗 https://arxiv.org/abs/2309.12717v1 #arxiv

https://creative.ai/system/media_attachments/files/111/130/055/506/101/124/original/99b8d737be25240b.jpg

https://creative.ai/system/media_attachments/files/111/130/055/582/319/796/original/854d38e32861a8b1.jpg

https://creative.ai/system/media_attachments/files/111/130/055/639/320/629/original/ff08cef3c6be0c86.jpg

https://creative.ai/system/media_attachments/files/111/130/055/691/461/178/original/d2f271b6343cbed6.jpg 
 📝 Mixed Attention Auto Encoder for Multi-Class Industrial Anomaly Detection 🔭

"Uses spatial attention mechanism to capture global category information and channel attention mechanism to model the feature distributions of multiple classes, and it employs an adaptive noise generator and a multi-scale fusion module for the pre-trained features." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12700v1 #arxiv

https://creative.ai/system/media_attachments/files/111/129/701/643/977/911/original/7e5d238db5d603f2.jpg

https://creative.ai/system/media_attachments/files/111/129/701/694/891/021/original/8a583e360856370d.jpg

https://creative.ai/system/media_attachments/files/111/129/701/741/672/560/original/b02ab02a6f8b5ec2.jpg

https://creative.ai/system/media_attachments/files/111/129/701/796/130/840/original/5b4e07a0ad6a22ad.jpg 
 📝 Exploiting Modality-Specific Features for Multi-Modal Manipulation Detection and Grounding 🔭

"Designs a novel framework that explores modality-specific features while preserving the capability for multi-modal alignment by introducing visual/language pre-trained encoders and dual-branch cross-attention." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12657v1 #arxiv

https://creative.ai/system/media_attachments/files/111/129/406/846/171/904/original/4cb94cb54069b0dd.jpg

https://creative.ai/system/media_attachments/files/111/129/406/899/702/280/original/0b0c3a6d488ef5d6.jpg 
 📝 RHINO: Regularizing the Hash-Based Implicit Neural Representation 🔭

"RHINO connects the input coordinate and the network additionally without modifying the architecture of current hash-based INRs by using an additional continuous analytical function, which ensures a seamless backpropagation of gradients from the network's output back to the input coordinates, thereby enhancing regularization." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12642v1 #arxiv

https://creative.ai/system/media_attachments/files/111/129/053/024/055/650/original/16fd4148dd84ca01.jpg

https://creative.ai/system/media_attachments/files/111/129/053/122/824/271/original/17adefb4c9a2448f.jpg

https://creative.ai/system/media_attachments/files/111/129/053/191/119/674/original/c3b62263b2bd83f7.jpg

https://creative.ai/system/media_attachments/files/111/129/053/296/083/071/original/5b10d03ee8eb19a9.jpg 
 📝 Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects 🔭

"GCANet introduces a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention module." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12641v1 #arxiv

https://creative.ai/system/media_attachments/files/111/128/699/001/058/036/original/6b8f3fdd70acfcb9.jpg

https://creative.ai/system/media_attachments/files/111/128/699/079/306/874/original/907814e9b8b477cf.jpg

https://creative.ai/system/media_attachments/files/111/128/699/149/545/259/original/8a2b914ed38e4226.jpg

https://creative.ai/system/media_attachments/files/111/128/699/202/517/519/original/395c45babede84de.jpg 
 📝 Decision Fusion Network with Perception Fine-Tuning for Defect Classification 🔭

"The decision fusion network (DFNet) is designed to strengthen the decision ability of the network through a decision fusion module (DFM) and a perception fine-tuning module (PFM)." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12630v1 #arxiv

https://creative.ai/system/media_attachments/files/111/128/168/255/235/084/original/720104560f8baeae.jpg

https://creative.ai/system/media_attachments/files/111/128/168/308/702/995/original/7f8c9fd703e2ea77.jpg

https://creative.ai/system/media_attachments/files/111/128/168/372/561/500/original/02006c5c12248810.jpg

https://creative.ai/system/media_attachments/files/111/128/168/420/067/250/original/e11e598565ed2df2.jpg 
 📝 BGF-YOLO: Enhanced YOLOv8 with Multiscale Attentional Feature Fusion for Brain Tumor Detection 🔭

"A novel BGFG-YOLO architecture by incorporating Bi-level Routing Attention (BRA), Generalized feature pyramid networks (GFPN), Forth detecting head, and Generalized-IoU (GIoU) bounding box regression loss into YOLOv8." [gal30b+] 🤖 #CV

⚙️ https://github.com/mkang315/BGFG-YOLO
🔗 https://arxiv.org/abs/2309.12585v1 #arxiv

https://creative.ai/system/media_attachments/files/111/127/755/303/714/003/original/14de242ea0b0b545.jpg 
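The GIoU regression loss named in the note above has a compact standard definition: IoU minus the fraction of the smallest enclosing box that is not covered by the union. The following is a minimal sketch of that general formula, not code from the BGFG-YOLO repository; the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (x2 > x1 and y2 > y1).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the empty space in the enclosing box.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Regression loss in [0, 2): 0 for identical boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative for disjoint boxes: it becomes negative and decreases as the boxes move apart, so the loss still provides a gradient signal when there is no overlap.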
 📝 Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation 🔭

"TriKD utilizes the knowledge distillation skill to learn the complementary semantics among these encoders with two different architectures for semi-supervised semantic segmentation by using a few labeled images and a large amount of unlabeled images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.12557v1 #arxiv

https://creative.ai/system/media_attachments/files/111/127/401/388/026/932/original/76847f773d831b5f.jpg

https://creative.ai/system/media_attachments/files/111/127/401/447/703/859/original/b46081764a8fc94c.jpg

https://creative.ai/system/media_attachments/files/111/127/401/523/940/257/original/e2133caa7b2b7d97.jpg 
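For readers unfamiliar with the knowledge distillation mentioned in the TriKD note above, here is a minimal sketch of the generic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions). It illustrates the general technique only and is not TriKD's triple-view formulation; the function names and the default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher, student)
        if p > 0.0
    )
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.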
 📝 TextCLIP: Text-Guided Face Image Generation and Manipulation Without Adversarial Training 🔭

"TextCLIP is a unified framework for text-guided image generation and manipulation without complex multi-stage generation and adversarial training, which can also be extended to image-guided generation and manipulation." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.11923v1 #arxiv

https://creative.ai/system/media_attachments/files/111/119/349/286/338/371/original/28ffcbe578565c9b.jpg

https://creative.ai/system/media_attachments/files/111/119/349/370/580/562/original/c9f1633431c9487a.jpg

https://creative.ai/system/media_attachments/files/111/119/349/446/601/791/original/453f276d8204a2ed.jpg

https://creative.ai/system/media_attachments/files/111/119/349/500/468/963/original/766ce10f87f87845.jpg 
 📝 AV-MaskEnhancer: Enhancing Video Representations Through Audio-Visual Masked Autoencoder 🔭

"AV-MaskEnhancer is an approach that combines the visual and audio information to learn more effective high-quality video representation by mask autoencoders (MAE)." [gal30b+] 🤖 #CV #MM

🔗 https://arxiv.org/abs/2309.08738v1 #arxiv

https://creative.ai/system/media_attachments/files/111/094/509/041/616/169/original/f2428ee83eab6b0c.jpg

https://creative.ai/system/media_attachments/files/111/094/509/102/233/015/original/dcbeefd06a4e553d.jpg

https://creative.ai/system/media_attachments/files/111/094/509/176/727/635/original/92f714df9928d373.jpg 
 📝 OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects 🔭

"Introduces Open-Illumination, an open dataset for material, lighting, and geometry reconstruction from single-view images and provide a comprehensive evaluation of several state-of-the-art inverse rendering methods on it." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07921v1 #arxiv

https://creative.ai/system/media_attachments/files/111/077/153/089/624/352/original/c772a4a6a91e1d82.jpg

https://creative.ai/system/media_attachments/files/111/077/153/161/979/824/original/35d11049a8a84ea4.jpg

https://creative.ai/system/media_attachments/files/111/077/153/228/909/893/original/849f71dca69e5c43.jpg

https://creative.ai/system/media_attachments/files/111/077/153/311/446/424/original/3f3527f88814de41.jpg 
 📝 ALWOD: Active Learning for Weakly-Supervised Object Detection 🔭

"ALWOD is an active learning framework that utilizes a small labeled seed, a large weakly tagged set, and strategically selects images for human annotation using a new auxiliary image generator and a new acquisition function." [gal30b+] 🤖 #CV

⚙️ https://github.com/seqam-lab/ALWOD
🔗 https://arxiv.org/abs/2309.07914v1 #arxiv

https://creative.ai/system/media_attachments/files/111/076/681/232/340/008/original/4c8bfdc461ec6b8d.jpg

https://creative.ai/system/media_attachments/files/111/076/681/310/822/756/original/ad97878f979d5bf2.jpg

https://creative.ai/system/media_attachments/files/111/076/681/398/091/140/original/f32350d7e207caad.jpg

https://creative.ai/system/media_attachments/files/111/076/681/470/401/861/original/0577d2a937e2ee94.jpg 
 📝 A Novel Local-Global Feature Fusion Framework for Body-Weight Exercise Recognition with Pressure Mapping Sensors 🔭🧠

"Uses image processing techniques and the YOLO object detection to localize pressure profiles from different body parts and consider physical constraints, which is a one-step further from the existing studies using deep neural networks mainly focusing on global feature extraction." [gal30b+] 🤖 #CV #LG

🔗 https://arxiv.org/abs/2309.07888v1 #arxiv

https://creative.ai/system/media_attachments/files/111/075/914/336/113/743/original/a4af56f614874ca8.jpg

https://creative.ai/system/media_attachments/files/111/075/914/389/684/905/original/f50dd579e8e59b20.jpg 
 📝 Gradient Constrained Sharpness-Aware Prompt Learning for Vision-Language Models 🔭

"The trade-off performance correlates to both loss value and loss sharpness during optimization, while each of them are indispensable, but existing methods cannot always maintain high consistence with them." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07866v1 #arxiv

https://creative.ai/system/media_attachments/files/111/075/619/506/150/699/original/b44f75d38e8bdeab.jpg

https://creative.ai/system/media_attachments/files/111/075/619/565/149/685/original/2e7d3cc01d7ee6e0.jpg

https://creative.ai/system/media_attachments/files/111/075/619/624/395/576/original/14a5ef877419abe1.jpg

https://creative.ai/system/media_attachments/files/111/075/619/698/699/613/original/764d88294698d6cb.jpg 
 📝 Decomposition of Linear Tensor Transformations 🔭

"A generalization of the Singular Value Decomposition (SVD) for matrices to tensors in higher dimension and it is based on the properties of the tensor space and of the Frobenius norm on tensor." [gal30b+] 🤖 #CV #NA

🔗 https://arxiv.org/abs/2309.07819v1 #arxiv

https://creative.ai/system/media_attachments/files/111/075/324/698/029/716/original/de8be919272449db.jpg

https://creative.ai/system/media_attachments/files/111/075/324/766/307/312/original/fe047eed099eedb1.jpg

https://creative.ai/system/media_attachments/files/111/075/324/830/872/680/original/57db6211c9f18b86.jpg 
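As background for the tensor decomposition note above, the matrix case it generalizes can be stated compactly. This is the standard textbook SVD and Frobenius norm, not the paper's tensor construction; the symbols ($A$, $r$, $\sigma_i$, $T$) are chosen here for illustration.

```latex
% SVD of a real matrix A of rank r, with orthonormal u_i, v_i
% and singular values \sigma_1 \ge \dots \ge \sigma_r > 0:
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top}

% Frobenius norm of A, which the SVD expresses through the singular values:
\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^{2}} = \sqrt{\sum_{i=1}^{r} \sigma_i^{2}}

% The same entrywise norm extends directly to an order-d tensor T:
\|T\|_F = \sqrt{\sum_{i_1, \dots, i_d} t_{i_1 \cdots i_d}^{2}}
```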
 📝 For a More Comprehensive Evaluation of 6DoF Object Pose Tracking 🔭

"The proposed benchmark uses a multi-view multi-object global pose refinement method to achieve sub-pixel and sub-millimeter alignment error for the YCBV dataset, and then introduces some improved evaluation methods to address the limitations of previous studies." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07796v1 #arxiv

https://creative.ai/system/media_attachments/files/111/075/088/627/598/911/original/71990383ec79ecca.jpg

https://creative.ai/system/media_attachments/files/111/075/088/687/070/802/original/b3a0453902e53125.jpg

https://creative.ai/system/media_attachments/files/111/075/088/749/837/937/original/04a6bbeb54f061d3.jpg

https://creative.ai/system/media_attachments/files/111/075/088/810/987/488/original/d5f0e1c0abd97296.jpg 
 📝 Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement 🔭

"By introducing a hybrid neural representation that separates low-frequency and high-frequency scene regions, a simple yet effective image sharpening and denoising technique, and a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07640v1 #arxiv

https://creative.ai/system/media_attachments/files/111/074/262/909/877/858/original/0cbfd3d387626f0f.jpg

https://creative.ai/system/media_attachments/files/111/074/262/994/189/084/original/beda3e22f2049425.jpg

https://creative.ai/system/media_attachments/files/111/074/263/078/765/741/original/e9e6b3c75f902bfa.jpg

https://creative.ai/system/media_attachments/files/111/074/263/167/855/565/original/ecf909a43a0dedfa.jpg 
 📝 Road Disease Detection Based on Latent Domain Background Feature Separation and Suppression 🔭

"Proposes a new LDBFSS(Latent Domain Background Feature Separation and Suppression) network which could perform background information separation and suppression without domain supervision and contrastive enhancement of object features." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07616v1 #arxiv

https://creative.ai/system/media_attachments/files/111/073/732/048/896/032/original/e5759b1601a4808d.jpg

https://creative.ai/system/media_attachments/files/111/073/732/115/516/762/original/987700c103197b80.jpg

https://creative.ai/system/media_attachments/files/111/073/732/170/972/683/original/d64d294667d30d02.jpg

https://creative.ai/system/media_attachments/files/111/073/732/232/418/476/original/4b2f6b3bd331148f.jpg 
 📝 Universality of Underlying Mechanism for Successful Deep Learning 🔭

"Each filter identifies small clusters of possible output labels, with additional noise selected as labels out of the clusters, and this feature is progressively sharpened with the layers, resulting in an enhanced signal-to-noise ratio and higher accuracy." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07537v1 #arxiv

https://creative.ai/system/media_attachments/files/111/073/319/327/872/147/original/230f36fb36c1e92b.jpg

https://creative.ai/system/media_attachments/files/111/073/319/458/241/250/original/12e0c37dd194fe4f.jpg

https://creative.ai/system/media_attachments/files/111/073/319/538/652/993/original/1979d02b1e6f79c4.jpg 
 📝 A Multi-Scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing 🔭

"By designing a new learnable blur kernel proximal mapping module and deep proximal mapping modules for image domain, and combining the alternating iterations of shrinkage thresholds, MGSTNet focuses on learning deep geometric prior features to enhance image restoration." [gal30b+] 🤖 #CV #IT

🔗 https://arxiv.org/abs/2309.07524v1 #arxiv

https://creative.ai/system/media_attachments/files/111/073/083/356/929/444/original/180c2cbbccb3ac69.jpg

https://creative.ai/system/media_attachments/files/111/073/083/450/749/573/original/c3950d18c79051f1.jpg

https://creative.ai/system/media_attachments/files/111/073/083/509/842/181/original/db8b8809fbafd2d7.jpg

https://creative.ai/system/media_attachments/files/111/073/083/571/290/705/original/2fc714b1dbbba803.jpg 
 📝 DePT: Decoupled Prompt Tuning 🔭

"The DePT framework decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, which preserves task-shared knowledge in the original feature space for achieving better generalization on new tasks." [gal30b+] 🤖 #CV

⚙️ https://github.com/Koorye/DePT
🔗 https://arxiv.org/abs/2309.07439v1 #arxiv

https://creative.ai/system/media_attachments/files/111/072/257/721/298/971/original/741f0a5ef9edc025.jpg

https://creative.ai/system/media_attachments/files/111/072/257/790/367/760/original/db006adecbf1da51.jpg

https://creative.ai/system/media_attachments/files/111/072/257/870/811/180/original/00b612bd0e9de35b.jpg

https://creative.ai/system/media_attachments/files/111/072/257/943/741/234/original/eb20e45840dd8fc3.jpg 
 📝 Physical Invisible Backdoor Based on Camera Imaging 🔭

"By using the camera imaging process and leveraging the CFA interpolation algorithm and the camera fingerprint feature, the proposed method can implement a physical invisible backdoor attack without changing nature pixels of the images." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.07428v1 #arxiv

https://creative.ai/system/media_attachments/files/111/071/845/016/249/555/original/0f581ac9a76ae8e7.jpg

https://creative.ai/system/media_attachments/files/111/071/845/100/453/176/original/210e0fd9d96ffd0c.jpg

https://creative.ai/system/media_attachments/files/111/071/845/163/616/272/original/78742d691dcbbb84.jpg

https://creative.ai/system/media_attachments/files/111/071/845/224/782/382/original/32bdf85e2445e1af.jpg