For what people mainly want CTV for, it would be sufficient to commit to the number of _hashed_ outputs and then hash only those outputs. Secondly, commit only to the input index (typically zero). Together, those two changes would allow additional inputs and change outputs to be added as needed. And to avoid N² hashing, commit to the outputs with a hashed linked list or tree. Thing is, this additional complexity doesn't save that many bytes compared to just using keyless anchors and CPFP, particularly in the UTXO tree case, where we're getting a path down a tree mined.
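To make the idea concrete, here is a rough sketch of what such a partial commitment could look like. It is not BIP 119's template hash and not any deployed proposal: the function names, the field order, the all-zero starting value, and the single-byte script length are all assumptions made up for illustration. The running hash list plays the role of the "hashed linked list", so each output is hashed once no matter how many prefixes get checked.

```python
import hashlib
import struct
from typing import List, Tuple

Output = Tuple[int, bytes]  # (amount in satoshis, scriptPubKey); hypothetical type


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def serialize_output(out: Output) -> bytes:
    amount, script = out
    # Simplification (assumption): one-byte length prefix, so scripts < 253 bytes.
    if len(script) >= 253:
        raise ValueError("script too long for this sketch")
    return struct.pack("<q", amount) + bytes([len(script)]) + script


def prefix_chain(outputs: List[Output]) -> List[bytes]:
    """Hashed linked list: chain[k] commits to outputs[:k].

    Each step hashes exactly one output, so all prefixes cost O(N) total.
    """
    chain = [bytes(32)]  # assumed starting value: 32 zero bytes
    for out in outputs:
        chain.append(sha256(chain[-1] + sha256(serialize_output(out))))
    return chain


def template_hash(outputs: List[Output], num_hashed: int, input_index: int) -> bytes:
    """Commit to the input index, the count of hashed outputs, and only those outputs."""
    if num_hashed > len(outputs):
        raise ValueError("transaction has fewer outputs than were committed to")
    chain = prefix_chain(outputs[:num_hashed])
    return sha256(struct.pack("<II", input_index, num_hashed) + chain[num_hashed])


# Usage: two committed outputs, then a change output appended later.
committed = [(50_000, b"\x51"), (25_000, b"\x00\x14" + bytes(20))]
h = template_hash(committed, num_hashed=2, input_index=0)

with_change = committed + [(1_000, b"\x51")]
assert template_hash(with_change, num_hashed=2, input_index=0) == h

tampered = [(49_999, b"\x51")] + committed[1:]
assert template_hash(tampered, num_hashed=2, input_index=0) != h
```

Because nothing about the other inputs or the trailing outputs enters the hash, extra fee inputs and change can be added without invalidating the commitment, which is the flexibility described above.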
It can, though. In a tree, that's n/2 anchor outputs and n/2 inputs which spend them. But if you're all paying (the same) agent to neighbor-boost, it's one tx for all of them...