Copilot Vision has been around for quite some time. I tested it a few months back when it only worked in the browser, and it’s now available to everyone through the Windows 11 Copilot app, including outside the United States (except in EU regions). My last encounter with the tool wasn’t very convincing, but this Windows implementation takes it a step further.
When I tried Vision in the Edge browser, it didn’t work anywhere else, so it could only access the currently open tab and perform some actions on it. The biggest difference between the browser and desktop app implementations is the ability to select a window.
Yes, you’re no longer confined to a single browser tab and can use it on any open window. This opens doors to everything that cannot work inside the browser. It could be an app window with stats, a shell window with commands, or a game window, although I didn’t try that one.
I launched the app, clicked on the Vision icon, and it showed a menu to select any open window. I started with an article about building a media server, and the results were kinda similar to before. The conversation with the AI was free-flowing, and it stopped immediately when I asked it to. I asked which OS the author uses for the project, but that detail was listed a little further down the page, below what was visible on screen, so Vision couldn’t answer.
It cannot access anything outside the currently selected window, and even within a webpage, it can only see the portion that you see. It cannot scroll down to find anything else mentioned on the page. It also cannot click any button for you. It can only guide you to the button with a highlighting animation of a big, black arrow, and you must click it yourself.
Diving a little deeper
Last time, Vision couldn’t search the web for more information, but that’s not the case anymore. I opened a page on Windows Latest and asked for details about the author, our EIC. On the first attempt, it only gave his name, so I asked for his designation at the company. Vision said it didn’t have that and asked my permission to search the web for more details.
After that, it correctly listed the designation of our EIC and even shared more details about him. It obviously picked the data from his author page and rephrased it a little, but the response was acceptable.
Next, I wanted to present something more challenging. I showed it a screenshot of the output of a shell script I had run on DietPi. The output already mentioned what the commands did, so Vision just reiterated that to me.
After that, I showed it just the commands and asked about them.
Vision was pretty accurate and even described what each parameter in the commands did. That makes me believe the answer came from its own knowledge base, because it didn’t use web search for more information.
To recheck it, I used another set of commands that I hadn’t shown before: I created a list of Docker commands and asked Vision about them.
It described what each command does, but stopped after the fourth one. I had to prompt it a few times to carry on and explain the rest. It was more or less accurate, but I’m not sure whether it was fetching the data from the web or producing it itself.
So, this was an overview of Copilot Vision on a Windows 11 PC. If you are comfortable with Copilot’s data policies, give it a try. You don’t need to install anything extra because it’s baked into the app.