FileSystemWatcher vs polling to watch for file changes

I need to set up an application that watches for files being created in a directory, either locally or on a network drive.

Would the FileSystemWatcher or polling on a timer be the better option? I have used both methods in the past, but not extensively.

What issues (performance, reliability etc.) are there with either method?

FileSystemWatcher is a leaky abstraction and cannot be relied upon for anything but the most basic cases. See here: stackoverflow.com/a/22768610/129130
For reference, see this answer by Raymond Chen (a Microsoft expert) on the topic of FileSystemWatcher's reliability, and his blog, The Old New Thing (search it for FileSystemWatcher, for example).

Jason Jackson

I have seen the file system watcher fail in production and test environments. I now consider it a convenience, but I do not consider it reliable. My pattern has been to watch for changes with the file system watcher, but poll occasionally to catch missed file changes.

Edit: If you have a UI, you can also give your user the ability to "refresh" for changes instead of polling. I would combine this with a file system watcher.
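
The "watch, but poll occasionally" pattern from this answer might be sketched roughly as follows. This is only an illustration under assumptions: the HybridWatcher class, its constructor parameters and the once-per-file callback are hypothetical names, not something the answer prescribes.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper: FileSystemWatcher for speed, a slow sweep as a safety net.
public sealed class HybridWatcher : IDisposable
{
    private readonly string _directory;
    private readonly Action<string> _onNewFile;
    private readonly ConcurrentDictionary<string, bool> _seen =
        new ConcurrentDictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
    private readonly FileSystemWatcher _watcher;
    private readonly Timer _timer;

    public HybridWatcher(string directory, TimeSpan pollInterval, Action<string> onNewFile)
    {
        _directory = Path.GetFullPath(directory);
        _onNewFile = onNewFile;

        _watcher = new FileSystemWatcher(_directory) { NotifyFilter = NotifyFilters.FileName };
        _watcher.Created += (s, e) => Report(e.FullPath);
        _watcher.EnableRaisingEvents = true;

        Sweep(); // pick up files that existed before the watcher started
        _timer = new Timer(_ => Sweep(), null, pollInterval, pollInterval);
    }

    // Lists the directory and reports anything the watcher did not deliver.
    public void Sweep()
    {
        try
        {
            foreach (string path in Directory.EnumerateFiles(_directory))
                Report(path);
        }
        catch (IOException)
        {
            // For example, a share that is temporarily unavailable; the next tick retries.
        }
    }

    // TryAdd succeeds only once per path, so each file is reported once
    // no matter whether the event or the sweep sees it first.
    private void Report(string path)
    {
        if (_seen.TryAdd(Path.GetFullPath(path), true))
            _onNewFile(path);
    }

    public void Dispose()
    {
        _timer.Dispose();
        _watcher.Dispose();
    }
}
```

A "refresh" button in a UI could simply call Sweep() on demand instead of, or in addition to, the timer.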


I've seen it fall down, too. The solution we've used is to wrap our own class around it, where the wrapper class ALSO uses a timer to check on occasion that the watcher is still going.
We do something similar: once we've processed the file passed into the FileCreated event, we do a manual check for any other new files before returning. This seems to mitigate any problems occurring with lots of files arriving at once.
I believe we tested it in XP and Server 2003 on a local directory and a file share, and had XP machines in the field. We had problems with both local dir and file share. One of the probable causes we came up with was the copy/creation of a lot of files in a short amount of time in the directory.
It's neither constructive nor professional to just state "I've seen a ghost one day". It seems that the people further down the thread who mention the MSDN document about non-pageable buffer overruns could explain your problems. Have you tried using Brent's approach?
I just bought a gas sensor on Amazon and it amazed me how many people said it didn't work, when they obviously didn't calibrate it correctly or didn't even know about calibration... FileSystemWatcher has known limitations with high traffic because of its buffer size. It's almost guaranteed that's the reason for it "failing". This is readily explained in the documentation and there are workarounds that provide very reliable operation (as posted below). It isn't a good answer to just say "errr, something didn't work that one time, not sure why... nobody should rely on it".
Anonymous

The biggest problem I have had is missing files when the buffer gets full. Easy as pie to fix: just increase the buffer. Remember that it contains the file names and events, so size it for the expected number of files (trial and error). It does use memory that cannot be paged out, so it could force other processes to page if memory gets low.

Here is the MSDN article on the buffer: FileSystemWatcher.InternalBufferSize Property

Per MSDN:

Increasing buffer size is expensive, as it comes from non paged memory that cannot be swapped out to disk, so keep the buffer as small as possible. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties to filter out unwanted change notifications.
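
As a rough sketch of the advice quoted above, the watcher can be given a larger buffer while the filters keep unwanted notifications out of it. The WatcherFactory name and the onError callback are assumptions made for illustration, and 64 KB is used because that is the documented upper limit as far as I know.

```csharp
using System;
using System.IO;

public static class WatcherFactory
{
    // Hypothetical factory; the buffer size and filter choices are illustrative.
    public static FileSystemWatcher Create(string path, Action<Exception> onError)
    {
        var watcher = new FileSystemWatcher(path)
        {
            // Non-paged memory, so keep it as small as the workload allows.
            InternalBufferSize = 64 * 1024,
            // Only file create/rename/delete notifications enter the buffer.
            NotifyFilter = NotifyFilters.FileName,
            IncludeSubdirectories = false,
        };

        // Raised when the watcher cannot continue, including buffer overflows
        // (the exception is then an InternalBufferOverflowException).
        watcher.Error += (s, e) => onError(e.GetException());
        return watcher;
    }
}
```

The caller still has to set EnableRaisingEvents to true and subscribe to Created or the other events it cares about.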

We use 16MB due to a large batch expected at one time. Works fine and never misses a file.

We also read all the files before beginning to process even one: we get the file names safely cached away (in our case, in a database table) and then process them.

For file locking issues I spawn a process which waits around for the file to be unlocked, waiting one second, then two, then four, et cetera. We never poll. This has been in production without error for about two years.
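
The doubling wait for a locked file could look something like this. It is a minimal sketch, assuming a hypothetical FileReady helper that runs in-process (the answer spawns a separate process) and treats any IOException on an exclusive open as "still locked".

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileReady
{
    // Tries to open the file exclusively, doubling the wait after each failure.
    // Returns false if the file is still unavailable after maxAttempts tries.
    public static bool WaitUntilUnlocked(string path, int maxAttempts, TimeSpan initialDelay)
    {
        TimeSpan delay = initialDelay;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                // FileShare.None fails while another process has the file open.
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                    return true;
            }
            catch (IOException)
            {
                // Also covers a file that does not exist (yet).
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // 1s, 2s, 4s, ...
            }
        }
        return false;
    }
}
```

Calling FileReady.WaitUntilUnlocked(path, 6, TimeSpan.FromSeconds(1)) would give up after roughly a minute of waiting in total.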


Buffer overflow? Oh, you mean stack overflow.
As of .NET 3.5: "You can set the buffer to 4 KB or larger, but it must not exceed 64 KB"
How are you using 16MB if the max internal buffer for FileSystemWatcher is 64KB?
@Jarvis, a buffer is a temporary storage location that holds information in transit until it can be processed. This usually means a FIFO structure (a queue), because you want to handle requests in the order they arrive. In some processes, such as recursion, a FILO structure (a stack) is used instead. In this case we are definitely referring to the event queue buffer and not the program's call stack.
petermeinl.wordpress.com/2015/05/18/tamed-filesystemwatcher This post shares robust wrappers around the standard FileSystemWatcher (FSW) fixing problems commonly encountered when using it to monitor the file system in real-world applications.
bluish

The FileSystemWatcher may also miss changes during busy times, if the number of queued changes overflows the buffer provided. This is not a limitation of the .NET class per se, but of the underlying Win32 infrastructure. In our experience, the best way to minimize this problem is to dequeue the notifications as quickly as possible and deal with them on another thread.
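
One way to dequeue notifications quickly and deal with them on another thread is a producer/consumer queue. The sketch below is an assumption-laden illustration (QueuedWatcher and its process callback are hypothetical names): the event handler only enqueues the path, and a single background task does the real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class QueuedWatcher : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly FileSystemWatcher _watcher;
    private readonly Task _worker;

    public QueuedWatcher(string directory, Action<string> process)
    {
        // Consumer: processes paths one at a time, in arrival order.
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (string path in _queue.GetConsumingEnumerable())
                process(path);
        }, TaskCreationOptions.LongRunning);

        // Producer: the handler returns almost immediately.
        _watcher = new FileSystemWatcher(directory);
        _watcher.Created += (s, e) => Enqueue(e.FullPath);
        _watcher.EnableRaisingEvents = true;
    }

    public void Enqueue(string path)
    {
        try
        {
            _queue.Add(path);
        }
        catch (InvalidOperationException)
        {
            // A late event arrived after Dispose marked the queue complete; drop it.
        }
    }

    public void Dispose()
    {
        _watcher.Dispose();
        _queue.CompleteAdding(); // lets the consumer drain and exit
        _worker.Wait();
        _queue.Dispose();
    }
}
```

This keeps the time spent inside the watcher's callback short, which is what limits how fast the internal buffer can be emptied.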

As mentioned by @ChillTemp above, the watcher may not work on non-Windows shares. For example, it will not work at all on mounted Novell drives.

I agree that a good compromise is to do an occasional poll to pick up any missed changes.


The filesystem watcher can start raising a lot of events in quick succession. If you cannot execute your event handler at least as quickly as they are being fired, eventually the handler will start dropping events on the floor and you will miss things.
chilltemp

Also note that the file system watcher is not reliable on file shares, particularly if the file share is hosted on a non-Windows server. FSW should not be used for anything critical, or it should be combined with an occasional poll to verify that it hasn't missed anything.


Has Microsoft acknowledged that it isn't reliable on non-Windows file shares? We are certainly experiencing this first-hand since switching from a Windows share to a Linux-based SMB share.
Not that I'm aware of. And I'm sure that it would simply be a blame game between the different vendors.
We've experienced problems with the file system watcher on mapped drives. If the mapping disconnects and then reconnects, the file watcher no longer raises change events. Easily resolved, but still a strike against the file system watcher IMHO.
bluish

Personally, I've used the FileSystemWatcher on a production system, and it has worked fine. In the past 6 months, it hasn't had a single hiccup running 24x7. It is monitoring a single local folder (which is shared). We have a relatively small number of file operations that it has to handle (10 events fired per day). It's not something I've ever had to worry about. I'd use it again if I had to remake the decision.


bluish

I currently use the FileSystemWatcher on an XML file being updated on average every 100 milliseconds.

I have found that as long as the FileSystemWatcher is properly configured you should never have problems with local files.

I have no experience on remote file watching and non-Windows shares.

I would consider polling the file to be redundant and not worth the overhead unless you inherently distrust the FileSystemWatcher or have directly experienced the limitations everyone else here has listed (non-Windows shares, and remote file watching).


bluish

I have run into trouble using FileSystemWatcher on network shares. If you're in a pure Windows environment, it might not be an issue, but I was watching an NFS share and since NFS is stateless, there was never a notification when the file I was watching changed.


I've hit the same problem, but it was unexpected to me because the FileSystemWatcher was running on the same Windows server that shares the folder over NFS. Sharing a folder over NFS causes the FileSystemWatcher to miss files created remotely through the share (i.e. from a Linux machine that mounts the share), whereas if I write a file locally to the very same monitored folder, the FileSystemWatcher is triggered. It looks like the NFS server writes files using a lower layer, so the API layer that triggers the FileSystemWatcher is not engaged. Does anyone have more info?
@Mosè I'm also facing the same issue. Have you found a solution?
Not really a solution to the problem, but as a workaround I ended up (sadly) comparing the file system structure at regular intervals and generating the related events myself. With the right choice of data structure it is not so slow; it just puts a little pressure on the file system for the listing.
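
The snapshot-comparison workaround in the comment above could be sketched like this. The DirectorySnapshot class and its method names are hypothetical, and the last write time is used as the change signal, which is an assumption (size or a hash would work too).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectorySnapshot
{
    // Maps each file path under the directory to its last write time (UTC).
    public static Dictionary<string, DateTime> Take(string directory) =>
        Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories)
                 .ToDictionary(p => p, p => File.GetLastWriteTimeUtc(p));

    // Compares two snapshots and reports what was created, changed or deleted.
    public static void Diff(
        IReadOnlyDictionary<string, DateTime> before,
        IReadOnlyDictionary<string, DateTime> after,
        out List<string> created, out List<string> changed, out List<string> deleted)
    {
        created = after.Keys.Where(p => !before.ContainsKey(p)).ToList();
        deleted = before.Keys.Where(p => !after.ContainsKey(p)).ToList();
        changed = after
            .Where(kv => before.TryGetValue(kv.Key, out DateTime old) && old != kv.Value)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

Because both snapshots are dictionaries, each lookup is a hash probe, so the comparison stays roughly linear in the number of files; the directory listing itself is where the pressure on the file system comes from.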
bluish

I'd go with polling.

Network issues cause the FileSystemWatcher to be unreliable (even when handling the Error event).


Treb

I had some big problems with FSW on network drives: Deleting a file always threw the error event, never the deleted event. I did not find a solution, so I now avoid the FSW and use polling.

Creation events on the other hand worked fine, so if you only need to watch for file creation, you can go for the FSW.

Also, I had no problems at all on local folders, no matter if shared or not.


spludlow

Returning from the event method as quickly as possible, using another thread, solved the problem for me:

private void Watcher_Created(object sender, FileSystemEventArgs e)
{
    Task.Run(() => MySubmit(e.FullPath));
}

ThunderGr

Using both FSW and polling is a waste of time and resources, in my opinion, and I am surprised that experienced developers suggest it. If you need to use polling to check for any "FSW misses", then you can, naturally, discard FSW altogether and use only polling.

I am currently trying to decide whether to use FSW or polling for a project I am developing. Reading the answers, it is obvious that there are cases where FSW covers the needs perfectly, while at other times you need polling. Unfortunately, no answer has actually dealt with the performance difference (if there is any), only with the "reliability" issues. Is there anyone who can answer that part of the question?

EDIT: nmclean's case for using both FSW and polling (you can read the discussion in the comments, if you are interested) appears to be a very rational explanation of why there can be situations where using both an FSW and polling is efficient. Thank you for shedding light on that for me (and anyone else having the same opinion), nmclean.


What if you want to respond to file changes as quickly as possible? If you poll once per minute for example, you might have as much as 1 minute delay between a file changing and your application picking up on the change. The FSW event would presumably be triggered much before that. So by using both you are handling the events with as little delay as you can, but also picking up the missed events if there are any.
@rom99 Exactly my point. If the FSW is unreliable in cases where you need a quick response, there is no point in using it, since you will have cases where there is no quick response and your application will therefore be unreliable. Polling at shorter intervals, in a thread, would be what you need to do. Doing both means you have a tolerance in response times that the polling covers, so why not use only polling?
@ThunderGr "thus, your application will be unreliable." - In many cases, speed is not a prerequisite for reliability. The work must get done, but it can wait a while. If we combine slow, reliable polling with fast, unreliable FSW, we get an application that is always reliable and sometimes fast, which is better than reliable and never fast. We can remove FSW and achieve the same maximum response time by doing constant polling, but this is at the expense of the responsiveness of the rest of the application, so should only be done if immediate response is absolutely required.
Now why is the above a poor argument? Because, although we still need disk access, we need it less. Similarly, you can poll less. Just because we still check all the files doesn't mean the workload is the same. Your statement, "polling is expensive on CPU time with FSW or not," is false. By offloading the "immediacy" concern to FSW, we can change the polling to an idle, low-priority task, such that the busyness of the application at any given time is reduced drastically while still providing the "treat" of immediacy. You simply cannot achieve the same balance with polling alone.
@nmclean Thank you for taking the time and energy to clarify this in the way you did. When you put it that way, it surely makes much more sense. Just as there are times when a cache is not suitable for your specific problem, so the FSW (when it proves unreliable) may not be suitable. It turns out that you were right all along. I am sorry it took so much time for me to get it.
bluish

A working solution that uses the Created event instead of Changed.

It also fires for files that arrive by copy, cut/paste or move.

using System;
using System.IO;

class Program
{
    const string SourceFolderPath = @"D:\SourcePath";
    const string DestinationFolderPath = @"D:\DestinationPath";

    static void Main(string[] args)
    {
        using (var watcher = new FileSystemWatcher(SourceFolderPath, "*.txt"))
        {
            watcher.IncludeSubdirectories = false;
            watcher.NotifyFilter = NotifyFilters.FileName; // file name notifications only
            watcher.Created += FileSystemWatcher_Created;  // raised for files created by copy, cut/paste or move
            watcher.EnableRaisingEvents = true;

            Console.Read(); // keep the process alive until input arrives
        }
    }

    static void FileSystemWatcher_Created(object sender, FileSystemEventArgs e)
    {
        try
        {
            // Do something with the new file, such as move or copy it.
            File.Copy(e.FullPath, Path.Combine(DestinationFolderPath, e.Name));
        }
        catch (Exception ex)
        {
            // The file may still be locked by its writer, or already exist at the destination.
            Console.Error.WriteLine(ex.Message);
        }
    }
}

A solution for the Changed event (raised on last-write changes), using a static field to track the last file handled:

using System;
using System.IO;

class Program
{
    const string SourceFolderPath = @"D:\SourcePath";
    const string DestinationFolderPath = @"D:\DestinationPath";

    // Name of the file handled most recently; static so it survives between events.
    static string lastHandledFile = string.Empty;

    static void Main(string[] args)
    {
        using (var watcher = new FileSystemWatcher(SourceFolderPath, "*.txt"))
        {
            watcher.IncludeSubdirectories = false;
            watcher.NotifyFilter = NotifyFilters.LastWrite;
            watcher.Changed += FileSystemWatcher_Changed;
            watcher.EnableRaisingEvents = true;

            Console.Read(); // keep the process alive until input arrives
        }
    }

    static void FileSystemWatcher_Changed(object sender, FileSystemEventArgs e)
    {
        // Skip repeated triggers for the same file. Note that this also skips any
        // later, genuine change to that file until a different file is handled.
        if (e.Name == lastHandledFile)
        {
            return;
        }

        try
        {
            // Do something with the changed file, such as move or copy it.
            File.Copy(e.FullPath, Path.Combine(DestinationFolderPath, e.Name));
        }
        catch (Exception ex)
        {
            // The file may still be locked by its writer, or already exist at the destination.
            Console.Error.WriteLine(ex.Message);
        }

        lastHandledFile = e.Name;
    }
}

This is a workaround for the problem of the event being triggered multiple times.
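
A variation on this workaround, swapped in here as a different technique rather than the answer's own, is to suppress repeat events per file within a short time window instead of comparing against the last file name only. The EventDebouncer class and the window length are hypothetical; the current time is passed in so that the logic is easy to test.

```csharp
using System;
using System.Collections.Generic;

public sealed class EventDebouncer
{
    private readonly TimeSpan _window;
    private readonly Dictionary<string, DateTime> _lastHandled =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    public EventDebouncer(TimeSpan window)
    {
        _window = window;
    }

    // Returns true if the event should be handled, or false if the same path
    // was already handled less than one window ago.
    public bool ShouldHandle(string path, DateTime nowUtc)
    {
        lock (_lastHandled) // watcher events can arrive on different threads
        {
            if (_lastHandled.TryGetValue(path, out DateTime last) && nowUtc - last < _window)
                return false;

            _lastHandled[path] = nowUtc;
            return true;
        }
    }
}
```

Inside a Changed handler this would be used as `if (!debouncer.ShouldHandle(e.FullPath, DateTime.UtcNow)) return;`, so that a later, genuine change to the same file is still processed once the window has passed.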


user2819502

I would say use polling, especially in a TDD scenario, as it is much easier to mock or stub the presence (or absence) of files when the polling event is triggered than to rely on the more "uncontrolled" FSW event. On top of that, I have worked on a number of apps that were plagued by FSW errors.
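
To illustrate why polling is easy to stub, here is a minimal sketch in which the directory listing is injected as a delegate. NewFilePoller and ForDirectory are hypothetical names, and a real application would call Poll() from a timer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public sealed class NewFilePoller
{
    private readonly Func<IEnumerable<string>> _listFiles;
    private readonly HashSet<string> _known =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // The listing function is injected so tests can supply a fake.
    public NewFilePoller(Func<IEnumerable<string>> listFiles)
    {
        _listFiles = listFiles;
    }

    // Production wiring: list the real directory.
    public static NewFilePoller ForDirectory(string directory) =>
        new NewFilePoller(() => Directory.EnumerateFiles(directory));

    // Returns only the files that were not present in any earlier poll.
    public IReadOnlyList<string> Poll() =>
        _listFiles().Where(path => _known.Add(path)).ToList();
}
```

In a test, the "file system" is just a list that the test mutates between polls, for example `new NewFilePoller(() => fakeFiles)`, so no real files or watcher events are involved.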