我需要一种强大而简单的方法来从简单的字符串中删除非法路径和文件字符。我使用了下面的代码,但它似乎没有做任何事情,我错过了什么?
using System;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
illegal = illegal.Trim(Path.GetInvalidFileNameChars());
illegal = illegal.Trim(Path.GetInvalidPathChars());
Console.WriteLine(illegal);
Console.ReadLine();
}
}
}
GetInvalidFileNameChars()
将从文件夹路径中删除 : \ 等内容。
Path.GetInvalidPathChars()
似乎没有去除 *
或 ?
试试这样的东西;
string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
foreach (char c in invalid)
{
illegal = illegal.Replace(c.ToString(), "");
}
但我必须同意这些评论,我可能会尝试处理非法路径的来源,而不是试图将非法路径改造成合法但可能是无意的路径。
编辑:或者使用正则表达式的可能“更好”的解决方案。
string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");
尽管如此,还是要问一个问题,为什么您首先要这样做。
原始问题要求“删除非法字符”:
public string RemoveInvalidChars(string filename)
{
return string.Concat(filename.Split(Path.GetInvalidFileNameChars()));
}
您可能想要替换它们:
public string ReplaceInvalidChars(string filename)
{
return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));
}
This answer was on another thread by Ceres,我真的很喜欢它简洁明了。
我使用 Linq 来清理文件名。您也可以轻松扩展它以检查有效路径。
private static string CleanFileName(string fileName)
{
return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}
更新
一些评论表明此方法不适用于他们,因此我提供了指向 DotNetFiddle 片段的链接,以便您可以验证该方法。
https://dotnetfiddle.net/nw1SWY
var invalid = new HashSet<char>(Path.GetInvalidPathChars()); return new string(originalString.Where(s => !invalid.Contains(s)).ToArray())
。性能可能不是很好,但这可能并不重要。
您可以像这样使用 Linq 删除非法字符:
var invalidChars = Path.GetInvalidFileNameChars();
var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();
编辑这是评论中提到的所需编辑的外观:
var invalidChars = Path.GetInvalidFileNameChars();
string invalidCharsRemoved = new string(stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray());
对于文件名:
var cleanFileName = string.Join("", fileName.Split(Path.GetInvalidFileNameChars()));
对于完整路径:
var cleanPath = string.Join("", path.Split(Path.GetInvalidPathChars()));
请注意,如果您打算将此用作安全功能,则更可靠的方法是扩展所有路径,然后验证用户提供的路径确实是用户应该有权访问的目录的子目录。
这些都是很好的解决方案,但它们都依赖于 Path.GetInvalidFileNameChars
,它可能不像您想象的那么可靠。请注意 Path.GetInvalidFileNameChars
上的 MSDN 文档中的以下注释:
不保证从此方法返回的数组包含文件和目录名称中无效的完整字符集。完整的无效字符集可能因文件系统而异。例如,在基于 Windows 的桌面平台上,无效路径字符可能包括 ASCII/Unicode 字符 1 到 31,以及引号 (")、小于 (<)、大于 (>)、竖线 (|)、退格 ( \b)、空 (\0) 和制表符 (\t)。
Path.GetInvalidPathChars
方法并没有更好。它包含完全相同的注释。
从用户输入中删除非法字符的最佳方法是使用 Regex 类替换非法字符,在代码中创建方法,或者使用正则表达式控件在客户端进行验证。
public string RemoveSpecialCharacters(string str)
{
return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}
或者
<asp:RegularExpressionValidator ID="regxFolderName"
runat="server"
ErrorMessage="Enter folder name with a-z A-Z0-9_"
ControlToValidate="txtFolderName"
Display="Dynamic"
ValidationExpression="^[a-zA-Z0-9_]*$"
ForeColor="Red">
"[^a-zA-Z0-9_.-]+"
对于初学者,Trim only removes characters from the beginning or end of the string。其次,您应该评估您是否真的想删除令人反感的字符,或者快速失败并让用户知道他们的文件名无效。我的选择是后者,但我的回答至少应该向您展示如何以正确和错误的方式做事:
StackOverflow question showing how to check if a given string is a valid file name。请注意,您可以使用此问题中的正则表达式来删除带有正则表达式替换的字符(如果您确实需要这样做)。
我使用正则表达式来实现这一点。首先,我动态构建正则表达式。
string regex = string.Format(
"[{0}]",
Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
然后我只需调用 removeInvalidChars.Replace 来进行查找和替换。这显然也可以扩展到覆盖路径字符。
new Regex(String.Format("^(CON|PRN|AUX|NUL|CLOCK\$|COM[1-9]|LPT[1-9])(?=\..|$)|(^(\.+|\s+)$)|((\.+|\s+)$)|([{0}])", Regex.Escape(new String(Path.GetInvalidFileNameChars()))), RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant);
@
,即 String.Format(@"^CON| ... )"
我绝对喜欢 Jeff Yates 的想法。如果您稍微修改它,它将完美地工作:
string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
改进只是为了逃避自动生成的正则表达式。
这是一个对 .NET 3 及更高版本有帮助的代码片段。
using System.IO;
using System.Text.RegularExpressions;
public static class PathValidation
{
private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);
private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);
private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);
private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);
public static bool ValidatePath(string path)
{
return pathValidator.IsMatch(path);
}
public static bool ValidateFileName(string fileName)
{
return fileNameValidator.IsMatch(fileName);
}
public static string CleanPath(string path)
{
return pathCleaner.Replace(path, "");
}
public static string CleanFileName(string fileName)
{
return fileNameCleaner.Replace(fileName, "");
}
}
上面的大多数解决方案都结合了错误的路径和文件名的非法字符(即使两个调用当前返回相同的字符集)。我会首先将路径+文件名拆分为路径和文件名,然后将适当的设置应用于它们中的任何一个,然后再次将两者结合起来。
wvd_vegt
如果您删除或用单个字符替换无效字符,则可能会发生冲突:
<abc -> abc
>abc -> abc
这是避免这种情况的简单方法:
public static string ReplaceInvalidFileNameChars(string s)
{
char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
foreach (char c in invalidFileNameChars)
s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
return s;
}
结果:
<abc -> [1]abc
>abc -> [2]abc
抛出异常。
if ( fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1 )
{
throw new ArgumentException();
}
我写这个怪物是为了好玩,它可以让你往返:
public static class FileUtility
{
private const char PrefixChar = '%';
private static readonly int MaxLength;
private static readonly Dictionary<char,char[]> Illegals;
static FileUtility()
{
List<char> illegal = new List<char> { PrefixChar };
illegal.AddRange(Path.GetInvalidFileNameChars());
MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
}
public static string FilenameEncode(string s)
{
var builder = new StringBuilder();
char[] replacement;
using (var reader = new StringReader(s))
{
while (true)
{
int read = reader.Read();
if (read == -1)
break;
char c = (char)read;
if(Illegals.TryGetValue(c,out replacement))
{
builder.Append(PrefixChar);
builder.Append(replacement);
}
else
{
builder.Append(c);
}
}
}
return builder.ToString();
}
public static string FilenameDecode(string s)
{
var builder = new StringBuilder();
char[] buffer = new char[MaxLength];
using (var reader = new StringReader(s))
{
while (true)
{
int read = reader.Read();
if (read == -1)
break;
char c = (char)read;
if (c == PrefixChar)
{
reader.Read(buffer, 0, MaxLength);
var encoded =(char) ParseCharArray(buffer);
builder.Append(encoded);
}
else
{
builder.Append(c);
}
}
}
return builder.ToString();
}
public static int ParseCharArray(char[] buffer)
{
int result = 0;
foreach (char t in buffer)
{
int digit = t - '0';
if ((digit < 0) || (digit > 9))
{
throw new ArgumentException("Input string was not in the correct format");
}
result *= 10;
result += digit;
}
return result;
}
}
这似乎是 O(n) 并且不会在字符串上花费太多内存:
private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());
public static string RemoveInvalidFileNameChars(string name)
{
if (!name.Any(c => invalidFileNameChars.Contains(c))) {
return name;
}
return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
}
文件名不能包含来自 Path.GetInvalidPathChars()
、+
和 #
符号的字符以及其他特定名称。我们将所有检查合并为一个类:
public static class FileNameExtensions
{
private static readonly Lazy<string[]> InvalidFileNameChars =
new Lazy<string[]>(() => Path.GetInvalidPathChars()
.Union(Path.GetInvalidFileNameChars()
.Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());
private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
{
@"aux",
@"con",
@"clock$",
@"nul",
@"prn",
@"com1",
@"com2",
@"com3",
@"com4",
@"com5",
@"com6",
@"com7",
@"com8",
@"com9",
@"lpt1",
@"lpt2",
@"lpt3",
@"lpt4",
@"lpt5",
@"lpt6",
@"lpt7",
@"lpt8",
@"lpt9"
};
public static bool IsValidFileName(string fileName)
{
return !string.IsNullOrWhiteSpace(fileName)
&& fileName.All(o => !IsInvalidFileNameChar(o))
&& !IsProhibitedName(fileName);
}
public static bool IsProhibitedName(string fileName)
{
return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
}
private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
{
if (value == null)
{
return null;
}
return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
(sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
}
public static bool IsInvalidFileNameChar(char value)
{
return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
}
public static string GetValidFileName([NotNull] this string value)
{
return GetValidFileName(value, @"_");
}
public static string GetValidFileName([NotNull] this string value, string replacementValue)
{
if (string.IsNullOrWhiteSpace(value))
{
throw new ArgumentException(@"value should be non empty", nameof(value));
}
if (IsProhibitedName(value))
{
return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value;
}
return ReplaceInvalidFileNameSymbols(value, replacementValue);
}
public static string GetFileNameError(string fileName)
{
if (string.IsNullOrWhiteSpace(fileName))
{
return CommonResources.SelectReportNameError;
}
if (IsProhibitedName(fileName))
{
return CommonResources.FileNameIsProhibited;
}
var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();
if(invalidChars.Length > 0)
{
return string.Format(CultureInfo.CurrentCulture,
invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
}
return string.Empty;
}
}
方法 GetValidFileName
将所有不正确的数据替换为 _
。
我认为使用正则表达式验证并指定允许的字符要容易得多,而不是尝试检查所有坏字符。请参阅以下链接:http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html
另外,搜索“正则表达式编辑器”,它们有很大帮助。有一些甚至可以为您输出 c# 中的代码。
扫描这里的答案,他们都**似乎涉及使用无效文件名字符的 char 数组。
诚然,这可能是微优化 - 但为了任何可能希望检查大量值是否为有效文件名的人的利益,值得注意的是,构建无效字符的哈希集将带来明显更好的性能。
过去,我非常惊讶(震惊)哈希集(或字典)在迭代列表上的速度有多快。对于字符串,这是一个非常低的数字(大约 5-7 项来自内存)。对于大多数其他简单数据(对象引用、数字等),神奇的交叉似乎是 20 项左右。
Path.InvalidFileNameChars“列表”中有 40 个无效字符。今天做了一个搜索,在 StackOverflow 上有一个很好的基准,显示哈希集将花费 40 个项目的数组/列表的一半多一点时间:https://stackoverflow.com/a/10762995/949129
这是我用于清理路径的辅助类。我现在忘记了为什么我有花哨的替换选项,但它是一个可爱的奖励。
额外的奖励方法“IsValidLocalPath”也是:)
(**那些不使用正则表达式的)
public static class PathExtensions
{
private static HashSet<char> _invalidFilenameChars;
private static HashSet<char> InvalidFilenameChars
{
get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}
/// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the
/// specified replacement character.</summary>
/// <param name="text">Text to make into a valid filename. The same string is returned if
/// it is valid already.</param>
/// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
/// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
/// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
{
StringBuilder sb = new StringBuilder(text.Length);
HashSet<char> invalids = InvalidFilenameChars;
bool changed = false;
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (invalids.Contains(c))
{
changed = true;
char repl = replacement ?? '\0';
if (fancyReplacements)
{
if (c == '"') repl = '”'; // U+201D right double quotation mark
else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
else if (c == '/') repl = '⁄'; // U+2044 fraction slash
}
if (repl != '\0')
sb.Append(repl);
}
else
sb.Append(c);
}
if (sb.Length == 0)
return "_";
return changed ? sb.ToString() : text;
}
/// <summary>
/// Returns TRUE if the specified path is a valid, local filesystem path.
/// </summary>
/// <param name="pathString"></param>
/// <returns></returns>
public static bool IsValidLocalPath(this string pathString)
{
// From solution at https://stackoverflow.com/a/11636052/949129
Uri pathUri;
Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
return isValidUri && pathUri != null && pathUri.IsLoopback;
}
}
public static class StringExtensions
{
public static string RemoveUnnecessary(this string source)
{
string result = string.Empty;
string regex = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(regex)));
result = reg.Replace(source, "");
return result;
}
}
你可以清楚地使用方法。
用于从 Windows 文件命名的任何非法字符中清除字符串的一个衬垫:
public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");
这是我的小贡献。一种在同一字符串中替换而不创建新字符串或字符串构建器的方法。它快速、易于理解,是本文中所有提及的一个很好的替代方案。
private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}
private static string ReplaceInvalidChars(string fileName, string newValue)
{
char newChar = newValue[0];
char[] chars = fileName.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
chars[i] = newChar;
}
return new string(chars);
}
你可以这样称呼它:
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string legal = ReplaceInvalidChars(illegal);
并返回:
_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
值得注意的是,此方法将始终将无效字符替换为给定值,但不会删除它们。如果你想删除无效字符,这个替代方法可以解决问题:
private static string RemoveInvalidChars(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
bool remove = newChar == char.MinValue;
char[] chars = fileName.ToCharArray();
char[] newChars = new char[chars.Length];
int i2 = 0;
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
{
if (!remove)
newChars[i2++] = newChar;
}
else
newChars[i2++] = c;
}
return new string(newChars, 0, i2);
}
基准
如果性能是您所追求的,我会使用本文中的大多数方法执行定时测试运行。其中一些方法不会用给定的字符替换,因为 OP 要求清理字符串。我添加了用给定字符替换的测试,如果您的预期场景只需要删除不需要的字符,则其他一些替换为空字符。用于此基准测试的代码在最后,因此您可以运行自己的测试。
注意:本文中都提出了方法 Test1
和 Test2
。
第一次运行
replacing with '_', 1000000 iterations
结果:
============Test1===============
Elapsed=00:00:01.6665595
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test2===============
Elapsed=00:00:01.7526835
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test3===============
Elapsed=00:00:05.2306227
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test4===============
Elapsed=00:00:14.8203696
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test5===============
Elapsed=00:00:01.8273760
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test6===============
Elapsed=00:00:05.4249985
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test7===============
Elapsed=00:00:07.5653833
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test8===============
Elapsed=00:12:23.1410106
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test9===============
Elapsed=00:00:02.1016708
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test10===============
Elapsed=00:00:05.0987225
Result=M ary had a little lamb.
============Test11===============
Elapsed=00:00:06.8004289
Result=M ary had a little lamb.
第二轮
removing invalid chars, 1000000 iterations
注意:Test1 不会删除,只会替换。
结果:
============Test1===============
Elapsed=00:00:01.6945352
Result= M a ry h ad a li tt le la mb.
============Test2===============
Elapsed=00:00:01.4798049
Result=M ary had a little lamb.
============Test3===============
Elapsed=00:00:04.0415688
Result=M ary had a little lamb.
============Test4===============
Elapsed=00:00:14.3397960
Result=M ary had a little lamb.
============Test5===============
Elapsed=00:00:01.6782505
Result=M ary had a little lamb.
============Test6===============
Elapsed=00:00:04.9251707
Result=M ary had a little lamb.
============Test7===============
Elapsed=00:00:07.9562379
Result=M ary had a little lamb.
============Test8===============
Elapsed=00:12:16.2918943
Result=M ary had a little lamb.
============Test9===============
Elapsed=00:00:02.0770277
Result=M ary had a little lamb.
============Test10===============
Elapsed=00:00:05.2721232
Result=M ary had a little lamb.
============Test11===============
Elapsed=00:00:05.2802903
Result=M ary had a little lamb.
基准结果
方法 Test1
、Test2
和 Test5
是最快的。方法Test8
是最慢的。
代码
这是基准测试的完整代码:
private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}
private static string _invalidCharsValue;
private static string InvalidCharsValue
{
get { return _invalidCharsValue ?? (_invalidCharsValue = new string(Path.GetInvalidFileNameChars())); }
}
private static char[] _invalidChars;
private static char[] InvalidChars
{
get { return _invalidChars ?? (_invalidChars = Path.GetInvalidFileNameChars()); }
}
static void Main(string[] args)
{
string testPath = "\"M <>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
int max = 1000000;
string newValue = "";
TimeBenchmark(max, Test1, testPath, newValue);
TimeBenchmark(max, Test2, testPath, newValue);
TimeBenchmark(max, Test3, testPath, newValue);
TimeBenchmark(max, Test4, testPath, newValue);
TimeBenchmark(max, Test5, testPath, newValue);
TimeBenchmark(max, Test6, testPath, newValue);
TimeBenchmark(max, Test7, testPath, newValue);
TimeBenchmark(max, Test8, testPath, newValue);
TimeBenchmark(max, Test9, testPath, newValue);
TimeBenchmark(max, Test10, testPath, newValue);
TimeBenchmark(max, Test11, testPath, newValue);
Console.Read();
}
private static void TimeBenchmark(int maxLoop, Func<string, string, string> func, string testString, string newValue)
{
var sw = new Stopwatch();
sw.Start();
string result = string.Empty;
for (int i = 0; i < maxLoop; i++)
result = func?.Invoke(testString, newValue);
sw.Stop();
Console.WriteLine($"============{func.Method.Name}===============");
Console.WriteLine("Elapsed={0}", sw.Elapsed);
Console.WriteLine("Result={0}", result);
Console.WriteLine("");
}
private static string Test1(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
char[] chars = fileName.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if (InvalidCharsHash.Contains(chars[i]))
chars[i] = newChar;
}
return new string(chars);
}
private static string Test2(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
bool remove = newChar == char.MinValue;
char[] chars = fileName.ToCharArray();
char[] newChars = new char[chars.Length];
int i2 = 0;
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
{
if (!remove)
newChars[i2++] = newChar;
}
else
newChars[i2++] = c;
}
return new string(newChars, 0, i2);
}
private static string Test3(string filename, string newValue)
{
foreach (char c in InvalidCharsValue)
{
filename = filename.Replace(c.ToString(), newValue);
}
return filename;
}
private static string Test4(string filename, string newValue)
{
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(InvalidCharsValue)));
filename = r.Replace(filename, newValue);
return filename;
}
private static string Test5(string filename, string newValue)
{
return string.Join(newValue, filename.Split(InvalidChars));
}
private static string Test6(string fileName, string newValue)
{
return InvalidChars.Aggregate(fileName, (current, c) => current.Replace(c.ToString(), newValue));
}
private static string Test7(string fileName, string newValue)
{
string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
return Regex.Replace(fileName, regex, newValue, RegexOptions.Compiled);
}
private static string Test8(string fileName, string newValue)
{
string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
return removeInvalidChars.Replace(fileName, newValue);
}
private static string Test9(string fileName, string newValue)
{
StringBuilder sb = new StringBuilder(fileName.Length);
bool changed = false;
for (int i = 0; i < fileName.Length; i++)
{
char c = fileName[i];
if (InvalidCharsHash.Contains(c))
{
changed = true;
sb.Append(newValue);
}
else
sb.Append(c);
}
if (sb.Length == 0)
return newValue;
return changed ? sb.ToString() : fileName;
}
private static string Test10(string fileName, string newValue)
{
if (!fileName.Any(c => InvalidChars.Contains(c)))
{
return fileName;
}
return new string(fileName.Where(c => !InvalidChars.Contains(c)).ToArray());
}
private static string Test11(string fileName, string newValue)
{
string invalidCharsRemoved = new string(fileName
.Where(x => !InvalidChars.Contains(x))
.ToArray());
return invalidCharsRemoved;
}
我推出了自己的方法,这似乎比这里发布的其他方法要快得多(尤其是正则表达式,它是如此的慢),但我没有测试所有发布的方法。
https://dotnetfiddle.net/haIXiY
第一种方法(我的)和第二种方法(也是我的,但旧的)也对反斜杠进行了额外的检查,所以基准并不完美,但无论如何它只是给你一个想法。
我的笔记本电脑上的结果(100 000 次迭代):
StringHelper.RemoveInvalidCharacters 1: 451 ms
StringHelper.RemoveInvalidCharacters 2: 7139 ms
StringHelper.RemoveInvalidCharacters 3: 2447 ms
StringHelper.RemoveInvalidCharacters 4: 3733 ms
StringHelper.RemoveInvalidCharacters 5: 11689 ms (==> Regex!)
最快的方法:
public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
{
if (string.IsNullOrEmpty(content))
return content;
var idx = content.IndexOfAny(InvalidCharacters);
if (idx >= 0)
{
var sb = new StringBuilder(content);
while (idx >= 0)
{
if (sb[idx] != '\\' || !doNotReplaceBackslashes)
sb[idx] = replace;
idx = content.IndexOfAny(InvalidCharacters, idx+1);
}
return sb.ToString();
}
return content;
}
在 InvalidCharacters
属性期间,方法不会“按原样”编译,请检查小提琴以获取完整代码
public static bool IsValidFilename(string testName)
{
return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}
这将是你想要的,并避免碰撞
static string SanitiseFilename(string key)
{
var invalidChars = Path.GetInvalidFileNameChars();
var sb = new StringBuilder();
foreach (var c in key)
{
var invalidCharIndex = -1;
for (var i = 0; i < invalidChars.Length; i++)
{
if (c == invalidChars[i])
{
invalidCharIndex = i;
}
}
if (invalidCharIndex > -1)
{
sb.Append("_").Append(invalidCharIndex);
continue;
}
if (c == '_')
{
sb.Append("__");
continue;
}
sb.Append(c);
}
return sb.ToString();
}
我认为这个问题还没有完全回答......答案只描述了干净的文件名或路径......而不是两者。这是我的解决方案:
private static string CleanPath(string path)
{
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
List<string> split = path.Split('\\').ToList();
string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
returnValue = returnValue.TrimEnd('\\');
return returnValue;
}
我创建了一个扩展方法,它结合了几个建议:
在哈希集中保留非法字符过滤掉 ascii 127 以下的字符。由于 Path.GetInvalidFileNameChars 不包括所有可能使用 0 到 255 的 ascii 代码的无效字符。请参阅此处和 MSDN 定义替换字符的可能性
资源:
public static class FileNameCorrector
{
private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
public static string ToValidFileName(this string name, char replacement = '\0')
{
var builder = new StringBuilder();
foreach (var cur in name)
{
if (cur > 31 && cur < 128 && !invalid.Contains(cur))
{
builder.Append(cur);
}
else if (replacement != '\0')
{
builder.Append(replacement);
}
}
return builder.ToString();
}
}
这是一个用替换字符替换文件名中所有非法字符的函数:
public static string ReplaceIllegalFileChars(string FileNameWithoutPath, char ReplacementChar)
{
const string IllegalFileChars = "*?/\\:<>|\"";
StringBuilder sb = new StringBuilder(FileNameWithoutPath.Length);
char c;
for (int i = 0; i < FileNameWithoutPath.Length; i++)
{
c = FileNameWithoutPath[i];
if (IllegalFileChars.IndexOf(c) >= 0)
{
c = ReplacementChar;
}
sb.Append(c);
}
return (sb.ToString());
}
例如下划线可以用作替换字符:
NewFileName = ReplaceIllegalFileChars(FileName, '_');
如果您必须在项目中的许多地方使用该方法,您还可以制作一个扩展方法并在项目中的任何位置调用它以获得字符串。
public static class StringExtension
{
public static string RemoveInvalidChars(this string originalString)
{
string finalString=string.Empty;
if (!string.IsNullOrEmpty(originalString))
{
return string.Concat(originalString.Split(Path.GetInvalidFileNameChars()));
}
return finalString;
}
}
您可以将上述扩展方法称为:
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string afterIllegalChars = illegal.RemoveInvalidChars();
string
有意义?
或者你可以做
[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();
不定期副业成功案例分享
GetInvalidPathChars()
可能包含GetInvalidFileNameChars()
不会包含的字符是不合逻辑的。您没有对“过早”优化采取正确性。您只是在使用错误的代码。