
Length of string in bash

How do you get the length of a string stored in a variable and assign that to another variable?

myvar="some string"
echo ${#myvar}  
# 11

How do you set another variable to the output 11?


fedorqui

To get the length of a string stored in a variable, say:

myvar="some string"
size=${#myvar} 

To confirm it was properly saved, echo it:

$ echo "$size"
11
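Once stored, the length behaves like any other integer. A minimal sketch, assuming bash; the 20-character limit and the names verdict and remaining are made up for illustration:

```bash
myvar="some string"
size=${#myvar}                 # length in characters

# The stored length can be tested or used in arithmetic.
if [ "$size" -gt 10 ]; then
    verdict="long"
else
    verdict="short"
fi
remaining=$(( 20 - size ))     # hypothetical 20-character limit
printf '%s: %d chars (%s), %d left\n' "$myvar" "$size" "$verdict" "$remaining"
```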

With UTF-8 strings, you can have both a string length and a byte length. See my answer.
You can also use it directly in other parameter expansions - for example in this test I check that $rulename starts with the $RULE_PREFIX prefix: [ "${rulename:0:${#RULE_PREFIX}}" == "$RULE_PREFIX" ]
Could you please explain the expressions #myvar and {#myvar} a bit?
@lerneradams see Bash reference manual →3.5.3 Shell Parameter Expansion on ${#parameter}: The length in characters of the expanded value of parameter is substituted.
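The prefix test from the comment above can be wrapped in a small function. This is only a sketch, assuming bash (substring expansion is not POSIX sh); the function name starts_with_prefix and the sample values are hypothetical:

```bash
RULE_PREFIX="fw-"              # hypothetical prefix
rulename="fw-allow-ssh"        # hypothetical rule names
other="allow-fw-ssh"

starts_with_prefix() {
    # ${1:0:N} takes the first N characters, where N is the prefix length
    [ "${1:0:${#RULE_PREFIX}}" = "$RULE_PREFIX" ]
}

if starts_with_prefix "$rulename"; then first=yes; else first=no; fi
if starts_with_prefix "$other"; then second=yes; else second=no; fi
echo "$first $second"
```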
F. Hauri - Give Up GitHub

UTF-8 string length

In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen

will render:

Généralités is 11 char len, but 14 bytes len.

You could even have a look at the stored characters:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"

will answer:

Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').

Note: Following Isabell Cowan's comment, I now set $LC_ALL along with $LANG.
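The save-and-restore of $LANG and $LC_ALL can be avoided by making the override local to a function. A possible sketch, assuming bash; the helper name byte_len is hypothetical, and the sample string is spelled as explicit UTF-8 bytes so the result does not depend on the file encoding:

```bash
byte_len() {
    # local confines the locale override to this function call
    local LANG=C LC_ALL=C
    printf '%d\n' "${#1}"
}

sample=$'G\303\251n\303\251ralit\303\251s'   # "Généralités" as UTF-8 bytes
bytlen=$(byte_len "$sample")
echo "$bytlen"
```

This runs byte_len in a command substitution, which costs a fork; it trades speed for convenience.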

Length of an argument, working sample

Arguments work the same as regular variables:

showStrLen() {
    local bytlen sreal oLang=$LANG oLcAll=$LC_ALL
    LANG=C LC_ALL=C
    bytlen=${#1}
    printf -v sreal %q "$1"
    LANG=$oLang LC_ALL=$oLcAll
    printf "String '%s' is %d bytes, but %d chars len: %s.\n" "$1" $bytlen ${#1} "$sreal"
}

will work as

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'.

Useful printf correction tool:

If you run:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    printf " - %-14s is %2d char length\n" "'$string'"  ${#string}
done

 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ←'    is  7 char length
 - 'Yin Yang ☯' is 10 char length

Not really pretty output!

For this, here is a little function:

strU8DiffLen() {
    local charlen=${#1} LANG=C LC_ALL=C
    return $(( ${#1} - charlen ))
}

or written in one line:

strU8DiffLen() { local chLen=${#1} LANG=C LC_ALL=C;return $((${#1}-chLen));}

Then now:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    strU8DiffLen "$string"
    printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
        "'$string'" ${#string} $((${#string}+$?))
  done 

 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ←'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ☯'   is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some strange UTF-8 behaviour remains, such as double-width characters, zero-width characters, right-to-left text and other cases that are not as simple to handle...

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
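One limit of returning the difference as an exit status is that exit statuses only cover 0 to 255. A possible variant, sketched under the assumption of bash, stores the result in a variable named by the caller instead, which still avoids a fork. The name strU8DiffLenTo is hypothetical:

```bash
# Hypothetical variant of strU8DiffLen: writes the result to the variable named by $2.
strU8DiffLenTo() {
    # _chars is expanded under the caller's locale, before LANG/LC_ALL switch to C
    local _chars=${#1} LANG=C LC_ALL=C
    printf -v "$2" '%d' "$(( ${#1} - _chars ))"
}

strU8DiffLenTo "Language" extra
echo "$extra"    # 0 for a pure ASCII string
```

For non-ASCII strings the stored value depends on the caller's locale, just as with the original function.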


You may also need to set LC_ALL=C and perhaps others.
@F.Hauri Nonetheless, it follows that on some systems your solution will not work, because it leaves LC_ALL alone. It might work fine on default installs of Debian and its derivatives, but on others (like Arch Linux) it will fail to give the correct byte length of the string.
thanks for taking something simple and convoluting it :)
@thistleknot I'm sorry, 對不起. Sometimes simple is just an idea.
@F8ER To avoid forks. For example: replacing return with echo, adding OFF=$(strU8DiffLen....) and replacing $? with OFF in the last sample takes 10 ms on my host, whereas the published version does the job in 1 ms (10x faster!).
dmatej

I wanted the simplest case; this is the final result:

echo -n 'Tell me the length of this sentence.' | wc -m;
36

sorry mate :( This is bash... the cursed hammer that sees everything as a nail, particularly your thumb. 'Tell me the length of this sentence.' contains 36 characters. echo '' | wc -m => 1. You'd need to use -n: echo -n '' | wc -m => 0... in which case it's a good solution :)
Thanks for the correction! Manual page says: -n do not output the trailing newline
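The effect of the trailing newline discussed in these comments can be checked side by side. A small sketch; the variable names are made up, and printf '%s' is used as a portable stand-in for echo -n:

```bash
sentence='Tell me the length of this sentence.'

with_newline=$(echo "$sentence" | wc -m)            # echo appends a newline
without_newline=$(printf '%s' "$sentence" | wc -m)  # no trailing newline

# Some wc implementations pad the number with spaces; arithmetic expansion strips that.
with_newline=$(( with_newline ))
without_newline=$(( without_newline ))

echo "$with_newline $without_newline"
```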
admirabilis

You can use:

MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)

wc -c or wc --bytes gives byte counts: in UTF-8, non-ASCII characters are counted as 2, 3 or more bytes.

wc -m or wc --chars gives character counts: each character is counted once, even when it uses more than one byte.
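A quick way to see the two counts diverge, as a sketch. The sample string is written as explicit UTF-8 bytes, and the character count assumes wc runs under a UTF-8 locale:

```bash
s=$'caf\303\251'                      # "café" as UTF-8 bytes

bytes=$(printf '%s' "$s" | wc -c)     # the é occupies two bytes
chars=$(printf '%s' "$s" | wc -m)     # depends on the current locale

# Strip any padding wc may add around the number.
bytes=$(( bytes ))
chars=$(( chars ))

echo "bytes=$bytes chars=$chars"
```

The byte count is 5 in any locale. The character count is typically 4 under a UTF-8 locale and falls back to 5 under LC_ALL=C.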


Seriously? a pipe, a subshell and an external command for something that trivial?
This handles something like mylen=$(printf "%s" "$HOME/.ssh" | wc -c), whereas with the accepted solution you need to assign myvar=$HOME/.ssh first.
This isn’t any better than ${#var}. You still need LC_ALL / LANG set to a UTF-8 locale, otherwise -m will return the byte count.
gniourf_gniourf

In response to the post starting:

If you want to use this with command line or function arguments...

with the code:

size=${#1}

There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:

if [ -z "$1" ]; then
    #zero length argument 
else
    #non-zero length
fi

See GNU and wooledge for a more complete list of Bash conditional expressions.
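Put together, the emptiness test and ${#1} might look like this inside a function. A minimal sketch; the function name describe_arg is hypothetical:

```bash
describe_arg() {
    if [ -z "$1" ]; then
        echo "empty"
    else
        # ${#1} is the length of the first positional parameter
        echo "length ${#1}"
    fi
}

describe_arg ""
describe_arg "abc"
```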


Zane

If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.


Part of the problem with "you can't do <invalid syntax>" is that, that syntax being invalid, it's unclear what a reader should interpret it to mean. size=${#1} is certainly valid.
Well, that's unexpected. I didn't know that #1 was a substitute for $1 in this case.
It isn't. # isn't replacing the $ -- the $ outside the braces is still the expansion operator. The # is the length operator, as always.
I've fixed this answer since it is a useful tip but not an exception to the rule - it follows the rule exactly, as pointed out by @CharlesDuffy
thistleknot

Using your example provided

#KISS (Keep it simple stupid)
size=${#myvar}
echo $size

@Angel The question was about setting a variable to the output of the length command, and this question answers that.
Mukesh Shakya

Here are a couple of ways to calculate the length of a variable:

echo "${#VAR}"
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'

To store the result in another variable, wrap one of the commands above in backquotes and assign it, as follows:

otherVar=`echo -n "$VAR" | wc -m`
echo $otherVar

http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html
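Quoting matters with the pipeline forms. A short sketch of what can go wrong when the variable is left unquoted, assuming the default IFS; the variable names other than VAR are made up:

```bash
VAR='a  b'                               # two spaces between the letters

unquoted=$(echo -n $VAR | wc -c)         # word splitting collapses the two spaces
quoted=$(printf '%s' "$VAR" | wc -c)     # the string is passed through intact
direct=${#VAR}                           # no pipe or external command needed

echo "$(( unquoted )) $(( quoted )) $direct"
```

The unquoted form reports 3, while the quoted form and ${#VAR} both report 4.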


Troublemaker-DV

I know this Q&A is old, but today I faced this task for the first time. I usually use ${#var}, but it fails with Unicode: most of the text I process with bash is in Cyrillic... Based on @atesin's answer, I wrote a short function (which could be shortened further) that may be useful for scripting. The task that led me to this question was showing a message of variable length in a pseudo-graphics box. So, here it is:

$ cat draw_border.sh
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER
echo $OUTLINE
echo $OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

And what this sample produces:

$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

The first example (in French?) was taken from someone's answer above. The second one combines Cyrillic and the value of a variable. The third one is self-explanatory: only characters from the first half of the ASCII table.

I used echo $BPAR|wc -m instead of printf ... so as not to rely on whether printf is built in or not.

Above I saw discussion of the trailing newline and the -n parameter for echo. I did not use it, so I add only one to $BPLEN. Had I used -n, I would have to add 2.

To explain the difference between wc -m and wc -c, see the same script with only one minor change: -m was replaced with -c

$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

Accented Latin characters and most Cyrillic characters take two bytes, so the drawn horizontal lines are longer than the real length of the message. I hope this saves someone some time :-)

p.s. Russian text says "here is one more"

p.p.s. Working "two-liner"

#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

To avoid cluttering the code with repeated drawing of OUTBORDER, I build OUTBORDER in a separate command.
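For comparison, a bash-only variant is possible, since bash's ${#var} counts characters under a UTF-8 locale. This is a sketch under that assumption, not a drop-in replacement for the /bin/sh script; the name border_bash is hypothetical. It counts characters rather than display columns, so double-width characters would still misalign.

```bash
border_bash() {
    local msg=$1 line
    # Build a run of spaces as wide as the message plus two, then turn it into dashes.
    printf -v line '%*s' "$(( ${#msg} + 2 ))" ''
    line=${line// /-}
    printf '+%s+\n| %s |\n+%s+\n' "$line" "$msg" "$line"
}

border_bash "pure ENGLISH"
```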


ahuemmer

Maybe just use:

echo $myvar | wc -c