If it is environment-independent, what is the theoretical maximum number of characters in a Python string?
Answers:
Method 1
With a 64-bit Python installation, and (say) 64 GB of memory, a Python string of around 63 GB should be quite feasible, if not maximally fast. If you can upgrade your memory beyond 64 GB, your maximum feasible strings should get proportionally longer. (I don’t recommend relying on virtual memory to extend that by much, or your runtimes will get simply ridiculous;-).
With a typical 32-bit Python installation, the total memory you can use in your application is limited to something like 2 or 3 GB (depending on OS and configuration), so the longest strings you can use will be much smaller than in 64-bit installations with high amounts of RAM.
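Whether you have a 64-bit or a 32-bit installation is a property of the Python interpreter itself, not only of the OS. Here is a minimal sketch for checking that from inside Python. It assumes CPython, and `interpreter_bits` is a hypothetical helper name:

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter in bits (e.g. 32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a native C void pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = interpreter_bits()
    print(f"{bits}-bit Python")
    # sys.maxsize is the largest value a Py_ssize_t can hold:
    # 2**63 - 1 on 64-bit builds, 2**31 - 1 on 32-bit builds.
    print("sys.maxsize =", sys.maxsize)
    print("64-bit build" if sys.maxsize > 2**32 else "32-bit build")
```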
Method 2
I ran this code on an EC2 instance.
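```python
def create1k():  # build a 1 KiB string, one '*' at a time
    s = ""
    for i in range(1024):
        s += '*'
    return s

def create1m():  # 1 MiB = 1024 copies of the 1 KiB string
    s = ""
    x = create1k()
    for i in range(1024):
        s += x
    return s

def create1g():  # 1 GiB = 1024 copies of the 1 MiB string
    s = ""
    x = create1m()
    for i in range(1024):
        s += x
    return s

print("begin")
s = ""
x = create1g()
for i in range(1024):  # grow s by 1 GiB per iteration until memory runs out
    s += x
    print(str(i) + "g ok")
    print(str(len(s)) + ' bytes')  # each '*' is stored in one byte
```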
And this is the output:
```
[ec2-user@ip-10-0-0-168 ~]$ time python hog.py
begin
0g ok
1073741824 bytes
1g ok
2147483648 bytes
2g ok
3221225472 bytes
3g ok
4294967296 bytes
4g ok
5368709120 bytes
5g ok
6442450944 bytes
6g ok
7516192768 bytes
7g ok
8589934592 bytes
8g ok
9663676416 bytes
9g ok
10737418240 bytes
10g ok
11811160064 bytes
11g ok
12884901888 bytes
12g ok
13958643712 bytes
13g ok
15032385536 bytes
14g ok
16106127360 bytes
15g ok
17179869184 bytes
16g ok
18253611008 bytes
17g ok
19327352832 bytes
18g ok
20401094656 bytes
19g ok
21474836480 bytes
20g ok
22548578304 bytes
21g ok
23622320128 bytes
22g ok
24696061952 bytes
23g ok
25769803776 bytes
24g ok
26843545600 bytes
25g ok
27917287424 bytes
26g ok
28991029248 bytes
27g ok
30064771072 bytes
28g ok
31138512896 bytes
29g ok
32212254720 bytes
30g ok
33285996544 bytes
31g ok
34359738368 bytes
32g ok
35433480192 bytes
33g ok
36507222016 bytes
34g ok
37580963840 bytes
35g ok
38654705664 bytes
36g ok
39728447488 bytes
37g ok
40802189312 bytes
38g ok
41875931136 bytes
39g ok
42949672960 bytes
40g ok
44023414784 bytes
41g ok
45097156608 bytes
42g ok
46170898432 bytes
43g ok
47244640256 bytes
44g ok
48318382080 bytes
45g ok
49392123904 bytes
46g ok
50465865728 bytes
47g ok
51539607552 bytes
48g ok
52613349376 bytes
49g ok
53687091200 bytes
50g ok
54760833024 bytes
51g ok
55834574848 bytes
52g ok
56908316672 bytes
53g ok
57982058496 bytes
54g ok
59055800320 bytes
55g ok
60129542144 bytes
56g ok
61203283968 bytes
57g ok
62277025792 bytes
58g ok
63350767616 bytes
59g ok
64424509440 bytes
60g ok
65498251264 bytes
61g ok
66571993088 bytes
62g ok
67645734912 bytes
63g ok
68719476736 bytes
64g ok
69793218560 bytes
65g ok
70866960384 bytes
66g ok
71940702208 bytes
67g ok
73014444032 bytes
68g ok
74088185856 bytes
69g ok
75161927680 bytes
70g ok
76235669504 bytes
71g ok
77309411328 bytes
72g ok
78383153152 bytes
73g ok
79456894976 bytes
74g ok
80530636800 bytes
75g ok
81604378624 bytes
76g ok
82678120448 bytes
77g ok
83751862272 bytes
78g ok
84825604096 bytes
79g ok
85899345920 bytes
80g ok
86973087744 bytes
81g ok
88046829568 bytes
82g ok
89120571392 bytes
83g ok
90194313216 bytes
84g ok
91268055040 bytes
85g ok
92341796864 bytes
86g ok
93415538688 bytes
87g ok
94489280512 bytes
88g ok
95563022336 bytes
89g ok
96636764160 bytes
90g ok
97710505984 bytes
91g ok
98784247808 bytes
92g ok
99857989632 bytes
93g ok
100931731456 bytes
94g ok
102005473280 bytes
95g ok
103079215104 bytes
96g ok
104152956928 bytes
97g ok
105226698752 bytes
98g ok
106300440576 bytes
99g ok
107374182400 bytes
100g ok
108447924224 bytes
101g ok
109521666048 bytes
102g ok
110595407872 bytes
103g ok
111669149696 bytes
104g ok
112742891520 bytes
105g ok
113816633344 bytes
106g ok
114890375168 bytes
107g ok
115964116992 bytes
108g ok
117037858816 bytes
109g ok
118111600640 bytes
110g ok
119185342464 bytes
111g ok
120259084288 bytes
112g ok
121332826112 bytes
113g ok
122406567936 bytes
114g ok
123480309760 bytes
115g ok
124554051584 bytes
116g ok
125627793408 bytes
Traceback (most recent call last):
  File "hog.py", line 25, in <module>
    s += x
MemoryError

real    1m10.509s
user    0m16.184s
sys     0m54.320s
```
MemoryError after the 116g step, i.e. once the string had reached 125,627,793,408 bytes (117 GiB).
```
[ec2-user@ip-10-0-0-168 ~]$ python --version
Python 2.7.12
[ec2-user@ip-10-0-0-168 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:        122953        430     122522          0         11        113
-/+ buffers/cache:        304     122648
Swap:            0          0          0
```
Tested on an EC2 r3.4xlarge instance running 64-bit Amazon Linux AMI 2016.09.
The short answer: if you have over 100 GB of RAM, a single Python string can use up that much memory.
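If you want to repeat this experiment without risking a whole machine, a bounded variant is safer. This is a sketch, not the author's script: `grow_until_memory_error` and its `chunk_size` and `max_chunks` parameters are hypothetical names, and the chunk is built with a single repetition (`"*" * chunk_size`) instead of the character-by-character loops above. On Linux with memory overcommit enabled, the kernel's OOM killer may terminate the process before Python ever raises MemoryError.

```python
def grow_until_memory_error(chunk_size: int, max_chunks: int) -> int:
    """Append chunk_size-character chunks to a str until MemoryError or max_chunks.

    Returns the length of the longest string that was successfully built.
    """
    chunk = "*" * chunk_size  # one allocation instead of a per-character loop
    s = ""
    for i in range(max_chunks):
        try:
            s += chunk
        except MemoryError:
            break  # the interpreter could not allocate the next, larger string
        print(i, "ok", len(s), "bytes")
    return len(s)


if __name__ == "__main__":
    # Small, harmless run: eight chunks of 1 MiB each.
    print("final length:", grow_until_memory_error(1024 ** 2, 8))
```

Passing `chunk_size=1024**3` with a large `max_chunks` mirrors the 1 GiB steps in the answer.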
Method 3
9 quintillion characters on a 64-bit system on CPython 3.10.
That's only if your string is made up entirely of ASCII characters. The maximum length can be smaller depending on which characters the string contains, because of the way CPython implements strings:
- 9,223,372,036,854,775,758 characters if your string only has ASCII characters (U+00 to U+7F), or
- 9,223,372,036,854,775,734 characters if your string only has ASCII characters and characters from the Latin-1 Supplement Unicode block (U+80 to U+FF), or
- 4,611,686,018,427,387,866 characters if your string only contains characters in the Basic Multilingual Plane (for example if it contains Cyrillic letters but no emojis, i.e. U+0100 to U+FFFF), or
- 2,305,843,009,213,693,932 characters if your string might contain at least one emoji (more formally, if it can contain a character outside the Basic Multilingual Plane, i.e. U+10000 and above)
On a 32-bit system it's around 2 billion or 500 million characters, depending on whether each character takes 1 or 4 bytes. If you don't know whether you're using a 64-bit or a 32-bit system or what that means, you're probably using a 64-bit system.
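To see which of the four 64-bit limits listed above applies to a given string, you only need its largest code point. A sketch assuming CPython 3.10 on a 64-bit build; the limit values are copied from that list, and `max_length_for` is a hypothetical helper:

```python
# Limits from the list above (CPython 3.10, 64-bit build).
ASCII_LIMIT = 9_223_372_036_854_775_758
LATIN1_LIMIT = 9_223_372_036_854_775_734
BMP_LIMIT = 4_611_686_018_427_387_866
ASTRAL_LIMIT = 2_305_843_009_213_693_932


def max_length_for(s: str) -> int:
    """Return the maximum length of a str whose characters are no wider than those in s."""
    max_cp = max(map(ord, s), default=0)  # largest code point; 0 for the empty string
    if max_cp <= 0x7F:
        return ASCII_LIMIT   # ASCII only, U+00 to U+7F
    if max_cp <= 0xFF:
        return LATIN1_LIMIT  # adds the Latin-1 Supplement, U+80 to U+FF
    if max_cp <= 0xFFFF:
        return BMP_LIMIT     # Basic Multilingual Plane, up to U+FFFF
    return ASTRAL_LIMIT      # at least one character at U+10000 or above


if __name__ == "__main__":
    for sample in ["hello", "café", "привет", "hi 😀"]:
        print(repr(sample), f"{max_length_for(sample):,}")
```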
Python strings are length-prefixed, so their length is limited by the size of the integer holding their length and by the amount of memory available on your system. Since PEP 353, Python uses Py_ssize_t as the data type for storing container lengths. Py_ssize_t is defined to be the same size as the compiler's size_t, but signed. On a 64-bit system, size_t is 64 bits wide. One bit goes to the sign, leaving 63 bits for the actual quantity, so CPython strings cannot be larger than 2⁶³ − 1 bytes, or around 9 million TB (8 EiB). This much RAM would cost you around 40 billion dollars if we multiply today's price of around $4/GB by 9 billion. On 32-bit systems (which are rare these days), it's 2³¹ − 1 bytes, or 2 GiB.
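The arithmetic in this paragraph is easy to check. A small sketch; the $4/GB price is the author's rough figure, and the variable names are just for illustration. The cost comes out at roughly 37 billion dollars, in line with the "around 40 billion" above.

```python
import sys

PY_SSIZE_T_MAX_64 = 2**63 - 1  # largest signed 64-bit value
PY_SSIZE_T_MAX_32 = 2**31 - 1  # largest signed 32-bit value
PRICE_PER_GB = 4               # the author's rough RAM price, in dollars per GB

eib = PY_SSIZE_T_MAX_64 / 2**60          # binary exbibytes
million_tb = PY_SSIZE_T_MAX_64 / 10**18  # decimal terabytes, in millions
cost = PY_SSIZE_T_MAX_64 / 10**9 * PRICE_PER_GB  # bytes -> GB -> dollars

print(f"64-bit limit: {PY_SSIZE_T_MAX_64:,} bytes = {eib:.2f} EiB = {million_tb:.2f} million TB")
print(f"RAM cost at ${PRICE_PER_GB}/GB: about ${cost / 1e9:.0f} billion")
print(f"32-bit limit: {PY_SSIZE_T_MAX_32:,} bytes = {PY_SSIZE_T_MAX_32 / 2**30:.2f} GiB")
# On a 64-bit CPython build, sys.maxsize equals the 64-bit limit.
print("sys.maxsize matches the 64-bit limit:", sys.maxsize == PY_SSIZE_T_MAX_64)
```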
CPython will use 1, 2 or 4 bytes per character, depending on how many bytes it needs to encode the "longest" character in your string. So, for example, if you have a string like 'aaaaaaaaa', the a's each take 1 byte to store, but if you have a string like 'aaaaaaaaa😀' then all the a's will now take 4 bytes each. 1-byte-per-character strings also use 48 bytes of metadata if they are pure ASCII and 72 bytes otherwise, and 2- or 4-byte-per-character strings use 72 bytes of metadata. Each string also has an extra character at the end for a terminating null, so the empty string actually takes 49 bytes.
When you allocate a string with PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar) in CPython (it is documented in the Python/C API reference), it performs this check:

```c
/* Ensure we won't overflow the size. */
// [...]
if (size > ((PY_SSIZE_T_MAX - struct_size) / char_size - 1))
    return PyErr_NoMemory();
```
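The check divides first, rather than comparing `struct_size + (size + 1) * char_size` against PY_SSIZE_T_MAX, because in C that multiplication could itself overflow. Below is a Python transcription for illustration only: `pyunicode_new_rejects` and `bytes_needed` are hypothetical names, and Python's unbounded integers stand in for Py_ssize_t on an assumed 64-bit build.

```python
PY_SSIZE_T_MAX = 2**63 - 1  # assumes a 64-bit build


def pyunicode_new_rejects(size: int, struct_size: int, char_size: int) -> bool:
    """Mirror of the overflow check in PyUnicode_New (True means MemoryError)."""
    # C division of non-negative integers truncates, which matches Python's //.
    return size > (PY_SSIZE_T_MAX - struct_size) // char_size - 1


def bytes_needed(size: int, struct_size: int, char_size: int) -> int:
    """Header plus (size + 1) characters; the +1 is the terminating null."""
    return struct_size + (size + 1) * char_size


if __name__ == "__main__":
    limit = (PY_SSIZE_T_MAX - 72) // 4 - 1  # 4-byte characters, 72-byte header
    print(pyunicode_new_rejects(limit, 72, 4))      # False: largest accepted size
    print(pyunicode_new_rejects(limit + 1, 72, 4))  # True: one more is rejected
    # The accepted size still fits in Py_ssize_t once header and null are added.
    print(bytes_needed(limit, 72, 4) <= PY_SSIZE_T_MAX)
```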
Where PY_SSIZE_T_MAX is

```c
/* Largest positive value of type Py_ssize_t. */
#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))
```
which casts -1 to a size_t (an unsigned integer type provided by the C implementation, 64 bits wide on a 64-bit system). That wraps around to the type's largest possible value, 2⁶⁴ − 1. The macro then right-shifts it by 1 so that the top bit, which is the sign bit of the signed type, is 0, giving 2⁶³ − 1, and casts the result to Py_ssize_t.
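Python integers never wrap around, but masking with 2ⁿ − 1 reproduces what the unsigned cast does. Here is a sketch of the macro's arithmetic, where `py_ssize_t_max` is a hypothetical helper that takes the width of size_t in bits:

```python
import sys


def py_ssize_t_max(size_t_bits: int) -> int:
    """Emulate ((Py_ssize_t)(((size_t)-1) >> 1)) for a size_t of the given width."""
    mask = (1 << size_t_bits) - 1
    size_t_minus_one = -1 & mask  # (size_t)-1 wraps around to 2**bits - 1
    return size_t_minus_one >> 1  # shifting right by 1 clears the top (sign) bit


if __name__ == "__main__":
    print(py_ssize_t_max(64) == 2**63 - 1)
    print(py_ssize_t_max(32) == 2**31 - 1)
    # sys.maxsize is documented as the largest value a Py_ssize_t can take.
    print(py_ssize_t_max(sys.maxsize.bit_length() + 1) == sys.maxsize)
```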
struct_size is just a bit of overhead for the str object's metadata, either 48 or 72 bytes. It's set earlier in the function:

```c
struct_size = sizeof(PyCompactUnicodeObject);
if (maxchar < 128) {
    // [...]
    struct_size = sizeof(PyASCIIObject);
}
```
and char_size is either 1, 2 or 4, so for the worst case (4-byte characters and the 72-byte header) we have

```python
>>> ((2**63 - 1) - 72) // 4 - 1
2305843009213693932
```
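Repeating that calculation for each combination of header size and character width reproduces all four limits listed near the start of this answer. A sketch assuming the CPython 3.10, 64-bit sizes quoted above (48 bytes for PyASCIIObject, 72 for PyCompactUnicodeObject); other CPython versions may use different struct sizes, which would change the last few digits. `header_and_width` and `max_length` are hypothetical helpers.

```python
PY_SSIZE_T_MAX = 2**63 - 1  # 64-bit build


def header_and_width(maxchar: int) -> tuple[int, int]:
    """Pick struct_size and char_size the way PyUnicode_New does (3.10 sizes assumed)."""
    struct_size = 72  # sizeof(PyCompactUnicodeObject)
    if maxchar < 128:
        struct_size = 48  # sizeof(PyASCIIObject)
    if maxchar < 256:
        char_size = 1
    elif maxchar < 65536:
        char_size = 2
    else:
        char_size = 4
    return struct_size, char_size


def max_length(struct_size: int, char_size: int) -> int:
    """Largest size that passes the PyUnicode_New overflow check."""
    return (PY_SSIZE_T_MAX - struct_size) // char_size - 1


if __name__ == "__main__":
    for maxchar in (0x7F, 0xFF, 0xFFFF, 0x10FFFF):
        print(f"max U+{maxchar:04X}: {max_length(*header_and_width(maxchar)):,}")
```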
There's of course the possibility that Python strings are practically limited by some other part of Python that I don't know about, but you should at least be able to allocate a new string of that size, assuming you can get your hands on 9 exabytes of RAM.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.