Here’s everything about SyntaxError: invalid character in identifier in Python.
You’ll learn:
- The meaning of the error SyntaxError: invalid character in identifier
- How to solve the error SyntaxError: invalid character in identifier
- Lots more
So if you want to understand this error in Python and how to solve it, then you’re in the right place.
Let’s get started!
Understand SyntaxError: Invalid Character in Identifier in Python
The error SyntaxError: invalid character in identifier occurs when a character that Python does not accept appears in the code. Python 3.9 and later report the same problem with a different message, for example invalid non-printable character U+200B. Such a character can get into the code in several ways:
- Copying code from a website such as stackoverflow.com
- Copying from a PDF file, such as one generated by LaTeX
- Typing with a keyboard layout or input method other than US English
Problematic characters can be arithmetic signs, parentheses, various non-printable characters, quotes, colons, and more.
You can find non-printable characters using the repr() function or a text editor that displays them, such as Vim. You can also check the actual code of any character with the ord() function.
Still, you should copy program text through the clipboard as little as possible.
This habit not only helps to avoid this error but also improves your programming and typing skills. In most cases, retyping is faster than hunting for the problematic character in other ways.
Let’s dive right in:
First, What Is an Identifier in Python?
An identifier in Python is the name of any entity: a function, variable, class, method, and so on.
PEP 8 recommends using only ASCII identifiers in the standard library. However, PEP 3131 allowed Unicode characters in identifiers to support national alphabets.
The decision is rather controversial, as PEP 3131 itself acknowledges. For the standard library, it also recommends not using non-ASCII characters anywhere other than in authors’ names.
Nevertheless, you can use such variable names, and it will not cause errors:
переменная = 5
變量 = 10
ตัวแปร = 15
print(переменная + 變量 + ตัวแปร)
30
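If you are unsure whether a particular name is a legal identifier, the built-in str.isidentifier() method can tell you. The snippet below is only a small sketch: the sample names are made up for illustration, and the escapes \u200b and \u2014 stand for a zero-width space and an em dash.

```python
# Sample names (hypothetical, chosen for illustration only).
candidates = {
    "переменная": "Cyrillic letters",
    "變量": "Chinese characters",
    "my_var": "plain ASCII",
    "bub\u200bble": "contains a zero-width space",
    "a\u2014b": "contains an em dash",
}

# str.isidentifier() returns True only if the whole string is a valid identifier.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r:20} {candidates[name]:30} valid={ok}")
```

The names written in national alphabets pass the check, while the two names with a hidden or look-alike character do not.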
Don’t Blindly Copy and Paste Python Code
Most often, the error SyntaxError: invalid character in identifier occurs when code is copied from some source on the network.
Along with the correct characters, you can copy formatting characters or other non-printable service characters.
This, by the way, is one reason not to copy and paste code that you find on the Internet while looking for a solution. It is better to retype it yourself from the source.
For novice programmers, it is better still to understand the code fully and then rewrite it from memory, without looking at the source.
Zero-Width Space Examples
One of the hardest characters to spot is the zero-width space. Consider the code below:
def bubble(lst):
    n = len(lst)
    for i in range(n):
        for j in range(0, n-i-1):
            if lst[j] > lst[j+1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
print(bubble([4, 2, 1, 5, 3]))
File "<ipython-input-18-b9007e792fb3>", line 1
def bubble(lst):
^
SyntaxError: invalid character in identifier
The error pointer ^ points to the next character after the word bubble, which means the error is most likely in this word. In this case, the simplest solution would be to retype this piece of code on the keyboard.
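If the caret is hard to read, you can ask Python for the position directly. The following is a minimal sketch, assuming you have the suspicious code in a string; the helper name locate_syntax_error and the file name "<pasted>" are made up for this example. It relies only on the built-in compile() function and the lineno, offset, and msg attributes of SyntaxError.

```python
def locate_syntax_error(source, filename="<pasted>"):
    """Return (lineno, offset, message) for the first SyntaxError, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno, err.offset, err.msg
    return None


# "\u200b" is a zero-width space hidden inside the function name.
bad_source = "def bub\u200bble(lst):\n    return lst\n"
good_source = "def bubble(lst):\n    return lst\n"

print(locate_syntax_error(bad_source))    # line, column, and message of the error
print(locate_syntax_error(good_source))   # None
```

The exact message and column depend on the Python version, so treat them as a hint about where to look rather than as a precise diagnosis.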
You can also notice non-printable characters if you copy your text into a string variable and call the repr() function with that text variable as an argument:
st = '''def bubble(lst):
    n = len(lst)
    for i in range(n):
        for j in range(0, n-i-1):
            if lst[j] > lst[j+1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
print(bubble([4, 2, 1, 5, 3]))'''
repr(st)
'def bub\u200bble(lst): \n n = len(lst) \n for i in range(n):\n for j in range(0, n-i-1):\n if lst[j] > lst[j+1]:\n lst[j], lst[j+1] = lst[j+1], lst[j] \n \nprint(bubble([4, 2, 1, 5, 3]))'
You see that in the middle of the word bubble, there is a character with the code \u200b.
This is exactly the zero-width space. Web pages use it to mark places where a long line may be broken, so it can end up inside words and at the ends of lines.
It is not uncommon for this character to end up in your code if you copy it from stackoverflow.com.
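Reading a long repr() string by eye gets tedious for larger snippets. As a rough sketch of how the search could be automated, the hypothetical helper find_non_ascii below lists every non-ASCII character together with its line, column, code point, and Unicode name, using the standard unicodedata module.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, code point, name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# "\u200b" is the zero-width space from the bubble sort example.
pasted = "def bub\u200bble(lst):\n    n = len(lst)\n"
for hit in find_non_ascii(pasted):
    print(hit)   # (1, 8, 'U+200B', 'ZERO WIDTH SPACE')
```

Note that this simple check also reports legitimate non-ASCII text, such as identifiers in a national alphabet or characters inside string literals, so each hit still needs a quick look.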
Detect Non-Printable Characters Examples
Other invisible characters that cause the same problem are, for example, the left-to-right and right-to-left marks.
You can find these characters in mixed text that combines a left-to-right script, such as English, with a right-to-left script, such as Arabic or Hebrew.
One way to see non-printable characters is to use a text editor that displays them. Vim, for example, does this by default and typically shows such a character as a hex code like <200b>.
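If you do not have such an editor at hand, a few lines of Python can give a similar view. The sketch below is an assumption about what is useful here, not a standard tool: the helper reveal_invisible replaces characters from the Unicode categories Cf (format) and Cc (control) with a visible tag, leaving tabs and newlines alone.

```python
import unicodedata

# Categories treated as invisible here: Cf (format) and Cc (control).
# Tab and newline are kept as they are, since they are normal in source code.
INVISIBLE_CATEGORIES = {"Cf", "Cc"}


def reveal_invisible(text):
    """Replace invisible characters with a visible <U+XXXX> tag."""
    out = []
    for ch in text:
        if ch not in "\t\n" and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


# \u200e is a left-to-right mark, \u200f a right-to-left mark.
sample = "total\u200e = price\u200f + tax"
print(reveal_invisible(sample))   # total<U+200E> = price<U+200F> + tax
```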
Let’s look at another example of code with an error:
a = 3
b = 1
c = a — b
File "<ipython-input-11-88d1b2d4ae14>", line 1
c = a — b
^
SyntaxError: invalid character in identifier
In this case, the problem symbol is the em dash. There are more than five types of dashes. In addition, there are hyphenation signs and various types of minus signs.
Try to guess which of the following characters will be the correct minus:
a ‒ b # figure dash
a – b # en dash
a — b # em dash
a ― b # horizontal bar
a − b # minus
a - b # hyphen-minus
a ﹣ b # small hyphen minus
a － b # fullwidth hyphen-minus
These lines contain different Unicode characters in place of the minus, and only one line does not raise a SyntaxError: invalid character in identifier when the code is executed.
The real minus is the hyphen-minus character in line 6. Its code is 45 in both ASCII and Unicode.
You can check the character code using the ord() function.
However, if you suspect that one of the minuses is not a real minus, it will be easier to remove all the minuses and type them from the keyboard.
Below are the results that the ord() function returns when applied to all the symbols written above. You can verify that these are, indeed, all different symbols and that none of them is repeated:
print(ord("‒"))
print(ord("–"))
print(ord("—"))
print(ord("―"))
print(ord("−"))
print(ord("-"))
print(ord("﹣"))
print(ord("－"))
8210
8211
8212
8213
8722
45
65123
65293
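Numeric codes alone are not very memorable. As a small illustration, the standard unicodedata module can also print the official Unicode name of each character. The characters are written as escapes here so that they cannot be confused with one another.

```python
import unicodedata

# The eight look-alike characters from the list above, written as escapes.
dashes = ["\u2012", "\u2013", "\u2014", "\u2015",
          "\u2212", "\u002d", "\ufe63", "\uff0d"]

names = {ch: unicodedata.name(ch) for ch in dashes}
for ch, name in names.items():
    marker = "  <- valid Python minus" if ch == "-" else ""
    print(f"U+{ord(ch):04X} {name}{marker}")
```

Only U+002D, the hyphen-minus, is accepted by Python as the subtraction operator.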
By the way, calling the ord() function on the zero-width space from the bubble sort example returns the code 8203.
Above, you saw that this character’s code is 200b, but there is no contradiction here. If you convert 200b from hexadecimal to decimal, you get 8203:
ord("\u200b")
8203
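The conversion between the two notations can be checked in Python itself. This short example uses only the built-in functions ord(), hex(), int(), and chr():

```python
zwsp = "\u200b"               # zero-width space written as an escape

code = ord(zwsp)
print(code)                   # 8203
print(hex(code))              # 0x200b
print(int("200b", 16))        # 8203
print(chr(0x200B) == zwsp)    # True
```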
More Non-Printable Characters Examples
Another example of a problematic character is the comma. If you are typing in Chinese, you get the fullwidth comma “，”, and if in English, the ASCII comma “,”.
Of course, they differ in appearance, but it may not be easy to find the error right away. By the way, if you retype the program on the keyboard and the problem persists, try typing it in the US English layout.
The problem can occur, for example, on macOS when typing in the Unicode layout:
lst = [1, 2, 3]
lst += [4，5，6]
File "<ipython-input-16-2e992580002d>", line 2
lst += [4，5，6]
^
SyntaxError: invalid character in identifier
Also, when copying from different sites, you can pick up the wrong quotation marks or apostrophes.
Still, these characters look different from the correct ones, and the editor does not highlight the text between them as a string, so this error is easier to spot.
Below are the different types of quotation marks. The first two lines are correct, while the rest raise a SyntaxError, in most cases invalid character in identifier:
st = 'string'
st = "string"
st = ‘string‘
st = `string`
st = 〞string〞
st = ＂string＂
st = ‟string‟
Brackets are also worth noting. Unicode has many types of them, and some resemble the legal brackets.
Let’s look at some examples. The top three are correct, while the bottom three are not:
tpl = (1, 2, 3)
lst = [1, 2, 3]
set_ = {1, 2, 3}
tpl = ⟮1, 2, 3⟯
lst = ⟦1, 2, 3⟧
set_ = ❴1, 2, 3❵
Another hard-to-find example is the wrong colon character. After a correct colon, many IDEs indent the next line automatically.
The lack of automatic indentation can be indirect evidence that your colon is not what it should be:
for _ in range(3)：
    print("Ok")
File "<ipython-input-25-6428f9dbfe4c>", line 1
for _ in range(3)：
^
SyntaxError: invalid character in identifier
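When retyping is not practical, the look-alike characters can also be replaced in bulk. The following is only a sketch under stated assumptions: the table LOOKALIKES and the helper clean_source are hypothetical, the table covers just a few of the characters discussed in this article, and the replacement is applied blindly, including inside string literals and comments.

```python
# Hypothetical replacement table; extend it with whatever characters you meet.
LOOKALIKES = {
    "\u2018": "'", "\u2019": "'",      # curly single quotes
    "\u201c": '"', "\u201d": '"',      # curly double quotes
    "\u2013": "-", "\u2014": "-",      # en dash, em dash
    "\u2212": "-",                     # minus sign
    "\uff0c": ",",                     # fullwidth comma
    "\uff1a": ":",                     # fullwidth colon
    "\u200b": "",                      # zero-width space (removed)
}
TABLE = str.maketrans(LOOKALIKES)


def clean_source(source):
    """Return source with the look-alike characters above replaced."""
    return source.translate(TABLE)


# A fullwidth colon and curly double quotes, written as escapes.
pasted = "for _ in range(3)\uff1a\n    print(\u201cOk\u201d)\n"
cleaned = clean_source(pasted)
compile(cleaned, "<pasted>", "exec")   # raises SyntaxError if still invalid
print(cleaned)
```

Because the replacement also changes text inside strings, review the result before running it. For short snippets, retyping remains the safer option.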