This section is about using loops to look through every letter of a string. For example, finding the first z, or counting how many z’s there are. But it’s really about using loops and indexes to look through any kind of list. Searching strings is just an easy way to practice.
Once we know how to look through letters in a string, we can use the same tricks to search a list of Customers or enemy monsters.
If you remember, way back I wrote that a string is made from individual characters. If you have string w="cowbell"; it’s really a list of 7 characters. w = "cow"+"bell"; is also a list of 7 characters. It doesn’t remember those two parts. Even ""+"cowb"+""+"ell" works out to the exact same list of 7 characters.
The characters in a string are numbered, starting from 0. Here’s the traditional picture of some strings with the numbering underneath:
"cat" is obvious – it has three letters, numbered 0, 1, 2. I mixed things up in
poem, but it’s just 14 characters, numbered from 0 to 13. Spaces, dashes, and
numbers all count as 1 character each.
To look up one character, use the number, with square brackets around it:
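As a small sketch of what a lookup might look like (the wrapper class IndexLookup and the function Demo are placeholders of mine, not names from this chapter):

```csharp
public static class IndexLookup
{
    // Looks up three single characters from "cowbell" and glues them together.
    public static string Demo()
    {
        string w = "cowbell";
        char first = w[0];  // index 0 is the 1st letter: 'c'
        char third = w[2];  // index 2 is the 3rd letter: 'w'
        char last = w[6];   // 7 letters, so the last index is 6: 'l'
        return "" + first + third + last;  // "cwl"
    }
}
```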
The official name for the number in square brackets is an index.
Most computer numbering starts at 0. We usually call that a zero-based index. Here’s a fun list of stuff that happens automatically because we start from zero:
After reading that, it seems like it would have been easier to start from 1, but you get used to 0, and it does work out.
Obviously, indexes are always ints. You have to pick out one character (what
would w[1.5f] even be?)
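Using an index really gives us a character. We can copy w[0] into a character variable, or compare it to a character (but not the other way around: a C# string can’t be changed in place, so w[0]='x'; is an error). If you remember, the name is char and they use single-quotes: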
Indexes always start at 0. There’s no way to tell a string to start numbering itself at 1. Here’s an example where I try to trick the computer (but it doesn’t work):
Using indexing can give us an exciting new error. w="cat" has indexes 0, 1, 2. What should happen when someone uses w[3] or w[50]? It should be some sort of error. But w[3] isn’t wrong. It’s the correct way to ask for the 4th letter, and w could have had four letters. Maybe it will by the time we look.
So, we can’t give an error ahead of time, only when we come to the line and have a problem. That’s a run-time error, which we haven’t seen much before. This program crashes:
The error message is IndexOutOfRangeException: Array index is out of range. That’s not too bad, especially since now we know an index is the number in the []’s.
Using a negative index gives the same error. You’d think the computer would
know ahead of time that -1 is never legal. But it’s simpler to have off-the-edge in
either direction be the same error.
This is the first place we’ll be getting regular run-time errors, so we should know the funny way Unity handles them. In a regular program, a crash quits the program – tablets go back to the screen, PC programs stall, shut off and bring up the “submit a report” window. If you build and Release a program made with Unity, it crashes that way, too.
But in the Editor it tries to be friendlier. It gives the red error and quits out of Update or Start, but then keeps running more Updates. For example, this will print A, but never B, spraying errors:
Each time it hits w[3] it crashes, skipping the B, but then runs Update again. So don’t let a blast of red errors scare you – it’s just the same run-time error over and over.
This is another of those things that uses a real rule, but for now it’s magic. You can find the length of a string w by using w.Length. Examples:
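A few sample lengths, as a sketch. The function howLong and the class LengthDemo are names I made up just to have something to call:

```csharp
public static class LengthDemo
{
    // Returns how many characters are in w. Spaces and punctuation count.
    public static int howLong(string w)
    {
        return w.Length;  // no parentheses after Length
    }
    // howLong("cat")       is 3
    // howLong("")          is 0 (the empty string has no characters)
    // howLong("in a tent") is 9 (7 letters plus 2 spaces)
}
```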
The most clever and useful trick with indexes is that you can use variables and formulas. We already knew you could do that with other numbers, but the way it works with indexes is just so extra clever.
These two examples use an all-number equation as the index. They aren’t good for anything, except as examples:
These next three are more realistic, using an actual variable, num, for the index. Even cooler, since num is a variable, we can change it in-between lookups:
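Here is a rough sketch of the idea, assuming a variable named num as in the text. The class VariableIndex and the function Demo are placeholders of mine:

```csharp
public static class VariableIndex
{
    // Uses a variable, and a small formula, inside the square brackets.
    public static string Demo()
    {
        string w = "cowbell";   // c=0 o=1 w=2 b=3 e=4 l=5 l=6
        int num = 1;
        char a = w[num];        // same as w[1], which is 'o'
        num = num + 3;          // num is now 4
        char b = w[num];        // same line of code, but now w[4]: 'e'
        char c = w[num - 1];    // formula works out to w[3]: 'b'
        return "" + a + b + c;  // "oeb"
    }
}
```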
This is really no different from the rule “any place that takes an int can use anything that works out to be an int.”
The most common trick using a variable index is in a loop. Suppose w has length ten. It’s numbered 0 to 9. If we make a loop counting 0 to 9, we can use the loop variable to look at each position:
There’s one thing left to make it perfect – we should look up the current length of w and use that to end the loop:
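A minimal sketch of that loop. To make it checkable, this version just collects the index numbers it visits. The names EveryLetter and visited are my own:

```csharp
public static class EveryLetter
{
    // Runs the standard loop over w and returns the indexes it visited, in order.
    public static string visited(string w)
    {
        string seen = "";
        for (int i = 0; i < w.Length; i++)
        {
            // w[i] is the current letter; here we only record i itself
            seen = seen + i;
        }
        return seen;
    }
}
```

With this sketch, visited("goat") gives "0123" and visited("a") gives "0".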
It’s easy to be off by 1, so let’s double-check the math: suppose w is "goat". That means w.Length is 4, and w is numbered 0 to 3. The loop will run while i<4 and counts 0,1,2,3. It exactly hits every index it should. Everything seems to check out.
It’s good to check the weird examples, too: what if w is just "a". The loop runs while i<1. That’s once, with i=0, which is the ’a’. It’s perfect.
This is going to be our standard look-at-every-letter loop.
We can fill in the body of our basic loop to do a few things. I’m going to write these as functions because I think they look nicer.
We can count how many copies of a letter are in a string. I think it makes sense here to have the letter be a char input:
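One way such a counting function might be written, as a sketch. The input name countMe comes from the text; the function name countLetter and the class LetterCount are my guesses:

```csharp
public static class LetterCount
{
    // Returns how many times countMe appears in w.
    public static int countLetter(string w, char countMe)
    {
        int count = 0;
        for (int i = 0; i < w.Length; i++)
        {
            if (w[i] == countMe) { count++; }  // found another one
        }
        return count;
    }
}
```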
The loop driver is showing me every letter, in w[i]. Inside the loop,
if(w[i]==countMe) is asking “is the current letter the one we want?”
We can use the loop to make a new string from the old one, one letter at a time. This example adds a slash after each letter:
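A possible version, as a sketch only (addSlashes and SlashDemo are names I picked):

```csharp
public static class SlashDemo
{
    // Builds a new string: each letter of w followed by a slash.
    public static string addSlashes(string w)
    {
        string ans = "";  // start empty, grow one letter at a time
        for (int i = 0; i < w.Length; i++)
        {
            ans = ans + w[i] + "/";
        }
        return ans;
    }
}
```

With this sketch, addSlashes("cat") comes out as "c/a/t/".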
The same idea can reverse the string. We just add each letter to the front of our answer. I’m adding a testing line, so we can watch it work:
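A sketch of how that might look. The name reverse is from the text; the class ReverseDemo is mine, and Console.WriteLine is a stand-in for the testing line (inside a Unity script, print would be the usual choice):

```csharp
using System;

public static class ReverseDemo
{
    // Returns a reversed copy of w. The original string is not changed.
    public static string reverse(string w)
    {
        string ans = "";
        for (int i = 0; i < w.Length; i++)
        {
            ans = w[i] + ans;        // new letter goes on the FRONT
            Console.WriteLine(ans);  // testing line: watch the answer grow
        }
        return ans;
    }
}
```

For "cat" the testing line shows c, then ac, then tac.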
That seems pretty impressive for such a short loop.
A note: this returns a reversed copy. We’ve seen this sort of function before. If we
want to change a string to be backwards we’d use it like ani=reverse(ani);.
Removing a letter can be done by making a 1-letter-at-a-time copy, but using an if to skip the letter we don’t want. We don’t really skip it, we just don’t copy it over:
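A minimal sketch of such a function. The name remove is from the text; the parameter name removeMe and the class RemoveDemo are assumptions of mine:

```csharp
public static class RemoveDemo
{
    // Returns a copy of w with every removeMe left out.
    public static string remove(string w, char removeMe)
    {
        string ans = "";
        for (int i = 0; i < w.Length; i++)
        {
            // copy over only the letters we want to keep
            if (w[i] != removeMe) { ans = ans + w[i]; }
        }
        return ans;
    }
}
```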
remove("elephant ear", 'e'); would give us "lphant ar". It also works if the letter isn’t there – it copies everything with no changes. remove("abc", 'X') gives you back "abc".
Removing spaces is fun (spaces are perfectly good characters). remove("in a tent", ' ') gives us "inatent".
If we want to remove two letters, we can use it twice:
Very similar to remove is doing a replace. We still check for that letter, but instead of skipping it, we add the replacement letter. Since the loop body uses w[i] twice, I’m copying w[i] into a temp character variable:
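Something along these lines, as a sketch. The names replace, oldLetter, newLetter, ch and the class ReplaceDemo are all my own choices:

```csharp
public static class ReplaceDemo
{
    // Returns a copy of w with every oldLetter swapped for newLetter.
    public static string replace(string w, char oldLetter, char newLetter)
    {
        string ans = "";
        for (int i = 0; i < w.Length; i++)
        {
            char ch = w[i];  // temp copy of the current letter
            if (ch == oldLetter) { ans = ans + newLetter; }
            else { ans = ans + ch; }
        }
        return ans;
    }
}
```

With this sketch, replace("banana", 'a', 'o') comes out as "bonono".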
Once we know about indexes, it seems natural to use them as inputs and outputs. For example, removing the 3rd letter from a string, or finding the position of the first ’a’.
Everything will use zero-based indexes. If something tells us a letter is at 1, that
means it’s the second letter. If we want to remove the 5th letter, we’ll send it a
4.
This function removes a letter from whatever index you say. It does the usual trick of building a string from letters, skipping one:
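A sketch of what it might look like. removePos is the name used in the text; the parameter name pos and the class RemovePosDemo are my assumptions:

```csharp
public static class RemovePosDemo
{
    // Returns a copy of w without the letter at index pos (zero-based).
    public static string removePos(string w, int pos)
    {
        string ans = "";
        for (int i = 0; i < w.Length; i++)
        {
            // comparing index numbers, not letters
            if (i != pos) { ans = ans + w[i]; }
        }
        return ans;
    }
}
```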
The if is the interesting part. Before, we compared letters, using w[i]. Now we’re comparing index numbers, so just using i.
Some fun uses: w=removePos(w,0); gets rid of the first letter of w. And
w=removePos(w, w.Length-1); gets rid of the last. w.Length-1 is the mini-formula
for the last letter.
Here’s a version which removes a range. The first number is the start index, the second is how many to remove. I’m doing it that way, since everyone else does.
The math for the ending index is fencepost stuff. removeRange(w, 3, 4) says to remove 4 things, starting at index 3. That works out to 3, 4, 5 and 6. The formula for the end index to skip works out to start+howMany-1 (testing: 6 is 3+4-1). The function:
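One possible version, as a sketch. removeRange is the name from the text; the parameter names and the class RemoveRangeDemo are mine:

```csharp
public static class RemoveRangeDemo
{
    // Returns a copy of w without howMany letters, starting at index startIndex.
    public static string removeRange(string w, int startIndex, int howMany)
    {
        int endIndex = startIndex + howMany - 1;  // last index to skip
        string ans = "";
        for (int i = 0; i < w.Length; i++)
        {
            // keep the letter only if i is NOT in startIndex..endIndex
            if (i < startIndex || i > endIndex) { ans = ans + w[i]; }
        }
        return ans;
    }
}
```

With this sketch, removeRange("abcdefgh", 3, 4) skips indexes 3, 4, 5 and 6, leaving "abch".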
It’s the same as removing one thing, except the if checks for a range. Notice how it’s a rare “not-in-range” test.
Some examples:
For fun, here’s another way to write removeRange, using two loops. Loop one adds everything before the range. Loop two adds everything after:
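A sketch of the two-loop idea. I’m calling it removeRange2 inside a class TwoLoopDemo, both made-up names, and it assumes the whole range really is inside the string:

```csharp
public static class TwoLoopDemo
{
    // Same job as removeRange, but with one loop per kept piece.
    // Assumes startIndex and the end of the range are valid indexes of w.
    public static string removeRange2(string w, int startIndex, int howMany)
    {
        int endIndex = startIndex + howMany - 1;
        string ans = "";
        // loop one: everything before the range
        for (int i = 0; i < startIndex; i++) { ans = ans + w[i]; }
        // loop two: everything after the range
        for (int i = endIndex + 1; i < w.Length; i++) { ans = ans + w[i]; }
        return ans;
    }
}
```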
I like this because you have to think about not being off by 1. The first loop uses
<startIndex, since we don’t want that one. The second loop begins at endIndex+1,
since we don’t want the letter at endIndex either.
We can also write a function for the opposite thing – getting only letters in the range. That’s officially called a substring. This will loop only over the indexes we want:
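A sketch of a hand-made substring. The function name sub, the parameter names and the class SubDemo are my placeholders:

```csharp
public static class SubDemo
{
    // Returns howMany letters of w, starting at index startIndex.
    // No safety checks: a bad start or a too-big howMany will crash.
    public static string sub(string w, int startIndex, int howMany)
    {
        int endIndex = startIndex + howMany - 1;  // last index we want
        string ans = "";
        for (int i = startIndex; i <= endIndex; i++)
        {
            ans = ans + w[i];
        }
        return ans;
    }
}
```

With this sketch, sub("abcdefgh", 3, 4) copies indexes 3 through 6, giving "defg".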
The loop from startIndex to endIndex is obviously right, but only because we’ve been using that math a lot.
This one can crash on bad indexes. If the start is negative, it crashes right away.
If howMany is too big it goes off the end and crashes.
A different type of function searches for a letter and tells us the position, as an index. If a word has 10 letters, it tells us either a 0-9 or -1 for not found. It cheats using the return-from-middle trick when it finds one. The function:
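A minimal sketch of such a search. The names findFirst, findMe and FindDemo are assumptions of mine:

```csharp
public static class FindDemo
{
    // Returns the index of the first findMe in w, or -1 if there isn't one.
    public static int findFirst(string w, char findMe)
    {
        for (int i = 0; i < w.Length; i++)
        {
            if (w[i] == findMe) { return i; }  // found it: quit right away
        }
        return -1;  // only reached if the loop never found it
    }
}
```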
It looks funny seeing return -1; at the end. But, it’s like any other
return-from-middle – the answer is -1 only if we didn’t find our letter and quit
early.
Here’s the same thing, but not using return-from-middle. The condition says to stop when we find it or get to the end:
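A sketch of that version. The variable ans is from the text; findFirst2, findMe and the class FindNoReturnDemo are names I made up:

```csharp
public static class FindNoReturnDemo
{
    // Same answer as a return-from-middle search, but with a single return.
    public static int findFirst2(string w, char findMe)
    {
        int ans = -1;  // stays -1 if we never find it
        int i = 0;
        // keep going while there are letters left AND nothing found yet
        while (i < w.Length && ans == -1)
        {
            if (w[i] == findMe) { ans = i; }
            i++;
        }
        return ans;
    }
}
```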
ans=-1; is for the fall-through trick. If we never find the letter, ans stays at -1. The loop test also needed to quit when ans got a value (otherwise we’d keep going and return the last instead of the first).
This is more awkward, which is why people use return-from-middle.
We can improve find-the-first by adding the ability to say where to start looking – find the first ’e’ past index 4. All we do is start at whatever index they tell us:
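One way it might look, as a sketch. findFrom, findMe, startIndex and the class FindFromDemo are my own names, and it assumes startIndex is not negative:

```csharp
public static class FindFromDemo
{
    // Returns the index of the first findMe at or after startIndex, or -1.
    public static int findFrom(string w, char findMe, int startIndex)
    {
        for (int i = startIndex; i < w.Length; i++)
        {
            if (w[i] == findMe) { return i; }
        }
        return -1;
    }
}
```

With this sketch, findFrom("elephant ear", 'e', 4) skips the two e’s at indexes 0 and 2 and gives 9.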
A fun use of this is finding the second place something is in a string. Find the first one starting from 0, then look again from just past there:
To find the last ’e’ in a string we can use a backwards string loop. This is our first backwards loop, but it’s not as exciting as we’d hope:
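A sketch of the backwards search (findLast, findMe and FindLastDemo are names I picked):

```csharp
public static class FindLastDemo
{
    // Returns the index of the last findMe in w, or -1 if there isn't one.
    public static int findLast(string w, char findMe)
    {
        // start on the last index, count down, and include 0
        for (int i = w.Length - 1; i >= 0; i--)
        {
            if (w[i] == findMe) { return i; }
        }
        return -1;
    }
}
```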
It’s nice to double-check these. w.Length-1 is the last one. And it runs while i>=0, which means it hits 0. So it hits exactly last to first.
I’d like to check if one string starts with another, for example if "theater" starts with "thi" (almost, but not quite). We can do that using a loop which walks through both strings at the same time.
The loop will run 0, 1, 2, comparing the matching letters. To make it simpler, I’ll say the strings have to be in order: longer one first, then the shorter one to check:
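A sketch under that same assumption (the first string is at least as long as the second). The parameter names longer and shorter and the class StartsWithDemo are mine:

```csharp
public static class StartsWithDemo
{
    // True if longer begins with shorter.
    // Assumes longer.Length >= shorter.Length; otherwise it can run off the end.
    public static bool startsWith(string longer, string shorter)
    {
        for (int i = 0; i < shorter.Length; i++)
        {
            // same index i used in both strings
            if (longer[i] != shorter[i]) { return false; }
        }
        return true;  // every letter of shorter matched
    }
}
```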
The new thing here is using the same index for both strings. In two side-by-side
lists it’s a common trick. But you have to be careful you don’t run off the end of the
shorter one.
We can make a loop to check whether a string has the same two letters in a row: for example "ballet" (yes) and "elephant" (no). We’ll compare each letter to the one after it, using w[i+1]. This is our first real use of index math.
We have to stop at the second-to-last position, since we can’t compare the last letter to the one after it:
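A minimal sketch of the compare-to-next loop. hasDouble and DoubleDemo are made-up names:

```csharp
public static class DoubleDemo
{
    // True if any letter of w is the same as the letter right after it.
    public static bool hasDouble(string w)
    {
        // stop one early: the last letter has nothing after it
        for (int i = 0; i < w.Length - 1; i++)
        {
            if (w[i] == w[i + 1]) { return true; }
        }
        return false;
    }
}
```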
We can do this another way. Instead of comparing to the letter after i, we can compare to the letter before it. We shift the loop by +1: start on the second letter and go all the way to the end:
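The shifted loop might look like this, as a sketch. hasDoubleBefore and DoubleBeforeDemo are my own names:

```csharp
public static class DoubleBeforeDemo
{
    // True if any letter of w is the same as the letter right before it.
    public static bool hasDoubleBefore(string w)
    {
        // start at 1: the first letter has nothing before it
        for (int i = 1; i < w.Length; i++)
        {
            if (w[i] == w[i - 1]) { return true; }
        }
        return false;
    }
}
```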
A funny thing is that both versions compare the same sequence: 0 and 1, 1 and 2
…. But they “feel” different, and easier-to-read code is important (I prefer the i-1
version: skipping 0 is easier, and when you see a loop starting at 1 you know
something funny is happening).
This next one is very different. A palindrome is the same forward and backward. The fun ones are sentences, like “Madam, I’m Adam” (if you remove spaces and punctuation and ignore upper/lowercase). But it’s really anything where the back half is the front half reversed: abcdcba. Notice there’s only one d. When the length is odd the middle letter can be anything.
The plan is to make two counters, at the first and last letter. A loop will compare them, then move both towards the middle. We quit when they cross. We’ve never written a loop like this, but the plan sounds good:
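A sketch of that plan. The text mentions a counter named i2; i1, isPalindrome and the class PalindromeDemo are my assumptions. It compares letters exactly, so spaces, punctuation and upper/lowercase would need cleaning up first:

```csharp
public static class PalindromeDemo
{
    // True if w reads the same forward and backward.
    public static bool isPalindrome(string w)
    {
        int i1 = 0;             // walks forward from the first letter
        int i2 = w.Length - 1;  // walks backward from the last letter
        while (i1 < i2)         // quit when they meet or cross
        {
            if (w[i1] != w[i2]) { return false; }
            i1++;
            i2--;
        }
        return true;
    }
}
```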
This is the first loop we’ve ever written where 2 things in the test are changing at once.
Adding a test print is a nice way to check (especially when we get wrong answers). Let’s try printing the letters, and the next indexes:
Running lets us watch it. It might help us find bugs, or to realize our idea was just wrong:
If we hate the double-moving-vars plan, we can rewrite it as a regular loop. We’ll go from 0 to the middle, using a formula to find the opposite position:
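A sketch of the single-counter version. The formula (w.Length-1)-i and the name i2 are from the text; isPalindrome2 and the class PalindromeLoopDemo are mine:

```csharp
public static class PalindromeLoopDemo
{
    // True if w reads the same forward and backward, using one loop counter.
    public static bool isPalindrome2(string w)
    {
        // integer division: length 7 gives 3, so the middle letter is skipped
        for (int i = 0; i < w.Length / 2; i++)
        {
            int i2 = (w.Length - 1) - i;  // the matching spot from the end
            if (w[i] != w[i2]) { return false; }
        }
        return true;
    }
}
```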
It’s comparing the exact same spots as version 1. I even re-used i2. Formulas like
(w.Length-1)-i are easy to get wrong, especially to be off-by-one. I usually write
out a test string with numbering.
Here’s one last loop with even trickier index math. It checks whether a string ends with a certain word. Here’s a picture of how we line up "edwinton" and "ton" to check for a match:
We obviously want a simple loop going through "ton", comparing matching letters. The problem is the indexes in the words aren’t the same – they’re all off by 5. We need to compute the “shift” and then compare using it.
It takes a lot of thinking, but the off-by is the difference in their lengths: 8 for "edwinton" minus 3 for "ton" means we skip the first 5 letters. Otherwise it’s like the startsWith loop:
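A sketch of how the shifted comparison might be written. The names endsWith, ending, shift and the class EndsWithDemo are my own, and it assumes the ending is not longer than the word:

```csharp
public static class EndsWithDemo
{
    // True if w finishes with ending.
    // Assumes ending.Length <= w.Length; otherwise the shift goes negative and it crashes.
    public static bool endsWith(string w, string ending)
    {
        int shift = w.Length - ending.Length;  // "edwinton" and "ton": 8-3 = 5
        for (int i = 0; i < ending.Length; i++)
        {
            // ending[0] lines up with w[shift], ending[1] with w[shift+1], ...
            if (w[i + shift] != ending[i]) { return false; }
        }
        return true;
    }
}
```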
This is two tricks in one: using a single index for two arrays, and doing math inside of the []’s. We should watch how it runs with debugging lines:
A sample test: