Char array vs string in C

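C stores strings in character arrays, and the null character ('\0') marks the end of a string. In other words, a C string is a character array terminated by a null character ('\0'). For example: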

char label[] = "Single";
It is stored in memory as follows [1]:
------------------------------
| S | i | n | g | l | e | \0 |
------------------------------
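
As a small sketch of the layout above, the function below (a hypothetical helper, not part of the original post) checks that initializing a char array from a string literal produces the same 7 bytes as listing the characters and the terminator explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both initializations yield identical arrays. */
int literal_equals_explicit(void)
{
    char from_literal[] = "Single";                              /* 6 characters + '\0' */
    char explicit_init[] = {'S', 'i', 'n', 'g', 'l', 'e', '\0'}; /* terminator written by hand */

    assert(sizeof(from_literal) == sizeof(explicit_init));       /* both arrays are 7 bytes */
    return memcmp(from_literal, explicit_init, sizeof(from_literal)) == 0;
}
```

Calling literal_equals_explicit() returns 1, which shows that the compiler appends the terminator to the literal form.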

The C function strlen(s) computes the length of a string by scanning for the terminating null character ('\0'). For example:
char label[10] = "Single";
It is stored in memory as follows [1]:
------------------------------------------
| S | i | n | g | l | e | \0 |   |   |   |
------------------------------------------
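The last 3 bytes are unused (C fills them with zeros). They still belong to the character array, but they are not part of the string.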
	char label[10] = "Single";

	printf("len = %zu\n", strlen(label) );   /* strlen returns size_t, so use %zu */
	printf("size = %zu\n", sizeof(label) );
Result:
len = 6 
size = 10
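
To make the difference between length and size concrete, here is a minimal sketch. my_strlen is a hypothetical re-implementation written only for illustration (the real strlen is usually heavily optimized), and zero_bytes_in_label counts how many bytes of the 10-byte array hold '\0'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of strlen: walk forward until '\0' is found. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}

/* Counts the '\0' bytes in a 10-byte array initialized with "Single". */
size_t zero_bytes_in_label(void)
{
    char label[10] = "Single";   /* the remaining bytes are zero-initialized */
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof(label); ++i) {
        if (label[i] == '\0') {
            ++zeros;
        }
    }
    return zeros;
}
```

my_strlen("Single") stops at the first '\0' and returns 6, while zero_bytes_in_label() returns 4: the terminator plus the three unused bytes.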

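For a string accessed through a pointer,

	char* label = "Single";

	printf("len = %zu\n", strlen(label) );
	printf("size = %zu\n", sizeof(label) );
Results (on a 32-bit platform):

len = 6
size = 4

Here sizeof(label) is the size of the pointer itself (4 bytes on a 32-bit platform, 8 on a typical 64-bit one), not the 7 bytes occupied by the literal "Single".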
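
The following sketch separates the three sizes involved. The function names are hypothetical. sizeof applied to an array or to the literal itself counts the characters plus the terminator, while sizeof applied to a pointer only measures the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Size of an array initialized from the literal: 6 characters + '\0'. */
size_t array_size(void)
{
    char label[] = "Single";
    assert(label[sizeof(label) - 1] == '\0');   /* the last byte is the terminator */
    return sizeof(label);
}

/* Size of a pointer that refers to the literal: platform dependent. */
size_t pointer_size(void)
{
    const char *label = "Single";
    return sizeof(label);
}

/* Size of the literal itself, which has array type char[7]. */
size_t literal_size(void)
{
    return sizeof("Single");
}
```

array_size() and literal_size() both return 7 on every platform. pointer_size() returns sizeof(char *), whatever that is for the target.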




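The differences between a char array and a string go beyond the points above. The two are initialized differently, and the memory they use lies in different regions of the program's address space. For example: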

	char *c = "abc";
	c[1] = 'a';
This compiles, but the program crashes at run time. The following code, however, works as expected:
	char c[]="abc";
	c[1] = 'a';
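
When the text starts out as a literal but has to be changed, one option is to copy it into a writable buffer first. The sketch below assumes a caller-supplied buffer. The name copy_and_modify and its return convention are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies "abc" into out and changes the second character.
 * Returns 1 on success, 0 if the buffer is too small. */
int copy_and_modify(char *out, size_t out_size)
{
    const char *literal = "abc";   /* read-only: never written through */
    assert(out != NULL);
    if (out_size < strlen(literal) + 1) {
        return 0;                  /* no room for the characters plus '\0' */
    }
    strcpy(out, literal);          /* out now holds a private, writable copy */
    out[1] = 'a';
    return 1;
}
```

With char buf[4], copy_and_modify(buf, sizeof(buf)) leaves "aac" in buf and never touches the literal.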


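In a multitasking operating system, every process runs in its own memory sandbox. This sandbox is the virtual address space, which in 32-bit mode is always a 4 GB block of memory addresses. These virtual addresses are mapped to physical memory by page tables, which are maintained by the operating system and consulted by the processor [2][3]. The standard memory segment layout of a Linux process is illustrated in [2].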



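The memory used by a process is generally divided into the code segment (Code or Text), the read-only data segment (RO Data), the initialized data segment (RW Data), the uninitialized data segment (BSS), the heap, and the stack. The code, read-only data, read-write data, and uninitialized data segments are static regions, while the heap and the stack are dynamic regions [4]. Strings and character arrays are initialized in different ways, and the memory allocated for them lies in different regions of the program's address space. For example [4]: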

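#include <stdlib.h>
#include <string.h>

const char ro[] = { "this is read only data" }; // read-only data segment
static char rw_1[] = { "this is global read write data" }; // initialized read-write data segment
char BSS_1[100]; // uninitialized data segment (BSS)
const char *ptrconst = "constant data"; // the string literal is placed in the read-only data segment

int main()
{
	short b; // on the stack, 2 bytes

	char a[100]; // 100 bytes on the stack; the name a evaluates to the address of its first element

	char s[] = "abcdefg"; // s is an 8-byte array on the stack, initialized with a copy of "abcdefg" (7 characters plus '\0')

	char *p1; // p1 is on the stack, 4 bytes on a 32-bit platform

	char *p2 = "123456"; // p2 is on the stack; what it points to must not be modified, "123456" is in the read-only data segment

	static char rw_2[] = { "this is local read write data" }; // local static, initialized read-write data segment

	static char BSS_2[100]; // local static, uninitialized data segment (BSS)

	static int c = 0; // static (global) storage area, not on the stack

	p1 = (char *) malloc(10 * sizeof(char)); // the allocated block is on the heap

	strcpy(p1, "xxxx"); // the literal "xxxx" is in the read-only data segment, 5 bytes; its bytes are copied to the heap

	free(p1); // release the memory p1 points to

	return 0;
}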



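Stack memory is allocated and released automatically by the compiler and holds function arguments, local variables, and the like. It operates like the stack data structure. The stack can hold a function's automatic (local) variables, its arguments, and its return address. With a typical x86 calling convention, a function call first pushes the arguments (most C compilers push them from right to left), then the return address, which is the address of the instruction that follows the call in the caller, and then reserves space for the callee's local variables. Note that static variables are not placed on the stack [5].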
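
The last point, that static variables do not live on the stack, can be observed from their lifetime. In this sketch (hypothetical function names), the static counter keeps its value between calls, while the automatic variable is created afresh on every call.

```c
#include <assert.h>
#include <limits.h>

/* Uses a static local: initialized once, value survives between calls. */
int next_id(void)
{
    static int counter = 0;
    assert(counter < INT_MAX);   /* guard against signed overflow */
    return ++counter;
}

/* Uses an automatic local: re-created on the stack for every call. */
int fresh_value(void)
{
    int local = 0;
    return ++local;
}
```

In a fresh process, successive calls to next_id() return 1, 2, 3 and so on, whereas fresh_value() returns 1 every time.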


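Heap memory is allocated and released under the programmer's control. Anything still allocated when the program exits is reclaimed by the operating system. In C it is obtained with malloc, in C++ with the new operator [5].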
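
A minimal sketch of heap usage for strings: dup_string (a hypothetical helper, similar in spirit to the POSIX strdup) copies a string into a malloc'ed block that the caller must later release with free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, writable copy of src, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *dup_string(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1;        /* +1 for the terminating '\0' */
    char *copy = (char *)malloc(n);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, n);              /* copies the terminator as well */
    return copy;
}
```

A typical use is char *p = dup_string("xxxx"); followed by modifications through p and finally free(p);. Unlike the literal, the copy is writable.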


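Because the regions of a program's address space serve different purposes, they permit different operations. Ignoring these differences leads to compile-time or run-time errors. For example: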

int main(){
    char *pa = "Hello, world."; 
    return 1;
}
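The string literal "Hello, world." is stored in a read-only region (the code area) and must not be modified [7]. The pointer pa itself lives on the stack and can be changed. However, if the program tries to modify the string through pa, the code compiles but raises an exception at run time [6]. For example: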
int main(){
    char *pa;
    pa = "Hello, world.";
    pa[2]='a';
    return 1;
}
To avoid this kind of run-time failure, treat pa as a pointer to constant data and rewrite the code as:

const char *pa  = "Hello, world.";
With this declaration, any attempt to modify the string through pa is rejected at compile time:
you cannot assign to a variable that is const

If a character array is used instead, for example:

char c[] = "abc";
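4 bytes are allocated on the stack (when the declaration is inside a function), holding 'a', 'b', 'c', and '\0'. Stack memory is writable, so the earlier code that modifies the contents of the character array compiles and runs correctly.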


Initialization:

If the string used to initialize a char array is longer than the declared array length, for example
	char label[10] = "SingleSingle123213213121";

	printf("len = %zu\n", strlen(label) );
	printf("size = %zu\n", sizeof(label) );
compilation fails with an "array bounds overflow" error.
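
The compiler can only catch this for initializers. When the source text arrives at run time, the copy has to be bounded by hand. The sketch below shows one way to do it. copy_truncated is a hypothetical helper that truncates instead of overflowing and reports whether truncation happened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, which has room for dst_size bytes including '\0'.
 * Returns 1 if src had to be truncated, 0 otherwise. */
int copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = strlen(src);
    int truncated = (n >= dst_size);
    if (truncated) {
        n = dst_size - 1;          /* keep one byte for the terminator */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';                 /* always leave a valid C string */
    return truncated;
}
```

With char label[10], copying "SingleSingle123213213121" returns 1 and leaves the 9-character string "SingleSin" in label.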


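When a char array is declared without an initializer, its length must be given by a constant expression. The following example relies on C++ rules. In C, a const int is not a constant expression, so use #define or an enum constant instead:

const int len = 10;
	//...
	char label_array[len];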
If the array length is omitted,

char label_array[];
compilation fails with an "unknown size" error.


If a non-constant variable is used,
	int len = 10;
	char label_array[len];
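compilation fails with "error C2131: expression did not evaluate to a constant note: failure was caused by non-constant arguments or reference to a non-constant symbol". This is the MSVC message. A C99 compiler that supports variable-length arrays accepts this declaration inside a function.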
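
Two portable alternatives are sketched here, with hypothetical names: an enum constant for a length known at compile time, and a heap allocation for a length known only at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { LABEL_LEN = 10 };   /* an enum constant is a constant expression in C and C++ */

/* Fixed-size array whose length comes from the enum constant. */
size_t fixed_label_size(void)
{
    char label_array[LABEL_LEN];
    return sizeof(label_array);
}

/* Run-time length: allocate on the heap instead of declaring an array.
 * The block is zero-filled; the caller must free() it. */
char *make_label_buffer(size_t len)
{
    assert(len > 0);
    return (char *)calloc(len, sizeof(char));
}
```

fixed_label_size() returns 10. make_label_buffer(n) works for any positive n chosen at run time and returns NULL if the allocation fails.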



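The following code was used to test the differences between char arrays and strings. It defines a struct with member functions, so it must be compiled as C++.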

//
//  Char* operation
//

#include <stdio.h>
#include <string.h>
#include <stdlib.h>


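/* Copy the leftmost n characters of src into dst. */
char * left(char *dst, const char *src, int n)
{
    const char *p = src;
    char *q = dst;
    int len = (int)strlen(src);
    if(n>len) n = len;
    if(n<0) n = 0;      /* a negative count would make the loop below run away */
    /*p += (len-n);*/   /* would start at the n-th character from the right */
    while(n--) *(q++) = *(p++);
    *q = '\0';          /* required: the loop above does not copy a terminator */
    return dst;
}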


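/* Copy n characters of src, starting at position m, into dst. */
char * mid(char *dst, const char *src, int n, int m) /* n is the length, m is the start position */
{
    const char *p = src;
    char *q = dst;
    int len = (int)strlen(src);
    if(m<0) m=0;            /* start from the first character */
    if(m>len) return NULL;
    if(n<0) n = 0;
    if(n>len-m) n = len-m;  /* clamp: copy from position m to the end */
    p += m;
    while(n--) *(q++) = *(p++);
    *q = '\0';              /* required: the loop above does not copy a terminator */
    return dst;
}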


struct Card{
	char*  _pData = NULL;
	int _fieldNumber;
	int _fieldSize;
	bool _alive;

	void show()
	{
		if (NULL != _pData)
		{
			printf("data is: %s \n", _pData);
		}else{
			printf("data pointer is NULL. \n" );
		}
		return;
	}

	char* get( int index ){
		/* Returning a local array (char result[9]) would leave a dangling
		   pointer, so the result is allocated on the heap instead. */
		if (NULL == _pData)
		{
			printf("data pointer is NULL. \n" );
			return NULL;
		}

		/* The caller owns the returned buffer and must free() it. */
		char* result =  (char *)malloc(_fieldSize+1);
		if (NULL == result)
		{
			return NULL;
		}
		int startLocation = index*_fieldSize;
		if (NULL == mid(result, _pData, _fieldSize, startLocation ))
		{
			free(result);   /* start position lies past the end of the data */
			return NULL;
		}
		return result;
	}

	void pushEnetry( char* entryData )
	{
		int entryLen = (int)strlen(entryData);
		int len = 0;

		if (NULL != _pData)
		{
			len = (int)strlen(_pData);
		}

		printf("pushEnetry: len = %d \n", len);

		int newLen = len + entryLen;

		/* +1 for the terminating '\0' */
		char* newpData =  (char *)malloc((newLen+1)* sizeof(char));
		if (NULL == newpData)
		{
			return;
		}

		/* Copy the existing data, then append the new entry. */
		if (len > 0)
		{
			memcpy(newpData, _pData, len);
		}
		memcpy(newpData + len, entryData, entryLen);
		newpData[newLen] = '\0';

		printf("newpData = %s \n", newpData);

		free(_pData);   /* free(NULL) is a no-op */
		_pData = newpData;
	}
};


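void test_1()
{
	char text[] = "1234567890abcdefghijklmn";   /* writable array copy of the literal */
	printf("string is: %s \n", text );

	char* result_1 =  (char *)malloc(8);   /* room for 7 characters + '\0' */
	if (NULL == result_1)
	{
		return;
	}
	mid(result_1, text, 7, 5 );
	printf("string is: %s\n", result_1 );
	free(result_1);
}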

void test_2()
{
	Card myCard;

	myCard._fieldSize = 8;
	myCard.show();

	char* result;
	result = myCard.get(0);   /* returns NULL: no data has been pushed yet */
	if (NULL != result)
	{
		printf("string is: %s\n", result);
		free(result);
	}

	char foo[] = "foo";
	myCard.pushEnetry(foo);

	// myCard.show();

	result = myCard.get(0);
	if (NULL != result)
	{
		printf("string is: %s\n", result);
		free(result);
	}

	free(myCard._pData);   /* release the buffer owned by the card */
	return;

}


void test_3(char pa[])
{
	printf("test_3: len = %zu\n", strlen(pa) );
	/* pa is really a char* here, so this prints the pointer size, not the array size */
	printf("test_3: size = %zu\n", sizeof(pa) );
	return;
}



void printArray(int data[], int length)
{
//    for(int i(0); i < length; ++i)
//    {
//        std::cout << data[i] << ' ';
//    }
//    std::cout << std::endl;
}


const int len = 10;
int main()
{
	test_1();
	test_2();
	return 0;
}
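
test_3 in the listing above takes a char pa[] parameter. Inside a function such a parameter is an ordinary pointer, so sizeof no longer reports the array size. The sketch below (hypothetical helper names) shows the effect and the usual remedy of passing the capacity explicitly, as printArray does with its length argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The parameter is a pointer, so sizeof yields the pointer size. */
size_t size_seen_by_callee(char *pa)
{
    assert(pa != NULL);
    return sizeof(pa);
}

/* The capacity is passed explicitly. Returns how many more characters fit
 * in buf, assuming buf holds a terminated string shorter than capacity. */
size_t free_space(const char *buf, size_t capacity)
{
    assert(buf != NULL && capacity > 0);
    return capacity - 1 - strlen(buf);
}
```

For char label[10] = "Single", size_seen_by_callee(label) returns sizeof(char *), while free_space(label, sizeof(label)) returns 3 because the caller still knows the real array size.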
 


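Writing past the end of a char buffer obtained from malloc can corrupt the heap. [8] explains in some detail how such errors lead to crashes inside malloc: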

"Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, that itself does not cause a segmentation fault, is usually the result of an array access outside of the array's bounds. This is usually nowhere near the point where you call malloc."


"malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn't and starts interpreting random memory as part of the heap. Eventually its luck runs out."[8]

"Note that free can have the same thing happen, if the block being released or the free block list is messed up."[8]

"How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault."[8]

"So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or that the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data."[8]

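The range checking recommended in [8] can be sketched as a small accessor that refuses out-of-bounds writes. checked_set is a hypothetical helper. Real code would usually also log or report the rejected index.

```c
#include <assert.h>
#include <stddef.h>

/* Writes value to buf[index] only if index lies inside [0, size).
 * Returns 1 if the write happened, 0 if the index was rejected. */
int checked_set(char *buf, size_t size, long index, char value)
{
    assert(buf != NULL);
    if (index < 0 || (size_t)index >= size) {
        return 0;   /* would touch memory before or after the block */
    }
    buf[index] = value;
    return 1;
}
```

An index of -1 or an index equal to the buffer size is rejected instead of silently overwriting a neighbouring block header.
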
 

References:
[1] https://www.cs.bu.edu/teaching/cpp/string/array-vs-ptr/
[2] http://duartes.org/gustavo/blog/comments/anatomy.html
[3] http://www.cnblogs.com/lancidie/archive/2011/06/26/2090547.html
[4] http://jingyan.baidu.com/article/4665065864601ff549e5f8a9.html
[5] http://blog.csdn.net/codingkid/article/details/6858395
[6] http://www.cnblogs.com/nzbbody/p/3553222.html
[7] http://www.cnblogs.com/dejavu/archive/2012/08/13/2627498.html
[8] http://stackoverflow.com/questions/7480655/how-to-troubleshoot-crashes-in-malloc
[9] http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory/
